Let $\Sigma$ be an alphabet, and let $x^+,x^-_1,\dots,x^-_n \in \Sigma^*$ be strings over that alphabet. Call a string $s \in \Sigma^*$ good if $s$ is a subsequence of $x^+$ but not a subsequence of any of $x^-_1,\dots,x^-_n$.
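To make the definition concrete, here is a minimal Python sketch of the "good" predicate. The function names `is_subsequence` and `is_good` are illustrative and not part of the problem statement; the strings are assumed to be ordinary Python `str` values.

```python
def is_subsequence(s, x):
    """Return True if s can be obtained from x by deleting zero or more characters."""
    it = iter(x)
    # `ch in it` advances the iterator until it finds ch, so matches stay in order.
    return all(ch in it for ch in s)


def is_good(s, x_pos, x_negs):
    """s is good if it is a subsequence of x_pos but of none of the strings in x_negs."""
    return is_subsequence(s, x_pos) and not any(
        is_subsequence(s, y) for y in x_negs
    )
```

For instance, with $x^+ = aaa$ and $x^-_1 = x^-_2 = a$, `is_good("aa", "aaa", ["a", "a"])` holds while `is_good("a", "aaa", ["a", "a"])` does not.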

Given $x^+,x^-_1,\dots,x^-_n$, I am looking to find the shortest good string $s$. Is there a reasonable algorithm for this? I am interested in a practical algorithm, even if its worst-case running time is not great. In my domain, the strings $x^+,x^-_1,\dots,x^-_n$ might be fairly long but I expect there will exist a good string $s$ that is fairly short, in case that helps.

The case $n=1$ is handled by the question "Shortest sub-sequence of one string, that's not a sub-sequence of another string", but I need to deal with the case $n>1$.

D.W.

1 Answer

Mistakes

First of all, I made a few mistakes in the comments. Both the original claim I made about interleaving and the comment "correcting" it (now deleted) were wrong. Separately, my claim that trying all possible interleavings must yield an optimal solution was also wrong (I give a simple counterexample below). Finally, my suggestion to set $x^+ = z_j$ and iterate, or to use beam search, is not helpful either: whatever answer this produces via Aryabhata's DP can never be better than the answer obtained from the original $x^+$, since all it does is shrink the solution set from which the DP can pick. Sorry! Hopefully the improved version below contains no further problems...

I also noticed two mistakes in Aryabhata's DP. Fortunately, both can be easily repaired (see my comments on that post).

A heuristic solution using random interleavings

If you don't need absolutely the shortest subsequence, you could use the fact that, if a string $s$ is a subsequence of some $x^-_i$, then it is also a subsequence of every possible interleaving of all the strings $x^-_i$. Turning this around, if $s$ is not a subsequence of some particular interleaving of all the strings $x^-_i$, then it is not a subsequence of any individual $x^-_i$.

So you could try many different ways of randomly interleaving the $n$ strings $x^-_i$ into a single string. For each such interleaving $y_j$, use Aryabhata's nice DP algorithm for the two-string case to find the shortest subsequence $z_j$ of $x^+$ that is not a subsequence of $y_j$. Then pick whichever $z_j$ is shortest over all the interleavings you tried.
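The procedure above might be sketched in Python as follows. This is an illustration under stated assumptions, not Aryabhata's exact formulation: `shortest_uncommon_subsequence` is a standard two-string DP written from scratch here, and the names `random_interleaving`, `heuristic_shortest_good`, `trials` and `seed` are hypothetical. The DP table uses $O(|x^+| \cdot |y_j|)$ memory, which may be too much for very long inputs.

```python
import random


def shortest_uncommon_subsequence(x, y):
    """Shortest subsequence of x that is not a subsequence of y, or None if none exists."""
    n, m = len(x), len(y)
    # nxt[j][c] = smallest k >= j with y[k] == c (key absent if c does not occur).
    nxt = [dict() for _ in range(m + 1)]
    for j in range(m - 1, -1, -1):
        nxt[j] = dict(nxt[j + 1])
        nxt[j][y[j]] = j
    INF = float("inf")
    # f[i][j] = length of the shortest subsequence of x[i:] that is not a
    # subsequence of y[j:]; f[n][j] stays INF because only the empty string is left.
    f = [[INF] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m, -1, -1):
            k = nxt[j].get(x[i])
            take = 1 if k is None else 1 + f[i + 1][k + 1]  # start with x[i]
            f[i][j] = min(f[i + 1][j], take)                # or skip x[i]
    if f[0][0] == INF:
        return None
    # Walk the table to recover one optimal string.
    out, i, j = [], 0, 0
    while True:
        if f[i][j] == f[i + 1][j]:  # skipping x[i] loses nothing
            i += 1
            continue
        out.append(x[i])
        k = nxt[j].get(x[i])
        if k is None:               # x[i] no longer occurs in y: done
            return "".join(out)
        i, j = i + 1, k + 1


def random_interleaving(strings, rng):
    """Merge the strings in a random order, preserving each string's internal order."""
    labels = [idx for idx, s in enumerate(strings) for _ in s]
    rng.shuffle(labels)  # each label says which string supplies the next character
    pos = [0] * len(strings)
    out = []
    for idx in labels:
        out.append(strings[idx][pos[idx]])
        pos[idx] += 1
    return "".join(out)


def heuristic_shortest_good(x_pos, x_negs, trials=100, seed=0):
    """Best z_j found over `trials` random interleavings (None if every trial fails)."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        y = random_interleaving(x_negs, rng)
        z = shortest_uncommon_subsequence(x_pos, y)
        if z is not None and (best is None or len(z) < len(best)):
            best = z
    return best
```

Any string this returns is good, because it is a subsequence of $x^+$ and avoids an interleaving that contains every $x^-_i$. It is not necessarily the shortest good string, and a `None` result does not prove that no good string exists.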

Caveat: No guarantee of optimality even if we try all interleavings

Surprisingly (to me at least), even if you repeat the above procedure for all possible interleavings, you are not guaranteed to find the optimal solution. Consider the instance in which $x^+ = aaa$, $n=2$, and $x^-_1 = x^-_2 = a$. Then $aa$ is an optimal solution with length 2. However, the only interleaving of $x^-_1$ and $x^-_2$ is $aa$, and the only subsequence of $x^+$ that is not a subsequence of $aa$ is $aaa$, so the shortest solution found by trying all interleavings has length 3.
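The counterexample is small enough to check by brute force. The sketch below enumerates every subsequence of $x^+$ in order of length, which is only feasible for tiny inputs; the helper names (`all_interleavings`, `shortest_avoiding`) are illustrative and not taken from any library.

```python
from itertools import combinations


def is_subsequence(s, x):
    """Return True if s is a subsequence of x."""
    it = iter(x)
    return all(ch in it for ch in s)


def all_interleavings(a, b):
    """Set of all strings obtained by merging a and b while keeping their orders."""
    if not a or not b:
        return {a + b}
    return ({a[0] + t for t in all_interleavings(a[1:], b)}
            | {b[0] + t for t in all_interleavings(a, b[1:])})


def shortest_avoiding(x_pos, negs):
    """Brute force: shortest non-empty subsequence of x_pos avoiding every string in negs."""
    for r in range(1, len(x_pos) + 1):
        # combinations() keeps the original character order, so each tuple is a subsequence.
        for s in sorted({"".join(c) for c in combinations(x_pos, r)}):
            if not any(is_subsequence(s, y) for y in negs):
                return s
    return None


x_pos, x_negs = "aaa", ["a", "a"]
optimal = shortest_avoiding(x_pos, x_negs)
via_interleavings = sorted(
    shortest_avoiding(x_pos, [y]) for y in all_interleavings(*x_negs)
)
print(optimal)            # aa
print(via_interleavings)  # ['aaa']
```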

j_random_hacker