NP completeness of deciding whether a set of examples, consisting of strings and states, has a corresponding DFA?

Question

I'm working on a textbook problem, 7.36 in Sipser 3rd edition. It claims that if we are given an integer $N$ and set of pairs $(s_i, q_i)$, where $s_i$ are binary strings and $q_i$ are states (we are unconcerned here with whether the states are accepting), then the problem of "deciding whether there exists some $N$-state DFA for which each $s_i$ causes that DFA to end up in state $q_i$" is NP-complete.

A much more well-known result by Gold (1978) is the same as the above, except with the alphabet being arbitrary instead of binary, and the examples being in the form $(s_i, \mathrm{accept})$ and $(s_j, \mathrm{reject})$ instead of having the state $q_i$. That result is discussed in some other threads, like this, this, and this.

If I naively try to reduce from the Gold problem to the Sipser problem, I would try to replace all "accept" with a single "terminal" accept state, and same for reject, but that fails because if you wanted only a single terminal accept state to handle all accepted strings, you would need to have an NFA with epsilon transitions; if you try to convert that back to DFA, the single terminal states may split up into multiple, which would invalidate your examples.

I suspect Myhill-Nerode may be useful here, since the DFA states correspond to the equivalence classes of strings. I think you can rephrase the Sipser result as deciding whether some given set of strings can be partitioned into some given set of Myhill-Nerode equivalence classes.

I searched through the literature, but was not able to find the Sipser result anywhere. What is another decision problem that we can use as a polynomial-time reduction here? Alternatively, is this result derived or discussed anywhere else?

xdavidliu · Answer 1 · 2021-05-27T21:55:18.393

(Note: the below proof may have some issues raised in the comments. I am in the processing of addressing the issues.)

The following "end-character" method was suggested in the comment above, as well as private communication with some others. Any mistakes are my own.

We construct a polynomial-time reduction from the Gold problem (arbitrary alphabet with "accept" or "reject" in the examples) to the Sipser problem (binary alphabet with states in the examples). Since the Gold problem is NP-complete, it then follows that the Sipser problem is also NP-complete.

The Gold decision problem gives us an integer $n$ and pairs $(s_1, x_1), (s_2, x_2) \ldots (s_k, x_k)$ [where each $s_i$ is a string in some arbitrary alphabet $\Sigma$, and each $x_i$ is either "accept" or "reject"] and asks if there exists an $n$-state DFA $G$ that for each $i$, accepts or rejects $s_i$ based on the value of $x_i$.

We claim that the Gold input has such an $n$-state DFA if and only if the examples $(t_1, y_1) \ldots (t_k, y_k)$ has a corresponding $f(n)$-state DFA $H^\prime$ in the Sipser problem, where:

$t_i$ is some binary encoding of $s_i$ plus a common suffix $Z$ that is unambiguously different from any of the other encoded characters in $\Sigma$.
$y_i = q_{x_i}$; that is, either $q_\mathrm{accept}$ or $q_\mathrm{reject}$
$f(n)$ is a function that be computed and written down in time polynomial in $\log n$.

First, suppose the Gold DFA $G$ exists. Then, there exists a size $O(n \log \lvert \Sigma \rvert)$ DFA $H$ that recognizes the homomorphism ($\Sigma$ encoded into binary) of the language of $G$: simply take every transition in $G$ and split up into transitions corresponding to the binary encoding. Since $G$ may have more than one accept state, so does $H$ (which has the same number of accept states as $G$). Hence, modify $H$ by taking every accept state and inserting transitions corresponding to the "end of string" bitstring $Z$ to a single newly-created $q_\mathrm{accept}$, and inserting the same transitions from non-accept states in $H$ to a single newly-created $q_\mathrm{reject}$. The resulting DFA $H^\prime$ clearly fits the examples $(t_i, y_i)$, and its number of states is bounded above by some easily computed $f(n)$.

Now for the other direction, suppose some $f(n)$-state DFA $H^\prime$ exists which fits the examples $(t_i, y_i)$. Then, since by construction every $t_i$ has the "end of string" bitstring $Z$ as a suffix, we simply take $H^\prime$ and for every state $q$ for which reading $Z$ would take $H^\prime$ to $q_\mathrm{accept}$, we make $q$ an accept state. Then, we remove $q_\mathrm{accept}$ and $q_\mathrm{reject}$, and we have the DFA $H$. We can then construct a DFA $G$ which recognizes the same language with characters decoded from binary to alphabet $\Sigma$. This construction can easily be done by starting with the start state in $G$, trying out every encoded character in $\Sigma$ and writing down the transitions in $G$, and recursively repeating. Since $H$ is finite, this procedure is guaranteed to terminate in polynomial time.

(Note: the above solution simply quotes the Gold result, the proof of which is pretty involved. There's probably a simpler method that directly proves the Sipser result instead of using Gold as an intermediary.)

NP completeness of deciding whether a set of examples, consisting of strings and states, has a corresponding DFA?

1 Answers1