
Let $N$ be an NFA with $k$ states that recognizes some language $A$.

a. Show that if $A$ is nonempty, $A$ contains some string of length at most $k$.

b. Show, by giving an example, that part (a) is not necessarily true if you replace both $A$’s by $\overline{A}$.

c. Show that if $\overline{A}$ is nonempty, $\overline{A}$ contains some string of length at most $2^k$.

d. Show that the bound given in part (c) is nearly tight; that is, for each $k$, demonstrate an NFA recognizing a language $A_k$ where $\overline{A_k}$ is nonempty and where $\overline{A_k}$’s shortest member strings are of length exponential in $k$. Come as close to the bound in (c) as you can.

This is problem 1.64 from Introduction to the Theory of Computation, 3rd Edition by Michael Sipser.

Parts (a) and (c) are easy. I'm struggling with part (b). A couple of things must be true of such an NFA: it has more than one state and it has at least one cycle. I've been trying to construct an automaton for $\Sigma = \{0\}$, but with no success. Every time I create a cycle, either $\bar A$ becomes empty or there are "holes" in that cycle (non-accept states) which require another cycle, and when I construct that one the situation recurses. Extending $\Sigma$ seems only to complicate the problem. I suspect solving part (b) would give a hint for part (d), but I can crack neither.

MangoPizza
  • What is $\bar A$? Is it the complement of $A$? – J.-E. Pin Apr 11 '17 at 10:41
  • @J.-E.Pin Yes, it is. – Artyom Dmitriev Apr 11 '17 at 10:51
  • Part (c) is extremely easy: it follows from part (a) and $k < 2^k$. It also doesn't seem to provide the correct context for (d). Should both the $A$s be $\bar A$? – Peter Taylor Feb 27 '18 at 12:05
  • @PeterTaylor both sentences in your comment are incorrect. Part (c)'s $2^k$ follows not from $k < 2^k$, but rather from the fact that a $k$-state NFA can be converted to a $2^k$-state DFA, and for a DFA (but not an NFA) the complement can be obtained by toggling the "accept"-ness of each state, as shown in Sipser problem 1.14. Also, part (d) is assuming that each $A_k$ is recognized by a $k$-state NFA, but otherwise nothing is missing or wrong in that text. – xdavidliu Jul 11 '20 at 14:30

3 Answers


a. If $A$ is nonempty, there is a path through $N$ from a start state to an accepting state. By removing cycles, we may suppose the path has length at most $k$. The word along the edges of this path is a word of length $\le k$ accepted by $N$.

b. See d. below

c. Use the subset construction to find a DFA $D$ accepting $A$. By construction, $D$ has at most $2^k$ states. Let $D'$ be the DFA obtained from $D$ by swapping accepting and nonaccepting states (a nonaccepting state is now accepting and vice versa). Then $D'$ accepts $\overline A$. Applying the argument from part (a) to $D'$ gives a word of length $\le 2^k$ that $D'$ accepts, or equivalently, that lies in $\overline A$.

d. Let the alphabet consist of one letter '1', and identify each word with the natural number it represents in unary. Fix a set of natural numbers $X$. For each $n\in X$, there is an NFA $N_n$ with $n$ states which accepts all numbers that are not divisible by $n$ (it's just a cycle of length $n$ whose start state is its only nonaccepting state). Let a new NFA $N$ be the union of the $N_n$ for $n\in X$ with an additional start state (which is accepting), which has a single $\epsilon$ edge to each of the start states of the $N_n$. The natural numbers not accepted by $N$ are precisely the nonzero multiples of $\text{lcm}(X)$, and $N$ has $1 + \sum_{n\in X} n$ states.

We can, for instance, take $X$ to be the primes less than $p$ for some $p$ to obtain one parameterized sequence of NFAs $N_p$ with the required property. Indeed, arguing very coarsely, $N_p$ has at most $p^2$ states, but the first number not accepted is at least $2^{\pi(p)} \ge 2^{p / \log p -1}$ for sufficiently large $p$. Unfortunately this is only superpolynomial in the number of states, and not quite exponential. I'm not sure if the idea can be improved to be properly exponential.
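
As a small sanity check of the construction above, here is a minimal Python sketch for the assumed example $X = \{2, 3, 5\}$. The function names `lcm_of` and `shortest_rejected_length` are hypothetical helpers, not part of the original answer. The sketch relies on the observation that after reading $m \ge 1$ symbols, the cycle $N_n$ sits at position $m \bmod n$, and position $0$ is its only nonaccepting state.

```python
from functools import reduce
from math import gcd

def lcm_of(numbers):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), numbers, 1)

def shortest_rejected_length(moduli, limit=10_000):
    """Shortest unary length rejected by the union-of-cycles NFA.

    The empty word is accepted by the fresh start state, so lengths start at 1.
    A length m >= 1 is rejected exactly when every cycle is back at its
    nonaccepting start state, i.e. when every n in moduli divides m.
    """
    for length in range(1, limit + 1):
        if all(length % n == 0 for n in moduli):
            return length
    return None  # nothing rejected up to the (assumed) search limit

X = [2, 3, 5]              # assumed example set of cycle lengths
num_states = 1 + sum(X)    # fresh start state plus one state per cycle position
print(num_states, shortest_rejected_length(X), lcm_of(X))  # 11 30 30
```

With these assumed values the NFA has 11 states while the shortest word in the complement has length 30, so the same example also answers part (b).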

vujazzman
  • for part a, can we actually make the stronger statement that the word length is at most k - 1, rather than at most k? If there are k states, then removing cycles until the path contains no duplicate states, the number of symbols would be at most k-1 if I'm not mistaken. – xdavidliu Jul 11 '20 at 14:32
  • Sounds fine to me – vujazzman Jul 11 '20 at 23:50
  • For question d, let me remark that there is a similar construction in Ellul et al., "Regular Expressions: New Results and Open Problems" https://cs.uwaterloo.ca/~shallit/Papers/re3.pdf, proof of Theorem 27 – a3nm Jun 06 '25 at 17:13

Answering question d, the following describes an NFA with $k$ states for a language $A$ such that the shortest word in the complement language $\bar A$ has length $2^{k-2}$.

Motivational Observations: Let $w$ be the shortest word in $\bar{A}$, and let $S_i$ be the set of states in the NFA for $A$ that can be reached after processing the first $i$ letters of $w$. In particular, if $n$ is the length of $w$, then $S_n$ contains no accepting state (so $S_n = \emptyset$ when every state is accepting, as in the construction below).

If $i < j$, then $S_i$ cannot be a subset of $S_j$ (otherwise, you can find a shorter rejected string by omitting the letters in position $i+1,\dots,j$). This motivates the following construction.

Construction: Let $\{1,\dots,k\}$ be the set of states, where $1$ is the initial state, and state $k$ will play a special role (see below). Let $S_0=\{1\}$ and let $S_1,\dots,S_n$ be an enumeration of all subsets of $\{2,\dots,k-1\}$ in non-increasing order of cardinality. So $n=2^{k-2}$. We aim to construct an automaton where these are indeed the sets of states reachable after processing the prefixes of the shortest rejected word $w$.

To do so, let $a_1,\dots,a_n$ be distinct symbols in the alphabet. From each state in $S_{i-1}$, we have a transition reading $a_i$ to each state in $S_i$. For each state not in $S_{i-1}$, we have a transition reading $a_i$ to the special state $k$. In particular, $k$ has a transition to itself for all symbols. All states are accepting. Denote this NFA by $N$ and let $\mathcal P(N)$ be its power set DFA. Observe that $S_0=\{1\}$ is the initial state and $S_n=\emptyset$ the only rejecting state of $\mathcal P(N)$.

Proof of Correctness: Let $w$ be a shortest word rejected by $\mathcal P(N)$. Clearly, no run on $w$ in $N$ can ever visit the state $k$, and the initial state $1$ is reachable only before reading the first symbol. So all states of $\mathcal P(N)$ reachable while processing $w$ are of the form $S_i$. Whenever $S_{i-1}$ is reached, the next symbol must be $a_j$ for some $j$ with $S_{j-1}\supseteq S_{i-1}$ (to avoid reaching $k$). Hence $j\le i$, and the next state will be $S_j$. From this, it is clear that the shortest word to reach the only rejecting state $S_n$ of $\mathcal P(N)$ is the word $a_1a_2\dots a_n$.
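
For small $k$ the claim can be checked mechanically. The following is a minimal Python sketch, not part of the original answer: it builds the transition table described above and runs a breadth-first search over the power set DFA. The names `subsets_by_size`, `build_nfa`, and `shortest_rejected` are hypothetical, and symbol $a_i$ is represented simply by the integer $i$.

```python
from collections import deque
from itertools import combinations

def subsets_by_size(elems):
    """All subsets of elems, in non-increasing order of cardinality."""
    elems = list(elems)
    return [frozenset(c)
            for r in range(len(elems), -1, -1)
            for c in combinations(elems, r)]

def build_nfa(k):
    """Transition table of the k-state NFA with absorbing state k."""
    S = [frozenset({1})] + subsets_by_size(range(2, k))  # S[0], S[1], ..., S[n]
    n = len(S) - 1                                       # n == 2 ** (k - 2)
    delta = {}  # (state, symbol index i) -> frozenset of successor states
    for i in range(1, n + 1):
        for q in range(1, k + 1):
            # States in S[i-1] move to all of S[i]; every other state moves to k.
            delta[q, i] = S[i] if q in S[i - 1] else frozenset({k})
    return delta, n

def shortest_rejected(delta, n):
    """BFS over the power set DFA; all NFA states accept, so only {} rejects."""
    start = frozenset({1})
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if not cur:
            return dist[cur]
        for i in range(1, n + 1):
            nxt = frozenset().union(*(delta[q, i] for q in cur))
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # the empty set is unreachable

print(shortest_rejected(*build_nfa(4)))  # 4, i.e. 2 ** (4 - 2)
```

The search is exponential in both $k$ and the alphabet size, so it is only meant as a check for very small $k$.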

  • Thank you so much! – MangoPizza May 05 '25 at 21:01
  • Note that in this construction the alphabet size is not fixed and exponential in the NFA size. See the other answer https://math.stackexchange.com/a/3394072 for a superpolynomial lower bound on the alphabet of size 1 – a3nm Jun 06 '25 at 16:41

Let $N$ be an NFA with $k$ states recognizing $A$, and $\overline{A}$ nonempty. Then in fact $\overline{A}$ must contain a word of length at most $2^{k-1}$, and a slight modification of Christian's construction above shows that this is tight.

First we show that $\overline{A}$ must contain a word of length at most $2^{k-1}$. Let $N$ have states $\{1, 2, \dots, k\}$, and $1$ be the initial state. Let $w$ be a shortest word in $\overline{A}$, and let $S_i$ be the set of states reachable in $N$ after reading the first $i$ letters of $w$. As argued by Christian, we cannot have $S_i \subseteq S_j$ if $i < j$. Since $S_{0} = \{1\}$, it follows $S_i \subseteq \{2, \dots, k \}$ for $i \ge 1$. Because the $S_i$ must be distinct, we see that $|w| \le 2^{k-1}$.

Now we construct an NFA $N$ with $k$ states such that the shortest word rejected by $N$ is of length $2^{k-1}$. We copy the construction above, but instead of a separate 'absorbing state' $k$, we use transitions back to the starting state $1$. Specifically:

Let $a_1, \dots, a_n$ be distinct symbols in the alphabet. From each state in $S_{i-1}$, we have a transition reading $a_i$ to each state in $S_i$. For each state not in $S_{i-1}$, we have a transition reading $a_i$ to state $1$.

Note that since we don't use state $k$ for a special purpose, our enumeration of subsets (nonincreasing in size as before) $S_1, \dots, S_n$ now ranges over all subsets of $\{2, \dots, k\}$ and so has $n = 2^{k-1}$. So the shortest word in $\overline{A}$ has length $2^{k-1}$.
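
The modified construction can be checked for small $k$ in the same way. This is a minimal Python sketch under the assumption that all states are accepting, as in the construction it copies; the names `build_nfa_restart` and `shortest_rejected` are hypothetical, and symbol $a_i$ is represented by the integer $i$.

```python
from collections import deque
from itertools import combinations

def build_nfa_restart(k):
    """k-state NFA where 'wrong' states go back to state 1 instead of a sink."""
    rest = list(range(2, k + 1))
    # S[0] = {1}, then all subsets of {2, ..., k}, largest first.
    S = [frozenset({1})] + [frozenset(c)
                            for r in range(len(rest), -1, -1)
                            for c in combinations(rest, r)]
    n = len(S) - 1  # n == 2 ** (k - 1)
    delta = {}      # (state, symbol index i) -> frozenset of successor states
    for i in range(1, n + 1):
        for q in range(1, k + 1):
            delta[q, i] = S[i] if q in S[i - 1] else frozenset({1})
    return delta, n

def shortest_rejected(delta, n):
    """BFS over the power set DFA; only the empty state set rejects."""
    start = frozenset({1})
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if not cur:
            return dist[cur]
        for i in range(1, n + 1):
            nxt = frozenset().union(*(delta[q, i] for q in cur))
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # the empty set is unreachable

print(shortest_rejected(*build_nfa_restart(3)))  # 4, i.e. 2 ** (3 - 1)
```

For $k = 3$ this gives an NFA with 3 states whose shortest rejected word has length 4, which is also a compact example for part (b).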