Let $\Sigma = \{a, b\}$. For every language $L \subseteq \Sigma^*$ we denote $\widetilde{L} := \{xy \mid xxy\in L\}$. Prove that if $L$ is regular, then so is $\widetilde{L}$. I tried playing around with the automaton for $L$ and extending it in various ways, but I always end up adding unwanted words to the new language and therefore get a language strictly bigger than $\widetilde{L}$. I also thought about somehow employing the Myhill-Nerode theorem and the finitely many equivalence classes of $\approx_L$, but I couldn't really figure out a useful way of doing it. Any help is welcome!
1 Answer
A simplified first approach
If $L$ is regular, there is a DFA $D$ which recognizes it.
Here is a nondeterministic algorithm for recognizing $\widetilde{L}=\{xy : xxy\in L\}$:
- On input $z$, let us suppose we knew in advance (a) where to divide the input into $x$ and $y$ such that $xxy\in L$, (b) what state $q_x$ the machine $D$ is in after reading the input string $x$. (In reality, we don't know either of these things in advance, but let's suppose.)
- With that extra information, the algorithm for recognizing $z=xy$ is as follows: start a copy of the DFA $D$, but instead of beginning in the start state, begin in state $q_x$ (as if we just finished reading a copy of $x$). Then read the input $z=xy$ as normal. The machine will finish in whatever state it would be in after reading $xxy$. If the machine $D$ accepts, then it means the original machine would accept $xxy\in L$, so we should accept $xy$.
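To make the "start in $q_x$" trick concrete, here is a minimal Python sketch. The DFA below (for the language of words containing `aa`) is a hypothetical example chosen only for illustration, and the names `DELTA`, `run` and `accepts_with_hint` are assumptions, not part of the question.

```python
# Hypothetical example DFA D for L = "words over {a, b} that contain 'aa'".
# State 0: no progress, state 1: last character was 'a', state 2: seen 'aa'.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
START, ACCEPTING = 0, {2}


def run(state, word):
    """Return the state D reaches from `state` after reading `word`."""
    for ch in word:
        state = DELTA[(state, ch)]
    return state


def accepts_with_hint(z, split):
    """Decide whether xxy is in L, given z = xy and the split x = z[:split]."""
    x = z[:split]
    q_x = run(START, x)  # the state we "wish we knew in advance"
    # Starting in q_x and reading z = xy ends exactly where D ends on xxy.
    return run(q_x, z) in ACCEPTING
```

For instance, with `z = "ab"` and `split = 1` we have $x = a$, $y = b$ and $xxy = aab \in L$, so `accepts_with_hint("ab", 1)` holds, whereas the splits `0` and `2` both fail.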
We can nondeterministically guess the information we wish we knew
In reality, we don't know where to divide the input into $x$ and $y$ such that $xxy\in L$, or whether such a division exists at all. We also don't know what state $D$ will be in after reading $x$, so we don't know in which state to start our simulation. Fortunately, we can guess both nondeterministically. There are only finitely many candidates for $q_x$ (one per state of $D$), and the split point can be guessed while reading, so a nondeterministic finite automaton can explore all the possibilities, which proves that $\widetilde{L}$ is regular.
Here is the algorithm:
- On input $z$, assume it can be subdivided into $z=xy$ such that $xxy\in L$, and nondeterministically guess what state $q_x$ $D$ will be in after reading $x$.
- Start two copies of $D$ in parallel. The first one should start in the usual start state. The second should start in state $q_x$.
- Begin reading characters from the input and transitioning both machines as usual. After reading any number of characters, you may nondeterministically guess that you've finished reading the "$x$" portion of the string and are about to begin reading the "$y$" portion of the string. When you decide you've finished reading $x$, you must check whether your original prediction was correct: check whether the first machine is indeed in state $q_x$. If not, reject.
- Stop simulating the first machine, and just continue with the second. Read the rest of the string and accept if the second machine accepts.
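The steps above can be sketched in Python by replacing each nondeterministic guess with a loop over all candidates. This is only an illustration under assumptions: the DFA is a hypothetical example (words containing `aa`), and `in_tilde` and `in_tilde_bruteforce` are names invented here.

```python
# Hypothetical example DFA D for L = "words over {a, b} that contain 'aa'".
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
STATES, START, ACCEPTING = {0, 1, 2}, 0, {2}


def run(state, word):
    """Return the state D reaches from `state` after reading `word`."""
    for ch in word:
        state = DELTA[(state, ch)]
    return state


def in_tilde(z):
    """Try every guess for q_x and every split point, as the NFA would."""
    for q_x in STATES:                  # guess the state after reading x
        first, second = START, q_x      # two copies of D in parallel
        for i in range(len(z) + 1):
            # Option: declare that x = z[:i] has just ended. This is only
            # allowed if the first copy confirms the prediction q_x.
            if first == q_x and run(second, z[i:]) in ACCEPTING:
                return True
            if i < len(z):              # otherwise keep reading x
                first = DELTA[(first, z[i])]
                second = DELTA[(second, z[i])]
    return False


def in_tilde_bruteforce(z):
    """Reference check straight from the definition: some split has xxy in L."""
    # With x = z[:i] and y = z[i:], the word xxy equals z[:i] + z.
    return any(run(START, z[:i] + z) in ACCEPTING for i in range(len(z) + 1))
```

On this example `in_tilde("a")` is true (take $x = a$, $y = \epsilon$, so $xxy = aa$), while `in_tilde("ba")` is false, and the two functions agree on all short words.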
A sketch of the NFA
It's a little tedious to translate the plain-language explanation into a mathematical definition, but here's a sketch.
Let $D$ be the DFA for recognizing $L$. Let its states be $Q$ with initial state $q_0$, accepting states $F$, and transition function $\delta$. Now we'll define an NFA for recognizing $\widetilde{L}$ in terms of its states $Q^\prime$, accepting states $F^\prime$ and transitions $\delta^\prime$.
The states of our NFA are defined as follows:
- One state for every triple of states from $D$: $Q\times Q\times Q $
- An additional state for every state in $D$: $\{\star\}\times Q$.
- A special new start state $q_{\mathsf{start}}$.
The start state of our machine is $q_{\mathsf{start}}$.
The transitions of the machine are as follows:
- $q_{\mathsf{start}} \xrightarrow{\epsilon} \langle q_i, q_{0}, q_i\rangle$ for every state $q_i\in Q$. ("Guess which state $q_i$ the DFA would be in after reading $x$"). The first $q_i$ will remember our prediction and will not change. The other two components will allow us to simulate two copies of $D$ in parallel, starting from states $q_0$ and $q_i$ respectively.
- $\langle q_i, q_j, q_k\rangle \xrightarrow{\sigma} \langle q_i, \delta(q_j,\sigma), \delta(q_k,\sigma)\rangle$ for every triple of states $q_i,q_j,q_k \in Q$ and any character $\sigma\in \Sigma$. ("Simulate reading a character on both machines in parallel. Keep remembering the prediction $q_i$")
- $\langle q_i, q_i, q_j\rangle \xrightarrow{\epsilon} \langle \star, q_j\rangle$ for all states $q_i, q_j \in Q$. ("If your prediction comes true, you may nondeterministically decide you're finished reading $x$.")
- $\langle \star, q_i\rangle \xrightarrow{\sigma} \langle \star, \delta(q_i,\sigma)\rangle$ for every state $q_i \in Q$ and any character $\sigma\in\Sigma$. ("If you're in the state where you're finished reading $x$, just keep reading the rest of the string as normal.")
The accepting states of our new machine are $\{\star\}\times F$, which is to say "when we finish reading $y$, any accepting state of the original machine is an accepting state for us".
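As a sanity check, the construction can be built and simulated directly. The following Python sketch assumes a hypothetical example DFA (words containing `aa`); the names `build_nfa`, `closure` and `nfa_accepts`, and the encoding of states as the string `"start"`, triples, and pairs `("*", q)`, are illustrative choices rather than anything fixed by the construction.

```python
from itertools import product

# Hypothetical example DFA D for L = "words over {a, b} that contain 'aa'".
SIGMA = "ab"
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
Q, Q0, F = {0, 1, 2}, 0, {2}
STAR = "*"


def build_nfa():
    """Return (epsilon moves, labelled moves, accepting states) of the NFA."""
    eps = {}    # state -> set of states reachable by one epsilon move
    step = {}   # (state, character) -> set of successor states
    # Guess the prediction q_i; simulate copies from q_0 and from q_i.
    eps["start"] = {(qi, Q0, qi) for qi in Q}
    for qi, qj, qk in product(Q, repeat=3):
        for ch in SIGMA:
            step[((qi, qj, qk), ch)] = {(qi, DELTA[(qj, ch)], DELTA[(qk, ch)])}
        if qj == qi:  # prediction confirmed: may stop reading x
            eps[(qi, qj, qk)] = {(STAR, qk)}
    for q in Q:
        for ch in SIGMA:
            step[((STAR, q), ch)] = {(STAR, DELTA[(q, ch)])}
    accepting = {(STAR, q) for q in F}
    return eps, step, accepting


def closure(states, eps):
    """Epsilon closure of a set of NFA states."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def nfa_accepts(z):
    """Simulate the NFA on z by tracking the set of reachable states."""
    eps, step, accepting = build_nfa()
    current = closure({"start"}, eps)
    for ch in z:
        moved = set()
        for s in current:
            moved |= step.get((s, ch), set())
        current = closure(moved, eps)
    return bool(current & accepting)
```

For this three-state example the NFA has $3^3 + 3 + 1 = 31$ states. It accepts `"a"` (since $aa \in L$) and rejects `"ba"`, matching the definition of $\widetilde{L}$.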