26

Define the language $L$ as $L = \{a, b\}^* - \{ww\mid w \in \{a, b\}^*\}$. In other words, $L$ contains the words that cannot be expressed as some word repeated twice. Is $L$ context-free or not?

I've tried to intersect $L$ with $a^*b^*a^*b^*$, but I still can't prove anything. I also looked at Parikh's theorem, but it doesn't help.

Raphael
Evgeny Eltishev

4 Answers

41

It's context-free. Here's the grammar:

$S \to A | B|AB|BA$
$A \to a|aAa|aAb|bAb|bAa$
$B \to b|aBa|aBb|bBb|bBa$

$A$ generates words of odd length with $a$ in the center. Same for $B$ and $b$.
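As a sanity check (not a proof), the grammar can be compared against the definition of $L$ by brute force on short words. The sketch below relies on the observation above that $A$ and $B$ derive exactly the odd-length words with central letter $a$ and $b$ respectively; the function names (`derives_from`, `in_grammar`, `in_L`, `grammar_matches_L`) are made up for this illustration.

```python
from itertools import product

def derives_from(center, s):
    # A (center="a") or B (center="b") derives exactly the odd-length
    # words whose middle letter equals `center`.
    return len(s) % 2 == 1 and s[len(s) // 2] == center

def in_grammar(x):
    # S -> A | B
    if derives_from("a", x) or derives_from("b", x):
        return True
    # S -> AB | BA: try every split point of x.
    for k in range(1, len(x)):
        u, v = x[:k], x[k:]
        if derives_from("a", u) and derives_from("b", v):
            return True
        if derives_from("b", u) and derives_from("a", v):
            return True
    return False

def in_L(x):
    # x is in L iff it is not of the form ww.
    half, odd = divmod(len(x), 2)
    return odd == 1 or x[:half] != x[half:]

def grammar_matches_L(max_len):
    # Compare both membership tests on every word over {a, b}
    # of length at most max_len.
    return all(
        in_grammar("".join(t)) == in_L("".join(t))
        for n in range(max_len + 1)
        for t in product("ab", repeat=n)
    )
```

Calling `grammar_matches_L(10)` compares the two tests on all 2047 words of length at most 10. Agreement on short words is only evidence, of course, which is why a proof is still needed.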

I'll present a proof that this grammar is correct. Let $L = \{a,b\}^* \setminus \{ww \mid w \in \{a,b\}^*\}$ (the language in the question).

Theorem. $L = L(S)$. In other words, this grammar generates the language in the question.

Proof. This certainly holds for all odd-length words, since the grammar generates every odd-length word (via $S \to A \mid B$) and $L$ contains every odd-length word. So let's focus on even-length words.

Suppose $x \in L$ has even length. I'll show that $x \in L(S)$. In particular, I claim that $x$ can be written in the form $x=uv$, where both $u$ and $v$ have odd length and have different central letters. Thus $x$ can be derived from either $AB$ or $BA$ (according to whether $u$'s central letter is $a$ or $b$). Justification of claim: Let the $i$th letter of $x$ be denoted $x_i$, so that $x = x_1 x_2 \cdots x_n$. Since $x$ is not in $\{ww \mid w \in \{a,b\}^{n/2}\}$, there must exist some index $i$ with $1 \le i \le n/2$ such that $x_i \ne x_{i+n/2}$. Consequently we can take $u = x_1 \cdots x_{2i-1}$ and $v = x_{2i} \cdots x_n$. Then $u$ has odd length $2i-1$ and central letter $x_i$, while $v$ has odd length $n-2i+1$ and central letter $x_{i+n/2}$, so by construction $u,v$ have different central letters.
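The construction of $u$ and $v$ in this step is mechanical, so it can be written out directly. The following is a minimal sketch; `odd_split` and `center` are names chosen for this example, and Python's 0-based indexing is translated from the 1-based indices used in the proof.

```python
def odd_split(x):
    """Return (u, v) with x == u + v, both of odd length and with
    different central letters, following the construction in the proof.
    Returns None if x has odd length or x is of the form ww."""
    n = len(x)
    if n % 2 == 1:
        return None
    half = n // 2
    for i in range(1, half + 1):          # 1-based index i, as in the proof
        if x[i - 1] != x[i + half - 1]:   # x_i != x_{i + n/2}
            u = x[: 2 * i - 1]            # u = x_1 ... x_{2i-1}
            v = x[2 * i - 1 :]            # v = x_{2i} ... x_n
            return u, v
    return None

def center(s):
    # Central letter of an odd-length word.
    return s[len(s) // 2]
```

For example, `odd_split("abaabb")` finds the mismatch at $i = 3$ and returns `("abaab", "b")`, whose central letters are `a` and `b`; `odd_split("abab")` returns `None` because that word is of the form $ww$.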

Next suppose $x \in L(S)$ has even length. I'll show that we must have $x \in L$. If $x$ has even length, it must be derivable from either $AB$ or $BA$; without loss of generality, suppose it is derivable from $AB$, and $x=uv$ where $u$ is derivable from $A$ and $v$ is derivable from $B$. If $u,v$ have the same length, then we must have $u\ne v$ (since they have different central letters), so $x \notin \{ww \mid w \in \{a,b\}^*\}$. So suppose $u,v$ have different lengths, say length $\ell$ and $n-\ell$ respectively. Then their central letters are $u_{(\ell+1)/2}$ and $v_{(n-\ell+1)/2}$. The fact that $u,v$ have different central letters means that $u_{(\ell+1)/2} \ne v_{(n-\ell+1)/2}$. Since $x=uv$, the $k$th letter of $v$ is $x_{\ell+k}$, so this means that $x_{(\ell+1)/2} \ne x_{(n+\ell+1)/2}$. If we attempt to decompose $x$ as $x=ww'$ where $w,w'$ have the same length $n/2$, then we'll discover that $w_{(\ell+1)/2} = x_{(\ell+1)/2} \ne x_{(n+\ell+1)/2} = w'_{(\ell+1)/2}$, i.e., $w\ne w'$, so $x \notin \{ww \mid w \in \{a,b\}^*\}$. In particular, it follows that $x \in L$.

D.W.
Evgeny Eltishev
3

This language is context-free; this was proved in the following paper:

Tomaszewski, Zach. "A Context-Free Grammar for a Repeated String." Journal of Information and Computer Science, 2012 (PDF).

The grammar is as follows: \begin{align*} S&\to E\mid U\mid \epsilon\\ E&\to AB\mid BA\\ A&\to ZAZ\mid a\\ B&\to ZBZ\mid b\\ U&\to ZUZ\mid Z\\ Z&\to a\mid b \end{align*}

Zoomba
1

Here's a short proof that $L = \overline{\{ww \mid w \in \Sigma^*\}}$ is context-free for any alphabet $\Sigma$.

First, note that the language $L_1 = \{x \mid |x| = 2n, n > 0, x_1 \neq x_{n+1}\} = \{yz \mid |y| = |z| > 0, y_1 \neq z_1\}$ is context-free since a PDA can recognize it by guessing where $z$ begins, checking $y_1 \neq z_1$, and checking $|y| = |z|$ using the stack.

Since CFLs are closed under circular shift, the circular shift of $L_1$, i.e., $CS(L_1) = \{x \mid |x| = 2n, n > 0, x_i \neq x_{n + i}\text{ for some $i$}\}$, is a CFL. This language unioned with the language of odd-length strings (i.e., $CS(L_1) \cup (\Sigma\Sigma)^*\Sigma$) is exactly $L$, so $L$ is a CFL, too.
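The identity $L = CS(L_1) \cup (\Sigma\Sigma)^*\Sigma$ can be spot-checked by brute force on short strings. The sketch below assumes the usual definition $CS(L_1) = \{vu \mid uv \in L_1\}$, so a string is in $CS(L_1)$ exactly when one of its rotations is in $L_1$; all function names are invented for this illustration, and the check says nothing about the closure property itself.

```python
from itertools import product

def in_L1(x):
    # L1: strings of even length 2n > 0 with x_1 != x_{n+1}.
    n, odd = divmod(len(x), 2)
    return odd == 0 and n > 0 and x[0] != x[n]

def in_CS_L1(x):
    # x is in CS(L1) iff some rotation of x lies in L1.
    return any(in_L1(x[k:] + x[:k]) for k in range(max(len(x), 1)))

def in_L(x):
    # x is in L iff it is not of the form ww.
    n, odd = divmod(len(x), 2)
    return odd == 1 or x[:n] != x[n:]

def decomposition_holds(alphabet, max_len):
    # Check L = CS(L1) ∪ {odd-length strings} on all strings
    # over `alphabet` of length at most max_len.
    for length in range(max_len + 1):
        for t in product(alphabet, repeat=length):
            x = "".join(t)
            odd_length = len(x) % 2 == 1
            if (in_CS_L1(x) or odd_length) != in_L(x):
                return False
    return True
```

Running `decomposition_holds("ab", 8)` or `decomposition_holds("abc", 6)` exercises the identity on a binary and a ternary alphabet.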

zinc_11010
0

Here's a proof that $L$ is context-free by constructing a PDA that recognizes $L$. More generally, for any alphabet, I'll show that $L = \overline{\{ww \mid w \in \Sigma^*\}}$ is context-free.

Idea: The PDA guesses (using its non-determinism) the index $i$ at which the two supposed copies of $w$ differ, and checks that they indeed differ in this location. The difficulty is finding location $i$ in the second half of the string. To do this, the PDA moves to index $i$ in input $x$ while recording $i$ on the stack with that many "$+$" symbols. After reading $\alpha = x_i$, the PDA increments the index by $i$ to $2i$ by popping off the symbols on the stack. It then guesses $j = |w| - i$ and increments the index by $j$ to $2i + j = |w| + i$ while recording $j$ on the stack with that many "$-$" symbols. At this point, it checks that $\beta = x_{|w| + i}$ is different from $\alpha$. If so, then it finally checks that it correctly guessed $j = |w| - i$ by checking that there are $j$ symbols remaining in the input.

Proof: Let PDA $P$ = "On input $x$,

  1. Non-deterministically choose either to check that $x$ has odd length or to go to step 2. If $x$ was found to have odd length, then accept.
  2. Push "$+$" onto the stack for each symbol read from the input. Non-deterministically go to step 3 at any point, as long as it's after pushing at least one symbol onto the stack.
  3. Remember $\alpha$, the last symbol it read.
  4. Pop from the stack for each symbol read from the input, until the stack is empty.
  5. Push "$-$" onto the stack for each symbol read from the input. Non-deterministically go to step 6 at any point, including possibly before pushing anything onto the stack.
  6. Let $\beta$ be the last symbol it read. If $\alpha = \beta$, reject. If $\alpha \neq \beta$, then go to step 7.
  7. Pop from the stack for each symbol read from the input. Accept iff the stack becomes empty at the end of the input."

Then $P$ accepts $x$ iff $x$ has odd length or there exist $i > 0$ (the number of "$+$" symbols pushed in step 2) and $j \geq 0$ (the number of "$-$" symbols pushed in step 5) such that $\alpha = x_i$ is different from $\beta = x_{2i + j}$ and $2i + 2j = |x|$. Reparameterizing $n = i + j > 0$, we get that $P$ accepts $x$ iff $x$ has odd length or there's $n > 0$ with $|x| = 2n$ and $0 < i \leq n$ such that $x_i \neq x_{n + i}$, i.e., exactly the conditions under which $x \in L$.
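The acceptance condition of $P$ can be mimicked by a small program that replaces each non-deterministic guess with a loop over all possible values of $i$ and $j$. This is only a sketch of the accepting condition, not a stack-level simulation of the PDA, and the names `pda_accepts`, `in_L` and `agrees_with_L` are made up for this example.

```python
from itertools import product

def pda_accepts(x):
    # Odd-length branch: accept every odd-length input.
    if len(x) % 2 == 1:
        return True
    # Other branch: try every guess for i and j.
    for i in range(1, len(x) + 1):        # i = number of "+" pushed, i > 0
        if 2 * i > len(x):                # popping the "+" needs i more symbols
            break
        alpha = x[i - 1]                  # alpha = x_i
        for j in range(len(x) - 2 * i + 1):   # j = number of "-" pushed, j >= 0
            beta = x[2 * i + j - 1]       # beta = x_{2i+j}
            if alpha == beta:
                continue                  # this computation path rejects
            if len(x) - (2 * i + j) == j: # exactly j symbols must remain
                return True
    return False

def in_L(x):
    # x is in L iff it is not of the form ww.
    n, odd = divmod(len(x), 2)
    return odd == 1 or x[:n] != x[n:]

def agrees_with_L(alphabet, max_len):
    # Compare the two tests on all strings of length at most max_len.
    return all(
        pda_accepts("".join(t)) == in_L("".join(t))
        for n in range(max_len + 1)
        for t in product(alphabet, repeat=n)
    )
```

For instance, `pda_accepts("ab")` succeeds with the guess $i = 1$, $j = 0$, while `pda_accepts("abab")` fails for every guess.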

zinc_11010