
In class our professor showed us 3 methods for proving non-regularity:

  1. Myhill–Nerode theorem
  2. Pumping Lemma for regular languages
  3. Proof of non-regularity, based on the Kolmogorov complexity

I understood the first two, the Myhill–Nerode theorem and the pumping lemma, well, and I was able to do the exercises for them. But I did not understand the third one. The third method is stated as follows:

Let $L \subseteq (\Sigma_{bool})^*$ be a regular language. Let $L_x=\{ y \in (\Sigma_{bool})^* \mid xy\in L \}$ for every $x \in (\Sigma_{bool})^*$. Then there exists a constant $c$ such that for all $x,y \in (\Sigma_{bool})^*$

$K(y) \leq \lceil \log_2(n+1)\rceil+c$

if $y$ is the $n$-th word in the language $L_x$.

Now I do not understand how to use this theorem to prove that a language is not regular; I don't really get the concept. Before, we used Kolmogorov complexity to measure the length of the shortest computer program that generates an object. How does one prove non-regularity with this theorem, and what is the idea behind it?

Thanks a lot!

gammaALpha

2 Answers


To my knowledge, this is not one of the "classical" approaches used to characterize regular languages.

This approach is discussed in "A New Approach to Formal Language Theory by Kolmogorov Complexity", by Ming Li and Paul M.B. Vitanyi (see section 3.1).

They give several examples where one can use the statement you mentioned instead of using the pumping lemma. For example, proving non-regularity of $L$ where

$L=\left\{1^p \mid p \text{ is prime}\right\}$.

Given some $x\in\Sigma^*$, $L_x=\left\{y \mid xy=1^p \land p \text{ is prime}\right\}$. Let us choose $x=1^{p'}$ where $p'$ is the $k$-th prime, and let $y_1$ be the first word in $L_x$. Then $y_1=1^{p-p'}$ where $p$ is the $(k+1)$-th prime. According to the statement you mentioned, $K(y_1)\le c$ for some constant $c$ depending only on $L$ (take $n=1$ and absorb the additive $\lceil \log_2 2\rceil = 1$ into the constant; see the paper).

Since this holds for all $x$, we can bound the Kolmogorov complexity of all elements in $S=\left\{y_1^x \mid x=1^p \text{ for prime } p \land y_1^x \text{ is the first string in } L_x\right\}$ by the same constant $c$. However, we saw that $S$ actually consists of the differences between consecutive primes, written in unary, i.e. $S=\left\{1^{p_{k+1}-p_k} \mid k\ge 1\right\}$ where $p_k$ is the $k$-th prime. Only finitely many strings have Kolmogorov complexity at most $c$ (there are fewer than $2^{c+1}$ programs of length at most $c$), whereas $S$ is infinite because the gaps between consecutive primes get arbitrarily large. So $S$ cannot have bounded Kolmogorov complexity, which means $L$ cannot be regular.
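As a small illustration of the set $S$, here is a Python sketch that lists the first word of $L_x$ for $x = 1^{p_k}$, i.e. the unary prime gaps. The function names `primes_up_to` and `first_words` are made up for this example and do not come from the paper; the sieve is just one convenient way to get primes.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p with 2 <= p <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Cross out every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def first_words(limit):
    """For each prime p (with a next prime q <= limit), return (p, y_1),
    where y_1 = 1^(q - p) is the first word of L_x for x = 1^p."""
    ps = primes_up_to(limit)
    return [(p, "1" * (q - p)) for p, q in zip(ps, ps[1:])]


if __name__ == "__main__":
    for p, y1 in first_words(100):
        print(f"x = 1^{p:<3} first word of L_x has length {len(y1)}")
```

Running it shows gap lengths 1, 2, 2, 4, 2, 4, ... with new maxima appearing as the limit grows. That gaps are unbounded follows from the standard observation that $(m+1)!+2, \dots, (m+1)!+(m+1)$ are $m$ consecutive composite numbers.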

Ariel

Another very easy example is the following: use Kolmogorov complexity to prove that $L_{ww} = \{ww \mid w \in \{0,1\}^* \}$ is not regular.

Here is a very informal proof that I hope helps you better understand the role of Kolmogorov complexity.

The key idea is the following: a finite automaton $D$ (recognizing a regular language $L_D$) has a finite amount of "memory". So when it runs on an input string $x = yz$ and has "processed" the first part $y$, the membership of $x$ in $L_D$ depends only on its current state and the second part $z$.
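To make the "finite memory" point concrete, here is a minimal Python sketch. The DFA used (binary strings with an even number of 1s) and the names `DELTA`, `run`, `accepts` and `same_future` are assumptions chosen for illustration; they are not part of the argument about $L_{ww}$.

```python
from itertools import product

# Hypothetical example DFA: accepts binary strings with an even number of 1s.
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
START, ACCEPTING = "even", {"even"}


def run(state, word):
    """Return the state reached from `state` after reading `word`."""
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state


def accepts(word):
    return run(START, word) in ACCEPTING


def same_future(y1, y2, max_len=6):
    """Check that y1+z and y2+z get the same verdict for every z up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            z = "".join(bits)
            if accepts(y1 + z) != accepts(y2 + z):
                return False
    return True


if __name__ == "__main__":
    # "1011" and "1" both end in state "odd", so every continuation z
    # is treated identically; "1" and "11" end in different states.
    print(run(START, "1011"), run(START, "1"), same_future("1011", "1"))
    print(run(START, "1"), run(START, "11"), same_future("1", "11"))
```

Two prefixes that reach the same state are indistinguishable to the automaton from then on, which is exactly the property the proof exploits.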

Now suppose that $L_{ww}$ is regular; then there is a DFA $D_{ww}$ that recognizes it.

Let $y$ be an incompressible string of length $|y| = n \gg |D_{ww}|$.

For all inputs $x=yz$, after reading the first part $y$ the DFA $D_{ww}$ will be in the same state $q_i$, and by hypothesis it will accept only if the remaining part $z$ is such that $x = yz$ can be split into two equal halves (i.e. $yz = ww$ for some $w$); for example

 Let y = 10110
       y   z
 x = 10110 0  >> rejected
 x = 10110 1  >> accepted  (w=101, |y|>|z|)
 x = 10110 00 >> rejected
 x = 10110 01 >> rejected
 ....
 x = 10110 10110 >> accepted  (w=10110,  |y|=|z| !!!)
 ....
 x = 10110 1101101 >> accepted (w=101101, |z|>|y|)

But it is important to notice that there is only one string $z$ of length $|y|$ that is accepted ($z = y$).
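The table and the uniqueness claim can be checked by brute force. The sketch below tests membership in $L_{ww}$ directly (it is not a DFA, since none exists); the names `in_Lww` and `accepted_suffixes` are made up for this example.

```python
from itertools import product


def in_Lww(x):
    """True iff x = ww for some binary string w."""
    half, rem = divmod(len(x), 2)
    return rem == 0 and x[:half] == x[half:]


def accepted_suffixes(y, length):
    """All z of the given length such that y+z is in L_ww."""
    candidates = ("".join(bits) for bits in product("01", repeat=length))
    return [z for z in candidates if in_Lww(y + z)]


if __name__ == "__main__":
    y = "10110"
    for z in ["0", "1", "00", "01", "10110", "1101101"]:
        print(y, z, "accepted" if in_Lww(y + z) else "rejected")
    # Among the 2^5 suffixes of length |y|, only z = y survives.
    print(accepted_suffixes(y, len(y)))
```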

So given the description of $D_{ww}$, the state $q_i$ reached after reading $y$, and the length $|y|$, we can simulate the behaviour of $D_{ww}$ starting from $q_i$ on all $2^{|y|}$ strings of length $|y|$ and see which of them it accepts; it accepts exactly $z=y$.
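The same "simulate from a state and enumerate" idea is what lies behind the general theorem: a word is pinned down by a state of the DFA plus its index $n$ among the words accepted from that state. Here is a rough Python sketch on a DFA that really exists (even number of 1s). The DFA, the names `accepted_from` and `nth_word`, and the length-then-lexicographic ordering are assumptions made for this example.

```python
from itertools import count, product

# Hypothetical example DFA: accepts binary strings with an even number of 1s.
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
ACCEPTING = {"even"}


def accepted_from(state, word):
    """Simulate the DFA on `word`, starting from `state` instead of the start state."""
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


def nth_word(state, n):
    """Return the n-th word (1-based, shorter words first, then lexicographic)
    accepted from `state`. Loops forever if fewer than n such words exist."""
    seen = 0
    for length in count(0):
        for bits in product("01", repeat=length):
            z = "".join(bits)
            if accepted_from(state, z):
                seen += 1
                if seen == n:
                    return z


if __name__ == "__main__":
    # The pair (state, n) is a complete description of the word that is returned.
    print(nth_word("odd", 1), nth_word("odd", 3), nth_word("even", 4))
```

The decoder needs only the fixed DFA (constant size), a state (constant size) and $n$ (about $\log_2 n$ bits), which is where a bound of the form $\lceil \log_2(n+1) \rceil + c$ comes from.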

So with a program of size $\ell = |D_{ww}| + \log{i} + \log |y| + c$

($|D_{ww}|$ space is needed to store the description of $D_{ww}$, $\log i$ space to store $q_i$, $\log |y|$ space to store the length of $y$, and $c$ space for the instructions that simulate the DFA)

we can "reconstruct" the string $y$; but for large enough $|y|$ we have $\ell < |y|$, which is a contradiction because $y$ is incompressible.
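To see how quickly the description becomes shorter than $y$ itself, here is a small arithmetic sketch. All the numbers (a 1000-bit DFA description, 64 states, 200 bits of simulator code) are invented for illustration, and the function names are hypothetical.

```python
def description_bits(dfa_bits, num_states, n, overhead_bits):
    """Size in bits of: DFA description + state index + length n + simulator code."""
    state_bits = max(1, (num_states - 1).bit_length())  # enough bits to name q_i
    length_bits = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 0
    return dfa_bits + state_bits + length_bits + overhead_bits


def first_compressible_length(dfa_bits, num_states, overhead_bits):
    """Smallest n >= 1 for which the description is strictly shorter than n bits."""
    n = 1
    while description_bits(dfa_bits, num_states, n, overhead_bits) >= n:
        n += 1
    return n


if __name__ == "__main__":
    # Assumed sizes: 1000-bit DFA, 64 states, 200 bits of simulator code.
    n0 = first_compressible_length(1000, 64, 200)
    print(n0, description_bits(1000, 64, n0, 200))
```

With these made-up sizes the threshold is $n = 1218$, where the description takes 1217 bits. Since the description grows by at most one bit when $n$ grows by one, every longer $y$ would be compressible as well, contradicting the choice of an incompressible $y$.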

Vor