How can I show that every context-free language over a unary alphabet is regular?
4 Answers
The following proof follows Pighizzini, Shallit and Wang, Unary Context-Free Grammars and Pushdown Automata, Descriptional Complexity and Auxiliary Space Lower Bounds.
Let $L$ be a unary context-free language. Assume for simplicity that $\epsilon \notin L$, and consider a grammar $G = \langle V,\{a\},P,S \rangle$ for $L$ in Chomsky normal form. Denote $h = |V|$.
In the sequel, whenever we say parse tree, we mean parse tree in $G$.
Let $\Pi$ be the collection of all triples $(U,i,j)$ such that $U$ is a parse tree rooted at some nonterminal $A$ which represents a derivation of the sentential form $a^i A a^j$, where $0 < i+j < 2^h$.
Let $(U,i,j) \in \Pi$, and let $A$ be the label of the root of $U$. Given a parse tree $S$ containing a node $v$ labeled $A$, we can "pump" $S$ by $U$ by replacing $v$ with a copy of $U$, attaching the children of $v$ to the leaf of $U$ labeled $A$.
Lemma. If $\ell > 2^{h-1}$ and $a^\ell \in L$ and $T$ is a parse tree for $a^\ell$, then there exists a triple $(U,i,j) \in \Pi$ and a parse tree $S$ for $a^{\ell-i-j}$ such that $T$ is obtained from $S$ by pumping by $U$.
Proof. Since $\ell > 2^{h-1}$, $T$ must have depth at least $h+1$ (recall that the last level corresponds to productions of the form $A \to a$), and so a longest root-to-leaf path in $T$ has at least $h+1$ edges. The last $h+1$ nonterminals on this path cannot all be distinct, since there are only $h$ nonterminals, so one of them must repeat. Consider such a repetition within the last $h+1$ nonterminals of the path. The repetition corresponds to a triple $(U,i,j) \in \Pi$ (note $i+j < 2^h$ since we chose a repetition within the last $h+1$ nonterminals, and $i+j > 0$ since the grammar is in Chomsky normal form). By "pumping out" this derivation, we obtain the parse tree $S$ for $a^{\ell-i-j}$. $\quad\square$
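The two counting facts used in the proof can be spelled out. This is a short worked version, assuming that the depth $d$ of a parse tree is measured as the number of edges on a longest root-to-leaf path, and that the path in the proof is taken to be such a longest path. In Chomsky normal form the last edge of every root-to-leaf path is a production $A \to a$, and each of the other $d-1$ levels at most doubles the number of nodes, so
$$ \ell = \#\text{leaves}(T) \le 2^{d-1}, \qquad\text{hence}\qquad \ell > 2^{h-1} \implies d \ge h+1. $$
Likewise, the subtree rooted at the upper occurrence of the repeated nonterminal has depth at most $h+1$, so it has at most $2^h$ leaves, and at least one of those leaves lies below the lower occurrence:
$$ i + j \le 2^h - 1 < 2^h. $$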
Corollary. Every parse tree for a word in $L$ can be obtained in the following way:
- Start with a parse tree for some $a^\ell$, where $\ell \leq 2^{h-1}$.
- Repeatedly pump by $U$ for some $(U,i,j) \in \Pi$.
Using this, we can construct an NFA for $L$:
- Guess a parse tree for some $a^\ell$, where $\ell \leq 2^{h-1}$, and read the word $a^\ell$.
- Set $X$ to be the set of nonterminals appearing in the parse tree.
- Perform the following operation an arbitrary number of times:
  - Guess $(U,i,j) \in \Pi$ such that the label of the root of $U$ appears in $X$.
  - Read $a^{i+j}$.
  - Add all nonterminals in $U$ to $X$.
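This shows that $L$ is regular: there are finitely many parse trees for words of length at most $2^{h-1}$, the collection $\Pi$ is finite, and $X$ ranges over subsets of $V$, so the automaton needs only finitely many states.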
Parikh's theorem can be proved in the same way.
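As a sanity check on the statement (not a part of the proof), here is a minimal Python sketch that computes which lengths a unary grammar in Chomsky normal form derives, and then looks for a threshold and period in the resulting set. The function names `unary_cnf_lengths` and `find_threshold_and_period`, the rule encoding, and the example grammar are all assumptions made for illustration. The periodicity check only inspects a finite window, so it is evidence of regularity, not a proof.

```python
def unary_cnf_lengths(unit_rules, binary_rules, start, max_len):
    """Return the set of n <= max_len such that `start` derives a^n.

    unit_rules:   set of nonterminals A with a rule A -> a
    binary_rules: list of triples (A, B, C) for rules A -> B C
    (This encoding is a hypothetical one chosen for the sketch.)
    """
    nonterminals = set(unit_rules) | {start}
    for head, left, right in binary_rules:
        nonterminals.update((head, left, right))

    # derives[A][n] is True when A derives a^n.
    derives = {nt: [False] * (max_len + 1) for nt in nonterminals}
    if max_len >= 1:
        for head in unit_rules:
            derives[head][1] = True

    # A -> B C derives a^n iff B derives a^k and C derives a^(n-k)
    # for some split 1 <= k < n; both parts are strictly shorter than n.
    for n in range(2, max_len + 1):
        for head, left, right in binary_rules:
            if any(derives[left][k] and derives[right][n - k]
                   for k in range(1, n)):
                derives[head][n] = True

    return {n for n in range(1, max_len + 1) if derives[start][n]}


def find_threshold_and_period(lengths, max_len):
    """Find (threshold, period) with the smallest period such that
    membership of n and n + period agree for all threshold <= n <= max_len - period.
    Returns None if no such pair is visible in the window."""
    member = [n in lengths for n in range(max_len + 1)]
    for period in range(1, max_len // 2 + 1):
        for threshold in range(0, max_len - 2 * period + 1):
            if all(member[n] == member[n + period]
                   for n in range(threshold, max_len - period + 1)):
                return threshold, period
    return None


# Example grammar (an assumption): S -> A A | S S, A -> a,
# which generates the words of even positive length.
lengths = unary_cnf_lengths({"A"}, [("S", "A", "A"), ("S", "S", "S")], "S", 12)
print(sorted(lengths))                         # [2, 4, 6, 8, 10, 12]
print(find_threshold_and_period(lengths, 12))  # (1, 2)
```

The dynamic program is the usual CYK recurrence specialised to a one-letter alphabet, where only the length of a substring matters.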
This follows easily from Parikh's theorem, but there is also a relatively short proof using the pumping lemma (which is easier to prove than Parikh's theorem).
This was my first attempt at it:
First let $L$ be our context-free language, and let $p$ be its pumping constant from the pumping lemma for context-free languages. Take any $m \ge p$ with $1^m \in L$.
The lemma gives a decomposition $s = 1^m = uvwxy$. Write $a_m = |uwy|$ and $b_m = |vx|$, so that $s = 1^{a_m}1^{b_m}$ with $1 \le b_m \le p$, and $1^{a_m}(1^{b_m})^k \in L$ for every $k \ge 0$.
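Because $L$ is context-free, such a decomposition exists for every $1^m \in L$ with $m \ge p$. Now let us define a set of lengths and a finite language:
$$Mod = \{m \in \mathbb{N} \mid m \ge p,\ 1^m \in L\}$$
$$ L' = \{x \in L \mid |x| < p\}$$
Now we can write our language $L$ as a union of regular languages:
$$ L = L' \cup \bigcup_{m \in Mod} 1^{a_m}1^{b_m} = L' \cup \bigcup_{m \in Mod} 1^{a_m}(1^{b_m})^* $$
The second equality holds because pumping keeps every word $1^{a_m}(1^{b_m})^k$, $k \ge 0$, inside $L$. In general the union ranges over infinitely many $m$, so one more step is needed to make it finite: every $b_m$ divides $q = p!$, hence $1^m(1^q)^* \subseteq L$ for each $m \ge p$ with $1^m \in L$, and it suffices to keep, for each residue class modulo $q$ that contains such an $m$, its smallest element $m_r$. This gives $L = L' \cup \bigcup_{r} 1^{m_r}(1^q)^*$, a finite union of regular languages, meaning $L$ is regular.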
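To make the shape of the right-hand side concrete, here is a minimal Python sketch of a membership test for a language given as a finite set of lengths together with finitely many pieces of the form $1^{a}(1^{b})^*$. The function name `in_union` and the representation of each piece as a pair `(a, b)` with $b \ge 1$ are assumptions made for illustration.

```python
def in_union(n, finite_lengths, progressions):
    """Decide whether 1^n lies in the union of
    {1^k : k in finite_lengths} and the languages 1^a (1^b)^*
    for each pair (a, b) in progressions (with b >= 1)."""
    if n in finite_lengths:
        return True
    # 1^n is in 1^a (1^b)^* exactly when n = a + k*b for some k >= 0.
    return any(n >= a and (n - a) % b == 0 for a, b in progressions)


# Hypothetical example: L' contributes length 1, plus one piece 1^3 (1^2)^*.
print(in_union(7, {1}, [(3, 2)]))  # True, since 7 = 3 + 2*2
print(in_union(4, {1}, [(3, 2)]))  # False
```

Each piece is recognised by a small cycle of $b$ states entered after $a$ steps, which is why a finite union of such pieces is regular.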