3

I tried to find a simple example for a language that is not parseable with an LL(1) parser. I finally found this language.

$$L=\{\,a^nb^m\mid n,m\in\mathbb N,\>n\ge m\,\}$$

Is my hypothesis true or is this language parseable with an LL(1) parser?

One can use this simple grammar to describe $L$ (of course, it isn't LL(1) parseable):

S -> ε
S -> A
A -> aA
A -> aAb
A -> a
A -> ab
fuz

3 Answers

5

Kurki-Suonio has shown some helpful properties [1]:

Theorem 1
Each LL(k) grammar is unambiguous.

That means any inherently ambiguous language is not LL(1).

Theorem 9
For any $k>1$ the grammar $G$:

$\qquad \begin{align} S &\to aSA \mid \varepsilon \\ A &\to a^{k - 1} b S \mid c \end{align}$

generates an LL(k) language which is not an LL(k-1) language.

There you have another concrete example by setting $k=2$.
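To make the $k=2$ case concrete, here is a minimal sketch of a predictive recognizer for $S \to aSA \mid \varepsilon$, $A \to abS \mid c$. The function name `accepts_theorem9_k2` and the `$` end marker are illustrative choices, not part of the cited paper. The point it shows: on a single lookahead token `a`, the parser cannot tell $S \to aSA$ from $S \to \varepsilon$ (followed by $A \to abS$), but the pairs `aa` and `ac` can only start $S \to aSA$.

```python
def accepts_theorem9_k2(word: str) -> bool:
    """Predictive recognizer for S -> a S A | ε, A -> a b S | c (k = 2)."""
    tokens = list(word) + ["$", "$"]  # pad so two lookahead tokens always exist
    pos = 0

    def expect(sym: str) -> None:
        nonlocal pos
        if tokens[pos] != sym:
            raise SyntaxError(f"expected {sym!r} at position {pos}")
        pos += 1

    def parse_S() -> None:
        # Two tokens of lookahead: "aa" and "ac" can only begin S -> a S A.
        lookahead = tokens[pos] + tokens[pos + 1]
        if lookahead in ("aa", "ac"):
            expect("a")
            parse_S()
            parse_A()
        # Any other lookahead (e.g. "ab", "c.", "$$") selects S -> ε.

    def parse_A() -> None:
        if tokens[pos] == "a":  # A -> a b S
            expect("a")
            expect("b")
            parse_S()
        else:                   # A -> c
            expect("c")

    try:
        parse_S()
        expect("$")
        return True
    except SyntaxError:
        return False


print(accepts_theorem9_k2("ac"))   # S -> aSA with the first alternative taken on "ac"
print(accepts_theorem9_k2("aab"))  # the inner S sees "ab" and must vanish
```

In both sample words the parser sees `a` as its next token at a point where it has to expand $S$, yet it must choose different productions; only the second token separates the two cases.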


As for your language $L$, assume $L$ is LL(1) and consider the language

$\qquad \displaystyle L' = \{ a^nb^m \mid n < m \land m > 0 \}$.

We make the following observations:

  • $L'$ is LL(1) by the grammar

    $\qquad \begin{align} S &\to AbB \\ A &\to aAb \mid \varepsilon \\ B &\to bB \mid \varepsilon \end{align}$

  • Neither $L$ nor $L'$ is regular (by Pumping Lemma).
  • $L \cap L' = \emptyset$.
  • $L \cup L' = \{a^nb^m \mid n,m \in \mathbb{N}\} \in \mathsf{REG}$.
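The claim that $L'$ is LL(1) can be illustrated with a recursive-descent recognizer that never looks at more than one token. This is only a sketch under the grammar given above; the function name `accepts_l_prime` and the `$` end marker are assumptions made for the example.

```python
def accepts_l_prime(word: str) -> bool:
    """One-token-lookahead recognizer for S -> A b B, A -> a A b | ε, B -> b B | ε."""
    tokens = list(word) + ["$"]  # "$" marks the end of the input
    pos = 0

    def expect(sym: str) -> None:
        nonlocal pos
        if tokens[pos] != sym:
            raise SyntaxError(f"expected {sym!r} at position {pos}")
        pos += 1

    def parse_A() -> None:
        if tokens[pos] == "a":  # FIRST(a A b) = {a}
            expect("a")
            parse_A()
            expect("b")
        # otherwise A -> ε; a wrong token is rejected by the next expect()

    def parse_B() -> None:
        if tokens[pos] == "b":  # FIRST(b B) = {b}
            expect("b")
            parse_B()
        # otherwise B -> ε; FOLLOW(B) = {$}

    def parse_S() -> None:
        parse_A()
        expect("b")
        parse_B()

    try:
        parse_S()
        expect("$")
        return True
    except SyntaxError:
        return False


for n in range(4):
    for m in range(4):
        # The recognizer should agree with the definition n < m.
        assert accepts_l_prime("a" * n + "b" * m) == (n < m)
```

Every decision is made from the single next token: `a` selects $A \to aAb$, `b` selects $B \to bB$, and anything else selects the $\varepsilon$ alternative.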

In unison, these facts contradict this theorem [2]:

Theorem 9
If the finite union of disjoint LL(k) languages is regular, then all the languages are regular.

Thus, $L$ cannot be LL(1) (and in fact not LL(k) for any $k$).


  1. Notes on top-down languages by R. Kurki-Suonio (1969)
  2. Properties of deterministic top-down grammars by D.J. Rosenkrantz and R.E. Stearns (1970)
Anton Trunov
Raphael
3

Your language corresponds to the famous dangling else problem, and it is well known that no $LL(k)$ grammar is able to parse it. The reason is that an $LL(k)$ parser has to decide whether an $a$ is paired with a $b$ at the moment the $a$ is seen, and the next $k$ symbols may all be $a$'s.

Note that to make it $LR(1)$ you can't use a grammar like

$\qquad\begin{align} S &\to aSb \\ &\to aS \\ &\to \varepsilon \end{align}$

which is ambiguous. You have to use

$\qquad\begin{align} S &\to aS \\ &\to R \\ R &\to aRb \\ &\to \varepsilon \end{align}$
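The difference between the two grammars can be checked by counting parse trees. The sketch below does this by direct recursion on the input string; the function names are made up for the illustration, and the approach relies on every production here either consuming an outer `a`/`b` or producing $\varepsilon$.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def trees_ambiguous(w: str) -> int:
    """Number of parse trees of w under S -> a S b | a S | ε."""
    if w == "":
        return 1  # S -> ε
    total = 0
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b":
        total += trees_ambiguous(w[1:-1])  # S -> a S b
    if w[0] == "a":
        total += trees_ambiguous(w[1:])    # S -> a S
    return total


@lru_cache(maxsize=None)
def trees_r(w: str) -> int:
    """Number of parse trees of w under R -> a R b | ε."""
    if w == "":
        return 1
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b":
        return trees_r(w[1:-1])
    return 0


@lru_cache(maxsize=None)
def trees_unambiguous(w: str) -> int:
    """Number of parse trees of w under S -> a S | R."""
    total = trees_r(w)                     # S -> R
    if w and w[0] == "a":
        total += trees_unambiguous(w[1:])  # S -> a S
    return total


for n in range(6):
    for m in range(n + 1):
        w = "a" * n + "b" * m
        # Any of the n a's may be the paired ones in the first grammar.
        assert trees_ambiguous(w) == comb(n, m)
        assert trees_unambiguous(w) == 1
```

For example, `aab` has two parse trees under the first grammar (either `a` can be the one paired with the `b`) but exactly one under the second, which forces the unpaired `a`'s to come first.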

Some parser generators, such as yacc, are able to cope with that ambiguity, but the input has to be correctly ordered for them to work. Similarly, if you left-factor the first grammar, you get Raphael's grammar

$\qquad \begin{align} S &\to aST \mid \varepsilon \\ T &\to b \mid \varepsilon \end{align}$

which is ambiguous, but the ambiguity does not prevent generating tables for an LL(1) parser.
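To see where the ambiguity shows up in an LL(1) table, the following sketch computes FIRST and FOLLOW sets for $S \to aST \mid \varepsilon$, $T \to b \mid \varepsilon$ and collects every table cell that receives more than one production. The representation (a dict of lists, `ε` and `$` as markers) is an assumption of this example, not the format of any particular parser generator.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "S": [["a", "S", "T"], []],  # [] stands for the ε alternative
    "T": [["b"], []],
}


def first_of_seq(seq, first):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of_seq(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="S"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_seq(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def ll1_table():
    first = compute_first()
    follow = compute_follow(first)
    table = {}
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            f = first_of_seq(alt, first)
            lookaheads = (f - {EPS}) | (follow[nt] if EPS in f else set())
            for t in lookaheads:
                table.setdefault((nt, t), []).append(alt)
    return table


conflicts = {cell: alts for cell, alts in ll1_table().items() if len(alts) > 1}
print(conflicts)  # the only multiply-defined cell is (T, b)
```

The single conflict is the cell for $T$ on lookahead $b$, where both $T \to b$ and $T \to \varepsilon$ apply. One conventional way to resolve it, as with the dangling else, is to always prefer $T \to b$; the resulting parser still accepts exactly the strings $a^nb^m$ with $n \ge m$.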

AProgrammer
2

Your proposed solution does not correspond to the defined language. Take the example "ab", which is a member of $L$ but is not generated by your grammar.

An LL(1) grammar must not contain left recursion and must be unambiguous. These conditions are necessary but not sufficient, and the following grammar is not LL(1), because both alternatives for $A$ can begin with $a$:

S → A
A → aAb
A → B
B → aB
B → ԑ

I have one redundant production rule. Let's remove it:

S → aSb
S → A
A → aA
A → ԑ

Update: Based on your hint, I am changing my mind. I think your hypothesis is true. I don't know how to prove it (probably references 1 or 2 might help), but my rationale goes like this: when you parse a token "a" at position $p$, you cannot determine which part of the string it belongs to (either $i_1: 0 \leq p < m$ or $i_2: m \leq p < n$). A context-free grammar for $L$ needs this distinction, because $i_1$ has to be represented by a recursion with one literal on both sides and $i_2$ by a right recursion. This requires two different production rules. But because $i_1$ and $i_2$ consist of the same character, the choice between the two rules is ambiguous for the parser, which rules out an LL(1) grammar. Ergo, the language is context-free, but it is not LL(1).

meisterluk