Why do LL(1) grammars generate all regular languages?

Question

I came across the following:

Every regular language has right linear grammar and this is LL(1). Thus, LL(1) grammar generates all regular languages.

I tried to get that.

Definition: Right linear grammar (RLG)
In a right linear grammar, all productions have one of the following forms: $A\rightarrow t^*V$ or $A\rightarrow t^*$, where $A$ and $V$ are non-terminals and $t^*$ stands for a (possibly empty) string of terminals.

Definition: LL(1) grammar

A grammar $G$ is an $LL(1)$ grammar if and only if, whenever $A→α|β$ are two distinct productions of $G$, the following conditions hold:

1. For no terminal $a$ do both $α$ and $β$ derive strings beginning with $a$.
2. At most one of $α$ and $β$ can derive the empty string.
3. If $β⇒^*ϵ$, then $α$ does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if $α⇒^*ϵ$, then $β$ does not derive any string beginning with a terminal in FOLLOW(A). ($β⇒^*ϵ$ means $β$ derives $\epsilon$.)

(Q1.) How does the definition of RLG ensure condition 1 in the definition of an LL(1) grammar?

This answer says:

All regular languages have LL(1) grammars. To obtain such a grammar, take any DFA for the regular language (perhaps by doing the subset construction on the NFA obtained from the regular expression), then convert it to a right-recursive regular grammar. This grammar is then LL(1), because any pair of productions for the same nonterminal either start with different symbols, or one produces ε and has $ as a lookahead token.

(Q2.) I read somewhere that "eliminating left recursion from a given grammar does not necessarily make it LL(1)". Then how does turning a grammar into a right-recursive one ensure that it is LL(1) (as stated in the answer quoted above)?

(Q3.) I didn't get the significance of "one produces ε and has $ as a lookahead token" in the answer quoted above.

(Q4.) The first quote in this question says that a right linear grammar is LL(1). How is that so?

(Q5.) This answer says "all regular languages have a LR(0) grammar". I guess it's incorrect, as LR(0) languages are DCFLs with the prefix property, which are not a superset of the regular languages. Am I right about this?

score 5 · Accepted Answer · edited Jun 16 '20 at 10:30

The quoted answer does not claim that every right regular grammar is LL(1). That statement would not be true.

What the answer claims is that the grammars produced by the indicated algorithm are LL(1). That statement is correct.

So, no-one is saying "every right-linear grammar" satisfies condition 1 (of the LL(1) definition) (your Q1). They don't all do so.

Also, no-one is saying that just removing left-recursion is sufficient to guarantee LL(1) (your Q2). It isn't.

Finally, no-one is saying that every RLG is LL(1) (your Q4), not even the unattributed quote which starts your question. That quote says that every regular language has at least one RLG which is LL(1). Regular languages have many RLGs, and often not all of them are LL(1). These other RLGs are not relevant to the claim that all regular languages are LL(1).
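
As a small illustration (an example chosen here, not taken from the quoted answer), consider the finite language $\{ab, ac\}$. Both grammars below are right-linear and generate it, but only the second one is LL(1):

$$G_1:\quad S \rightarrow aB \mid aC,\qquad B \rightarrow b,\qquad C \rightarrow c$$

$$G_2:\quad S \rightarrow aX,\qquad X \rightarrow b \mid c$$

In $G_1$ both alternatives for $S$ derive strings beginning with $a$, so condition 1 fails. In $G_2$ the two alternatives for $X$ begin with different terminals and nothing derives $\epsilon$, so all three conditions hold.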

That leaves your Q3, which is really about how to demonstrate the RLG produced by the algorithm satisfies conditions 2 and 3.

It's clear why that particular RLG passes condition 1. The algorithm starts with a DFA, and the DFA has only one out-transition on each symbol, by definition. One production is generated for each out-transition, whose first symbol is the out-transition's symbol. So it's not possible for two productions starting with the same symbol to be generated for the same state (= non-terminal).

Now, under what circumstances does the algorithm produce productions which derive $\epsilon$? Answer: these productions are generated for final states. @templatetypedef wrote the quoted answer thinking about the augmented grammar/language, in which every sentence ends with an end-marker $ which does not appear elsewhere. [Note 1]

Instead of augmenting the grammar or writing the RLG as implied by the cited answer, we could invent a new state $F$ with no out-transitions on any symbol. We then add the production $A\to F$ for every final state $A$. And we add the production $F\to\epsilon$.

Now condition 2 is met because the only unit productions in the grammar are $A\to F$, so no non-terminal derives $\epsilon$ in any way other than directly through $F$. And condition 3 is met rather trivially: in a right-linear grammar a non-terminal only ever appears at the end of a right-hand side, so no terminal is in any $FOLLOW$ set (each $FOLLOW$ set contains at most the end-of-input marker).

Since conditions 1, 2 and 3 are verified, we know that the grammar produced by converting a DFA is necessarily LL(1).
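
The construction above can be sketched in Python. This is only a minimal sketch under stated assumptions: the DFA is given as a dictionary mapping `(state, symbol)` to the target state, and the names `dfa_to_grammar`, `satisfies_condition_1` and the default sink name `"F"` are illustrative choices, not something taken from the quoted answer.

```python
from typing import Dict, List, Set, Tuple

# A right-hand side is a tuple of symbols; the empty tuple () stands for epsilon.
Grammar = Dict[str, List[Tuple[str, ...]]]


def dfa_to_grammar(
    transitions: Dict[Tuple[str, str], str],
    finals: Set[str],
    sink: str = "F",  # hypothetical name for the extra non-terminal F
) -> Grammar:
    """Turn a DFA into a right-linear grammar.

    Each transition (A, t) -> B becomes the production A -> t B.
    Each final state A gets the production A -> F, and F -> epsilon is added once.
    """
    states = {s for (s, _) in transitions} | set(transitions.values()) | set(finals)
    if sink in states:
        raise ValueError("sink name clashes with an existing state")

    grammar: Grammar = {sink: [()]}
    for (state, symbol), target in transitions.items():
        grammar.setdefault(state, []).append((symbol, target))
        grammar.setdefault(target, [])
    for state in finals:
        grammar.setdefault(state, []).append((sink,))
    return grammar


def satisfies_condition_1(grammar: Grammar) -> bool:
    """Check that no non-terminal has two productions starting with the same terminal.

    Productions of length 2 are the ones of the form A -> t B; the others
    (A -> F and F -> epsilon) do not begin with a terminal.
    """
    for productions in grammar.values():
        first_terminals = [rhs[0] for rhs in productions if len(rhs) == 2]
        if len(first_terminals) != len(set(first_terminals)):
            return False
    return True


# Example DFA (an assumption for illustration): words over {a, b} ending in b.
example_transitions = {
    ("q0", "a"): "q0",
    ("q0", "b"): "q1",
    ("q1", "a"): "q0",
    ("q1", "b"): "q1",
}
example_grammar = dfa_to_grammar(example_transitions, finals={"q1"})
```

Because a dictionary keyed on `(state, symbol)` can hold only one target per key, the input is deterministic by construction, which is exactly why `satisfies_condition_1(example_grammar)` cannot fail for grammars produced this way.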

Notes

1. This augmented language is prefix-free, since all sentences end with the endmarker. Also, as we'll see, the augmented grammar produced by the algorithm is LL(1) and therefore deterministic. A deterministic prefix-free language is LR(0), which might explain the error you refer to in your Q5. The original unaugmented language is not necessarily prefix-free, and therefore might not be LR(0), although it is still LL(1). This is an illustration of why augmented grammars are useful.

reinierpost · Answer 2 · 2021-03-28T08:59:12.267

Let's consider a restricted type of right-regular grammar: all rules must be of the form

$X \rightarrow yZ$
$X \rightarrow \epsilon$

Such a grammar directly corresponds to a nondeterministic automaton:

the nonterminals correspond to states
the start symbol corresponds to the initial state
the $\epsilon$-rules correspond to the state being accepting
the other rules correspond to transitions
the grammar generating a word corresponds to the automaton accepting that word

It is a special kind of nondeterministic automaton: one in which all transitions are on single symbols; no $\epsilon$ transitions.

It is a 1 to 1 mapping: every such automaton corresponds to such a grammar.

Such a grammar is LL(1) if and only if its corresponding automaton is deterministic. Once again, the correspondence is 1 to 1: every deterministic automaton corresponds to such a grammar.

We know deterministic automata can accept all regular languages. Hence, LL(1) right-linear grammars (of this restricted type) can generate all regular languages.
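
To make the correspondence concrete, here is a minimal Python sketch, assuming a grammar of the restricted type is stored as a dictionary from non-terminals to lists of alternatives, where `(y, Z)` stands for $X \rightarrow yZ$ and `None` stands for $X \rightarrow \epsilon$. The names `is_deterministic` and `accepts` are hypothetical, and each rule is assumed to be listed only once.

```python
from typing import Dict, List, Optional, Tuple

# (y, Z) encodes the rule X -> y Z; None encodes the rule X -> epsilon.
Rules = Dict[str, List[Optional[Tuple[str, str]]]]


def is_deterministic(rules: Rules) -> bool:
    """True if no non-terminal has two rules X -> yZ and X -> yZ' with the same y."""
    for alternatives in rules.values():
        terminals = [alt[0] for alt in alternatives if alt is not None]
        if len(terminals) != len(set(terminals)):
            return False
    return True


def accepts(rules: Rules, start: str, word: str) -> bool:
    """Predictive recognition with one symbol of lookahead.

    Only meaningful when is_deterministic(rules) holds: the next input
    symbol then selects at most one rule, just as a DFA selects a transition.
    """
    if not is_deterministic(rules):
        raise ValueError("lookahead does not select a unique rule")
    current = start
    for symbol in word:
        step = next(
            (alt for alt in rules.get(current, [])
             if alt is not None and alt[0] == symbol),
            None,
        )
        if step is None:
            return False  # no rule for this non-terminal starts with the lookahead
        current = step[1]
    # At the end of the input, only an epsilon rule can finish the derivation.
    return None in rules.get(current, [])


# Words consisting of an even number of a's: S -> aT | epsilon, T -> aS.
even_as: Rules = {"S": [("a", "T"), None], "T": [("a", "S")]}

# A nondeterministic grammar for a+: S -> aS | aT, T -> epsilon.
a_plus_nondet: Rules = {"S": [("a", "S"), ("a", "T")], "T": [None]}
```

With these definitions, `accepts(even_as, "S", "aa")` follows the single derivation $S \Rightarrow aT \Rightarrow aaS \Rightarrow aa$, while `is_deterministic(a_plus_nondet)` is false because two rules for $S$ start with $a$.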
