Given a CFG in Chomsky normal form, is there an algorithm that solves the emptiness problem in linear time? I thought about using depth-first search here, but I think its running time is slightly worse than linear.

Julian

1 Answer

Yup, it can be done. For each nonterminal $A$, introduce a boolean variable $x_A$, with the intent that if $x_A$ is true, that means $L(A)$ is non-empty. Then you can convert each production into a corresponding Horn clause:

  • $A \to BC$ becomes $(x_B \land x_C) \implies x_A$
  • $A \to a$ becomes $x_A$
  • $S \to \varepsilon$ becomes $x_S$

Let $\varphi$ denote the conjunction of these Horn clauses. Find the minimal satisfying assignment for $\varphi$, i.e., the one that sets as few variables to true as possible; for Horn formulas this can be done in linear time by unit propagation. If this assignment makes $x_S$ true, then the language is non-empty; otherwise it is empty.
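To make the reduction concrete, here is a minimal Python sketch. The input format is an assumption made for illustration: binary rules are triples `(A, B, C)` for $A \to BC$ and terminal rules are pairs `(A, a)` for $A \to a$. The names `horn_clauses` and `minimal_model` are hypothetical, and the propagation shown is one common counter-based way to compute the minimal model, not the only one.

```python
from collections import defaultdict, deque

def horn_clauses(binary_rules, terminal_rules, start, start_has_epsilon=False):
    """Translate CNF productions into Horn clauses of the form (body, head)."""
    clauses = [((b, c), a) for a, b, c in binary_rules]   # (x_B and x_C) => x_A
    clauses += [((), a) for a, _ in terminal_rules]       # fact x_A
    if start_has_epsilon:
        clauses.append(((), start))                       # fact x_S
    return clauses

def minimal_model(clauses):
    """Return the set of variables that are true in the minimal model."""
    remaining = []                  # number of premises not yet known true
    watchers = defaultdict(list)    # variable -> clauses with it in the body
    for i, (body, _head) in enumerate(clauses):
        premises = set(body)        # A -> BB has only one distinct premise
        remaining.append(len(premises))
        for var in premises:
            watchers[var].append(i)

    true_vars = set()
    queue = deque(head for body, head in clauses if not body)
    while queue:
        var = queue.popleft()
        if var in true_vars:
            continue
        true_vars.add(var)
        for i in watchers[var]:
            remaining[i] -= 1
            if remaining[i] == 0:   # every premise holds, so the head follows
                queue.append(clauses[i][1])
    return true_vars
```

With this sketch, the language is non-empty exactly when the start symbol is in `minimal_model(horn_clauses(...))`. Each clause's counter is decremented at most once per distinct premise, which is where the linear bound comes from.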


Alternatively, if you prefer a more direct algorithm, here is a standard one that you might see in textbooks.

Start out with all nonterminals unmarked. If you see a rule $A \to a$, mark $A$. If you see a rule $S \to \varepsilon$, mark $S$. Whenever you mark a nonterminal, check all rules of the form $A \to BC$ where it appears on the right-hand side; if both $B$ and $C$ are marked, mark $A$. Repeat until no new nonterminal can be marked. At that point, the marked nonterminals are exactly those that generate a non-empty language, so the language is non-empty iff $S$ is marked.

This also runs in linear time. It takes a little more work to see why, but it's true. In particular, each nonterminal is marked at most once, and each rule of the form $A \to BC$ is checked at most twice (once when $B$ is marked, once when $C$ is marked), so the amount of work you do is $O(1)$ per nonterminal plus $O(1)$ per rule, which is linear in the size of the grammar. It does require a data structure that maps each nonterminal to the list of all rules containing it on the right-hand side, but that can be built in advance in linear time as well.
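The marking procedure could be sketched in Python roughly as follows. The rule encoding is an assumption for illustration (triples `(A, B, C)` for $A \to BC$, pairs `(A, a)` for $A \to a$), and `cfg_is_empty` is a hypothetical name; the `occurs` dictionary plays the role of the map from each nonterminal to the rules containing it on the right-hand side.

```python
from collections import defaultdict

def cfg_is_empty(binary_rules, terminal_rules, start, start_has_epsilon=False):
    """Return True iff the CNF grammar generates no string at all."""
    # Map each nonterminal to the binary rules that mention it on the right.
    occurs = defaultdict(list)
    for rule in binary_rules:
        _a, b, c = rule
        occurs[b].append(rule)
        if c != b:                  # avoid listing A -> BB twice under B
            occurs[c].append(rule)

    marked = set()
    # Nonterminals with a rule A -> a (or S -> epsilon) can be marked at once.
    worklist = [a for a, _ in terminal_rules]
    if start_has_epsilon:
        worklist.append(start)

    while worklist:
        x = worklist.pop()
        if x in marked:
            continue
        marked.add(x)
        # Only rules containing x on the right-hand side can newly fire.
        for a, b, c in occurs[x]:
            if b in marked and c in marked and a not in marked:
                worklist.append(a)

    return start not in marked
```

For example, with rules $S \to AB$, $A \to a$, $B \to BB$, the call `cfg_is_empty([("S", "A", "B"), ("B", "B", "B")], [("A", "a")], "S")` reports an empty language, since $B$ never gets marked. The worklist may hold duplicates, but each entry is pushed either for an initial rule or for one of the at most two checks of a binary rule, so the total work stays linear in the grammar size.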

D.W.