I'm wondering if there exists an algorithm to solve the following problem:

Given a grammar $S$ of a context-free language $\mathcal{L}$, find a grammar $S'$ such that $L(S) = L(S')^c$.

I note that the complement of a context-free language is also a context-free language, so the question is well posed.

Jay Jay

2 Answers

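Contrary to what you wrote, context-free languages are not closed under complement. See "Examples of context-free languages with a non-context-free complements" for some examples. For instance, the complement of $\{a^n b^n c^n : n \ge 0\}$ over the alphabet $\{a, b, c\}$ is context-free, but its own complement, $\{a^n b^n c^n : n \ge 0\}$, is not. As a result, no such algorithm can exist in general: for some inputs $S$, the required context-free grammar $S'$ does not exist.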

Also: it's decidable whether $L(S)=\emptyset$, but it's not decidable whether $L(S)=\Sigma^*$. If you could compute a grammar $S'$ for the complement, then you could decide the latter by running the emptiness algorithm on $S'$, since $L(S)=\Sigma^*$ exactly when $L(S')=\emptyset$.
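To make the decidable half of that argument concrete, here is a minimal Python sketch of the standard emptiness test: repeatedly mark nonterminals that can derive some terminal string, and check whether the start symbol ends up marked. The grammar representation (a dict from nonterminals to lists of right-hand sides) and the name `is_empty` are assumptions made for illustration, not anything from the question.

```python
def is_empty(rules, start):
    """Return True if the context-free grammar generates no string.

    Assumed representation: `rules` maps each nonterminal to a list of
    right-hand sides, each a tuple of symbols. Any symbol that is not a
    key of `rules` is treated as a terminal.
    """
    productive = set()  # nonterminals known to derive a terminal string
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head in productive:
                continue
            for body in bodies:
                # A body is productive if every symbol in it is either a
                # terminal or an already-productive nonterminal.
                if all(sym in productive or sym not in rules for sym in body):
                    productive.add(head)
                    changed = True
                    break
    return start not in productive


# S -> a S b | (empty string): generates a^n b^n, so it is not empty.
balanced = {"S": [("a", "S", "b"), ()]}
# S -> a S: the recursion never terminates, so nothing is generated.
looping = {"S": [("a", "S")]}

print(is_empty(balanced, "S"))
print(is_empty(looping, "S"))
```

No comparably simple test can exist for $L(S)=\Sigma^*$, which is why a computable complement operation on context-free grammars is impossible.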

In contrast, deterministic context-free languages are closed under complement, and the standard proof of that fact is constructive, so it gives an algorithm that works when you have a deterministic context-free language. Another way to handle it: if you can convert to a deterministic pushdown automaton, complement that instead. This essentially means swapping accepting and non-accepting states, after first making sure the automaton always reads its entire input.

D.W.

Since you didn't explicitly specify that $S'$ should be a context-free grammar, I'll take the opportunity to mention Boolean grammars, which are a fairly modest extension of CFGs that allow conjunction and negation in rules, in addition to the implicit disjunction of CFGs. The productions have the form

$$A \to \alpha_1 \And \ldots \And \alpha_m \And \lnot\beta_1 \And \ldots \And \lnot\beta_n$$

where $A$ is a nonterminal, $m+n \ge 1$, and $\alpha_1, \ldots, \alpha_m$, $\beta_1, \ldots, \beta_n$ are strings formed of symbols in $\Sigma$ and $N$ (the terminal and nonterminal alphabets, respectively). Informally, such a rule asserts that every string $w$ over $\Sigma$ that satisfies each of the syntactical conditions represented by $\alpha_1, \ldots, \alpha_m$ and none of the syntactical conditions represented by $\beta_1, \ldots, \beta_n$ therefore satisfies the condition defined by $A$.

Then if $G$ is a Boolean grammar (in particular, it could be a CFG) with start symbol $S$, add a new start symbol $S'$ and add the rule $S' \rightarrow \lnot S$ to obtain a grammar for $L(G)^c$.
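As a rough illustration of how little work this construction involves, here is a Python sketch. The encoding is an assumption made for this example: each rule is a pair `(positives, negatives)` of conjunct tuples, where every conjunct is itself a tuple of symbols. The function name `complement_grammar` and the default name `"S'"` are hypothetical too.

```python
def complement_grammar(rules, start, new_start="S'"):
    """Return (new_rules, new_start) for the complement of a Boolean grammar.

    Assumed representation: `rules` maps each nonterminal to a list of
    rules, each rule being (positives, negatives), where both parts are
    tuples of conjuncts and each conjunct is a tuple of symbols.
    An ordinary CFG rule A -> x has the form (((x...),), ()).
    """
    if new_start in rules:
        raise ValueError(f"{new_start!r} is already a nonterminal")
    # Copy the rule lists so the input grammar is left untouched.
    new_rules = {head: list(bodies) for head, bodies in rules.items()}
    # The single new rule S' -> not S: no positive conjuncts (m = 0),
    # one negative conjunct consisting of the old start symbol (n = 1).
    new_rules[new_start] = [((), ((start,),))]
    return new_rules, new_start


# A CFG for a^n b^n written in this encoding: S -> a S b | (empty string).
cfg = {"S": [((("a", "S", "b"),), ()), (((),), ())]}
boolean_grammar, new_start = complement_grammar(cfg, "S")
```

The result is a Boolean grammar, not a context-free one. The sketch only builds the grammar and does not parse with it.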

Boolean grammars are a "modest" extension in that the deterministic time complexity of parsing is the same as for CFGs and they can be defined by language equations, but on the other hand they recognize a much larger class of languages. For example, even conjunctive grammars do not satisfy Parikh's theorem (i.e., conjunctive grammars over a unary alphabet can generate non-regular languages). It is an open problem whether the languages generated by conjunctive grammars are closed under complementation.

Max