
It's possible to parse a document in a single pass with a state machine. What is the benefit of having two passes, i.e. a lexer to convert text to tokens, and a parser to test production rules on those tokens? Why not have a single pass that applies production rules directly to the text?
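To make the two-pass design concrete, here is a minimal sketch (all names and the toy grammar `expr -> NUM (PLUS NUM)*` are illustrative, not from any particular tool): pass one, the lexer, uses a regular expression to turn raw text into a flat token stream; pass two, the parser, applies production rules to tokens and never looks at raw characters.

```python
import re

# Lexical level: token kinds defined by a regular expression.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<PLUS>\+))")

def lex(text):
    """Pass 1: text -> list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
    return tokens

def parse(tokens):
    """Pass 2: apply expr -> NUM (PLUS NUM)* to the token stream.
    Evaluates the sum as a stand-in for building a parse tree."""
    it = iter(tokens)
    kind, value = next(it)
    assert kind == "NUM"
    total = int(value)
    for kind, _ in it:
        assert kind == "PLUS"
        kind, value = next(it)
        assert kind == "NUM"
        total += int(value)
    return total

print(parse(lex("1 + 2 + 40")))  # 43
```

Note that `parse` only ever inspects token kinds; all character-level detail (digits, whitespace) was resolved in `lex`.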

Raphael
Brent

1 Answer


You don't have to separate them. People combine them into scannerless parsers.

The key disadvantage of scannerless parsers appears to be that the resulting grammars are rather complicated -- more so than the corresponding combination of a regular expression doing lexing and a context-free grammar doing parsing on the token stream. In particular, grammars for scannerless parsing tend towards ambiguity. It is easier to remove ambiguity from grammars that work on a token stream.

A pragmatic benefit of a dedicated up-front lexing phase is that you don't couple the subsequent parser to lexical detail. This is useful during early programming-language development, when the lexical and syntactic details are still changing frequently.
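A toy sketch of that decoupling (the function names and the choice of comment markers are hypothetical, purely for illustration): the parser below only ever inspects token kinds, so a lexical change such as switching line comments from `#` to `//` touches the lexer alone and leaves the parser untouched.

```python
def lex(text, comment_marker="#"):
    """Lexer: strips comments and splits words into (kind, value) tokens.
    All knowledge of the comment syntax lives here."""
    tokens = []
    for line in text.splitlines():
        line = line.split(comment_marker, 1)[0]
        tokens.extend(("WORD", w) for w in line.split())
    return tokens

def parse(tokens):
    """Parser: a trivial 'grammar' where a program is a sequence of WORDs.
    Nothing here knows or cares how comments were written."""
    return [value for kind, value in tokens if kind == "WORD"]

old_src = "alpha beta  # a comment"
new_src = "alpha beta  // a comment"
print(parse(lex(old_src)))                       # ['alpha', 'beta']
print(parse(lex(new_src, comment_marker="//")))  # ['alpha', 'beta']
```

Changing the comment syntax meant passing a different marker to `lex`; `parse` did not change at all, which is exactly the kind of insulation that helps while a language's lexical details are still in flux.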

Martin Berger