
It's possible to parse a document in a single pass with a state machine. What is the benefit of having two passes, i.e. a lexer to convert text to tokens, and a parser to test production rules on those tokens? Why not have a single pass that applies production rules directly to the text?
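To make the two-pass design concrete, here is a minimal sketch (all names and the toy grammar `expr -> NUM (PLUS NUM)*` are illustrative, not from any particular tool): pass one, the lexer, uses a regular expression to turn raw text into a flat token stream; pass two, the parser, applies production rules to tokens and never looks at raw characters.

```python
import re

# Lexical level: token kinds defined by a regular expression.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<PLUS>\+))")

def lex(text):
    """Pass 1: text -> list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
    return tokens

def parse(tokens):
    """Pass 2: apply expr -> NUM (PLUS NUM)* to the token stream.
    Evaluates the sum as a stand-in for building a parse tree."""
    it = iter(tokens)
    kind, value = next(it)
    assert kind == "NUM"
    total = int(value)
    for kind, _ in it:
        assert kind == "PLUS"
        kind, value = next(it)
        assert kind == "NUM"
        total += int(value)
    return total

print(parse(lex("1 + 2 + 40")))  # 43
```

Note that `parse` only ever inspects token kinds; all character-level detail (digits, whitespace) was resolved in `lex`.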

Raphael
Brent

1 Answer


You don't have to separate them. People combine them into scannerless parsers.

The key disadvantage of scannerless parsers appears to be that the resulting grammars are rather complicated -- more so than the corresponding combination of a regular expression doing lexing and a context-free grammar doing parsing on the token stream. In particular, grammars for scannerless parsing tend towards ambiguity. It is easier to remove ambiguity from grammars that work on a token stream.

A pragmatic benefit of a dedicated up-front lexing phase is that you don't couple the subsequent parser to lexical detail. This is useful during early programming-language development, when the lexical and syntactic details are still changing frequently.
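A toy sketch of that decoupling (the function names and the choice of comment markers are hypothetical, purely for illustration): the parser below only ever inspects token kinds, so a lexical change such as switching line comments from `#` to `//` touches the lexer alone and leaves the parser untouched.

```python
def lex(text, comment_marker="#"):
    """Lexer: strips comments and splits words into (kind, value) tokens.
    All knowledge of the comment syntax lives here."""
    tokens = []
    for line in text.splitlines():
        line = line.split(comment_marker, 1)[0]
        tokens.extend(("WORD", w) for w in line.split())
    return tokens

def parse(tokens):
    """Parser: a trivial 'grammar' where a program is a sequence of WORDs.
    Nothing here knows or cares how comments were written."""
    return [value for kind, value in tokens if kind == "WORD"]

old_src = "alpha beta  # a comment"
new_src = "alpha beta  // a comment"
print(parse(lex(old_src)))                       # ['alpha', 'beta']
print(parse(lex(new_src, comment_marker="//")))  # ['alpha', 'beta']
```

Changing the comment syntax meant passing a different marker to `lex`; `parse` did not change at all, which is exactly the kind of insulation that helps while a language's lexical details are still in flux.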

Martin Berger