For a CFG whose production rules represent regular expressions, how can one calculate the set of all strings that a given regular expression denotes?

For T = {a, b, *, (, )} and an arbitrary production rule I created to represent simple regular expressions:

S->SS | S* | (S) | a | b | ∅

What would be an attribute grammar that calculates the set of all strings that the regex matches?

For example, if we generate b(a)*, then the language of the regular expression is the infinite set of strings {b, ba, baa, baaa, ...}, which the attribute grammar should compute using set-valued expressions.

I am not sure how to represent set-valued concatenation and Kleene closure.

2 Answers

Any "interesting" regular expression (i.e., one that includes a Kleene star) represents an infinite set. Thus trying to "calculate the set of all strings" won't work so well. In a sense, the regular expression is itself a compact description of the language. What other kind of description do you expect to construct?

It is rather easy to construct regular expressions that represent very complex languages, which the reader then needs to decipher to make sense of them.

vonbrand

The semantic rules of the attribute grammar "compute" the value of the left-hand side of a production in terms of the values of the symbols on its right-hand side. With a grammar for regular expressions like the one you propose this is straightforward, but precisely because of that it can be confusing.

For instance, the meaning of the rule $S_0\to S_1 S_2$ is concatenation, and this syntactic operation is translated directly into concatenation of languages: $S_0.\text{val} = (S_1.\text{val}) \cdot (S_2.\text{val})$

[I have indexed the variables of the rules for clarity.]

The Kleene star works similarly. The syntactic symbol $*$ in the expression, and in the production rule, is translated into the semantic Kleene star operation on languages, which has the intended meaning. Thus $S_0\to S_1^*$ gets the semantic rule $S_0.\text{val} = (S_1.\text{val})^*$
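For reference, the two operations on languages used above are the standard ones, and the remaining productions of the question's grammar can be handled in the same style. This is only a sketch: it assumes that the symbol $\emptyset$ in the grammar stands for the empty language (if it was meant as an empty production, its value would be $\{\varepsilon\}$ instead).

$$
\begin{aligned}
L_1 \cdot L_2 &= \{\, xy \mid x \in L_1,\ y \in L_2 \,\} \\
L^* &= \bigcup_{n \ge 0} L^n, \qquad L^0 = \{\varepsilon\},\quad L^{n+1} = L^n \cdot L \\
S_0 \to (S_1) &\qquad S_0.\text{val} = S_1.\text{val} \\
S \to a &\qquad S.\text{val} = \{a\} \\
S \to b &\qquad S.\text{val} = \{b\} \\
S \to \emptyset &\qquad S.\text{val} = \emptyset
\end{aligned}
$$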

That's all.
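Since the language is usually infinite, the semantic rules cannot be run literally. As a rough illustration, here is a minimal Python sketch that evaluates the same synthesized attribute but keeps only strings up to a chosen length. Everything in it is an assumption made for illustration: the names `evaluate`, `concat`, `star` and `max_len` are hypothetical, the ambiguous grammar is parsed with the usual precedence (star binds tighter than concatenation), and `∅` is treated as a symbol for the empty language.

```python
def concat(A, B, max_len):
    """Set-valued concatenation, truncated to strings of length <= max_len."""
    return {x + y for x in A for y in B if len(x + y) <= max_len}


def star(A, max_len):
    """Kleene closure, truncated to strings of length <= max_len."""
    result = {""}          # L^0 = {epsilon}
    frontier = {""}
    while frontier:        # stops once no new bounded string appears
        frontier = concat(frontier, A, max_len) - result
        result |= frontier
    return result


def evaluate(regex, max_len):
    """Return S.val restricted to strings of length <= max_len."""
    pos = 0

    def parse_concat():    # S -> S S
        val = parse_star()
        while pos < len(regex) and regex[pos] in "ab(∅":
            val = concat(val, parse_star(), max_len)
        return val

    def parse_star():      # S -> S *
        nonlocal pos
        val = parse_atom()
        while pos < len(regex) and regex[pos] == "*":
            pos += 1
            val = star(val, max_len)
        return val

    def parse_atom():      # S -> ( S ) | a | b | ∅
        nonlocal pos
        if pos >= len(regex):
            raise ValueError("unexpected end of input")
        ch = regex[pos]
        pos += 1
        if ch in "ab":
            return {ch}
        if ch == "∅":      # assumption: ∅ denotes the empty language
            return set()
        if ch == "(":
            val = parse_concat()
            if pos >= len(regex) or regex[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
            return val
        raise ValueError(f"unexpected symbol {ch!r}")

    val = parse_concat()
    if pos != len(regex):
        raise ValueError("trailing input")
    return val


print(sorted(evaluate("b(a)*", 3)))  # ['b', 'ba', 'baa']
```

The truncation is harmless for the bounded result: every string of length at most `max_len` in a concatenation or a star is built from pieces that are themselves no longer than `max_len`.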

Once, however, in a compiler setting, I have seen a different approach. There the semantic value of the expression was not a language but a finite state automaton (non-deterministic, with $\varepsilon$-transitions). In that way the intended meaning of the expression is nicely captured in a computable fashion. The operations in the semantic rules are then the concatenation and star constructions on automata.

In that case, the operations you are looking for are those of Thompson's construction.
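As a hedged sketch of what such automaton-valued semantic rules might look like, the Python below builds $\varepsilon$-NFAs in the style of Thompson's construction and simulates them. The names `NFA`, `symbol`, `nfa_concat`, `nfa_star` and `accepts` are hypothetical, and concatenation links the two automata with an $\varepsilon$-transition rather than merging states, which is a common variant.

```python
from itertools import count

EPS = None            # label used for epsilon-transitions
_fresh = count()      # generator of fresh state numbers


class NFA:
    """Epsilon-NFA with one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start = start
        self.accept = accept
        self.edges = edges  # list of (source, label, target)


def symbol(ch):
    """Semantic value for S -> a or S -> b."""
    s, t = next(_fresh), next(_fresh)
    return NFA(s, t, [(s, ch, t)])


def nfa_concat(m, n):
    """Semantic value for S0 -> S1 S2."""
    return NFA(m.start, n.accept,
               m.edges + n.edges + [(m.accept, EPS, n.start)])


def nfa_star(m):
    """Semantic value for S0 -> S1 *."""
    s, t = next(_fresh), next(_fresh)
    extra = [(s, EPS, m.start), (s, EPS, t),
             (m.accept, EPS, m.start), (m.accept, EPS, t)]
    return NFA(s, t, m.edges + extra)


def closure(nfa, states):
    """All states reachable from `states` via epsilon-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, word):
    """Simulate the epsilon-NFA on `word`."""
    current = closure(nfa, {nfa.start})
    for ch in word:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(nfa, moved)
    return nfa.accept in current


# Value that the semantic rules would compute for b(a)*
m = nfa_concat(symbol("b"), nfa_star(symbol("a")))
print(accepts(m, "baa"), accepts(m, "ab"))  # True False
```

Unlike the set-valued attribute, this value is always finite, and membership of any particular string can be decided by running the automaton.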

Hendrik Jan