How can I disable all of BNFC's built-in rules, such as Ident, Integer, or the whitespace used to separate tokens?
I find them useless and annoying because they interfere with the parsers I'm trying to write.
I have already tried redefining them, but the generated lexer still seems to contain rules for them. I could delete them manually from the generated files, but I'm firmly against modifying machine-generated code.
The long version of why they are annoying:
I'm just starting to learn how to use BNFC. The first thing I tried was to convert a previous project of mine from Alex to BNFC. In particular, I want to match only "good" Roman numerals. I thought it would be quite simple: a Roman numeral can be seen as a sequence like
<thousand-part> <hundred-part> <tens-part> <unit-part>
where the parts cannot all be empty. So a numeral either has a non-empty thousand-part, in which case the remaining parts can be anything, or it has an empty thousand-part, in which case the hundred-, tens- or unit-part must be non-empty. The same reasoning is applied recursively down to the base case of units.
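To make this decomposition concrete, here is a minimal Python sketch (an illustration only, not anything BNFC generates) that accepts a string if it can be split into optional thousand/hundred/tens/unit parts with at least one part non-empty. It works directly on characters, so whitespace and lexing play no role. The part alternatives are copied from the token definitions in the grammar further down (with the repeated "LX" dropped); the function names `splits` and `parse_numeral` are hypothetical.

```python
# Part alternatives copied from the question's LBNF token definitions.
THOUSANDS = ["MMM", "MM", "M"]
HUNDREDS = ["CM", "DCCC", "DCC", "DC", "D", "CD", "CCC", "CC", "C"]
TENS = ["IC", "XC", "LXXX", "LXX", "LX", "L", "IL", "XL", "XXX", "XX", "X"]
UNITS = ["IX", "VIII", "VII", "VI", "V", "IV", "III", "II", "I"]

PARTS = [THOUSANDS, HUNDREDS, TENS, UNITS]


def splits(s, part_index=0):
    """Yield every way to write s as thousand/hundred/tens/unit parts.

    Each result has one entry per part; None marks an empty part.
    The generator backtracks, so every alternative is tried, not only the longest.
    """
    if part_index == len(PARTS):
        if s == "":
            yield []
        return
    # Option 1: this part is empty.
    for rest in splits(s, part_index + 1):
        yield [None] + rest
    # Option 2: this part is one of its alternatives.
    for tok in PARTS[part_index]:
        if s.startswith(tok):
            for rest in splits(s[len(tok):], part_index + 1):
                yield [tok] + rest


def parse_numeral(s):
    """Return the first split with at least one non-empty part, else None."""
    for parts in splits(s):
        if any(p is not None for p in parts):
            return parts
    return None


print(parse_numeral("MMI"))  # ['MM', None, None, 'I']
```

The "not all empty" condition is checked once at the end here, whereas the grammar encodes it structurally with the `...NE` categories; both accept the same strings.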
So I came up with this, which is more or less a direct translation of what I did in Alex:
N1. Numeral ::= TokThousands HundredNumber ;
N2. Numeral ::= HundredNumberNE ; --NE = Not Empty
N3. HundredNumber ::= ;
N4. HundredNumber ::= HundredNumberNE ;
N5. HundredNumberNE ::= TokHundreds TensNumber ;
N6. HundredNumberNE ::= TensNumberNE ;
N7. TensNumber ::= ;
N8. TensNumber ::= TensNumberNE ;
N9. TensNumberNE ::= TokTens UnitNumber ;
N10. TensNumberNE ::= UnitNumberNE ;
N11. UnitNumber ::= ;
N12. UnitNumber ::= UnitNumberNE ;
N13. UnitNumberNE ::= TokUnits ;
token TokThousands ({"MMM"} | {"MM"} | {"M"}) ; -- No x{m,n} in BNFC regexes?
token TokHundreds ({"CM"} | {"DCCC"} | {"DCC"} | {"DC"} | {"D"} | {"CD"} | {"CCC"} | {"CC"} | {"C"}) ;
token TokTens ({"IC"} | {"XC"} | {"LXXX"} | {"LXX"} | {"LX"} | {"L"} | {"IL"} | {"XL"} | {"XXX"} | {"XX"} | {"X"}) ;
token TokUnits ({"IX"} | {"VIII"} | {"VII"} | {"VI"} | {"V"} | {"IV"} | {"III"} | {"II"} | {"I"}) ;
The problem is that when I build this parser and give it an input like:
MMI
or, in general, any numeral with more than one non-empty part, the parser reports an error. The lexer cannot match MMI with a single one of my tokens, so it falls back on the built-in Ident rule, which matches the whole string. Since Ident does not appear anywhere in my grammar, parsing fails, even though the input is perfectly valid according to the grammar I defined: it is the bogus Ident rule that gets in the way.
Note: I verified that if I separate the parts with spaces (e.g. MM I), the input is parsed correctly, but later on I want to use spaces to separate whole numbers, not the tokens within a numeral.
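Both observations are consistent with maximal munch: an Alex-style lexer picks, at each position, the rule with the longest match, and Ident matches all three characters of MMI while the longest TokThousands match is only MM. With a space, Ident can match only MM too, and the tie goes to the rule listed first. The Python sketch below is a simplified model of that behaviour, not the code BNFC generates; the Ident regex is an ASCII approximation, and the rule order (custom tokens before Ident) is an assumption.

```python
import re

# Simplified maximal-munch lexer: at each position every rule is tried,
# the longest match wins, and on a tie the rule listed first wins.
# ASSUMPTION: the custom tokens come before Ident in the generated lexer.
TOKEN_RULES = [
    ("TokThousands", ["MMM", "MM", "M"]),
    ("TokHundreds", ["CM", "DCCC", "DCC", "DC", "D", "CD", "CCC", "CC", "C"]),
    ("TokTens", ["IC", "XC", "LXXX", "LXX", "LX", "L", "IL", "XL", "XXX", "XX", "X"]),
    ("TokUnits", ["IX", "VIII", "VII", "VI", "V", "IV", "III", "II", "I"]),
]
# Approximation of BNFC's built-in Ident: a letter followed by letters,
# digits, '_' or '\'' (ASCII only here).
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_']*")


def longest_match(text, pos):
    """Return (rule_name, end) for the longest match at pos, or None."""
    best = None
    for name, alternatives in TOKEN_RULES:
        for alt in alternatives:
            end = pos + len(alt)
            # Strict '>' keeps the earlier rule on ties.
            if text.startswith(alt, pos) and (best is None or end > best[1]):
                best = (name, end)
    m = IDENT.match(text, pos)
    if m and (best is None or m.end() > best[1]):
        best = ("Ident", m.end())
    return best


def lex(text):
    """Split text into (rule_name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = longest_match(text, pos)
        if match is None:
            raise ValueError(f"lexical error at position {pos}")
        name, end = match
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens


print(lex("MMI"))   # [('Ident', 'MMI')]
print(lex("MM I"))  # [('TokThousands', 'MM'), ('TokUnits', 'I')]
```

In this model, whitespace only matters because it stops Ident from swallowing the following characters; the token rules themselves are unchanged.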