Grammar File Syntax
Grammar (.gram) files define a PEG-like syntax for declaring parsers. The syntax includes some additions to aid in linking your parser to C++ code.
A Grammar file consists of a finite number of configuration options and parse rules.
All configuration options follow the pattern @name = 'value'. These options control some behaviours of the code generator.
Special configurations:
- header - code that is placed at the top of the generated source file; put any includes or other code you need here.
- footer - code that is placed at the end of the generated source file.
- class_name - the name of the generated parser class (default: CustomParser).
- inherits_from - the class your generated parser inherits from; change this if you want to add custom functionality to the Parser class (default: Parser).
- disable_left_recursion - set this to any non-empty value to disable left recursion handling and memoization caches.
A parse rule consists of a name, a return type, and a set of statements:
rule_name[return_type] <statements>
Statements allow you to write detailed grammar expressions.
A token name may contain only the characters A-Z, 0-9 and _.
An ID referring to another rule may contain only the characters a-z, 0-9 and _.
- A - match token A
- e - match rule e
- e* - match the sub-expression e zero or more times, matching as many as possible
- e+ - match e one or more times
- e? - match e zero or one times; gives an optional back
- e | e2 - NOT IMPLEMENTED
- (a b) - grouped expression; returns a tuple
- v=e - assign the result of sub-expression e to variable v
- &e - and-predicate: invokes sub-expression e, succeeds if e succeeds and fails if e fails, but never consumes any input in either case
- !e - not-predicate: succeeds if e fails and fails if e succeeds, again consuming no input in either case
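The semantics of the repetition, optional, and predicate operators can be illustrated with a small hand-written C++ sketch. This is not the generator's output: `Cursor`, `Expr`, `match_token`, `zero_or_more`, `optional_of`, `and_predicate`, and `not_predicate` are hypothetical names chosen for illustration, and tokens are reduced to plain strings.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical cursor over a token stream; tokens are plain strings here.
struct Cursor {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
};

// A sub-expression returns a value on success and std::nullopt on failure.
using Expr = std::function<std::optional<std::string>(Cursor&)>;

// A : match one token with the given name and consume it.
Expr match_token(std::string name) {
    return [name](Cursor& c) -> std::optional<std::string> {
        if (c.pos < c.tokens.size() && c.tokens[c.pos] == name) {
            return c.tokens[c.pos++];
        }
        return std::nullopt;
    };
}

// e* : greedy repetition; always succeeds, possibly with an empty result.
std::vector<std::string> zero_or_more(const Expr& e, Cursor& c) {
    std::vector<std::string> out;
    while (true) {
        const std::size_t before = c.pos;
        auto r = e(c);
        if (!r) { c.pos = before; break; }  // undo any partial progress
        out.push_back(*r);
        if (c.pos == before) break;         // guard against non-consuming matches
    }
    return out;
}

// e? : always succeeds; the optional is empty when e did not match.
std::optional<std::string> optional_of(const Expr& e, Cursor& c) {
    const std::size_t start = c.pos;
    auto r = e(c);
    if (!r) c.pos = start;
    return r;
}

// &e : succeeds exactly when e succeeds, and never consumes input.
bool and_predicate(const Expr& e, Cursor& c) {
    const std::size_t start = c.pos;
    const bool ok = e(c).has_value();
    c.pos = start;  // restore the position in both cases
    return ok;
}

// !e : succeeds exactly when e fails, and never consumes input.
bool not_predicate(const Expr& e, Cursor& c) {
    return !and_predicate(e, c);
}
```

With the tokens `A A B`, `and_predicate` on `A` succeeds and leaves the position at 0, while `zero_or_more` on `A` consumes both `A` tokens and stops in front of `B`.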
stmt[int]
: e e2 { 1 };
: e2 e { 2 };
Statements are matched from top to bottom, returning on the first successful one (if any). The statement actions control what each statement returns, although each value must be convertible to the rule's return type. Actions are placed into the generated source code as return action;, so make sure you write valid C++ code.
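To make the top-to-bottom matching concrete, here is a rough sketch of the control flow that a rule with two statements implies. It is an assumption about the shape of the logic, not the generator's actual output; `TokenStream`, `expect`, and `parse_stmt` are hypothetical names, and the alternatives match the tokens `A` and `B` rather than other rules to keep the example short.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical token stream that only stores token names.
struct TokenStream {
    std::vector<std::string> names;
    std::size_t pos = 0;

    // Consume the next token if it has the expected name.
    bool expect(const std::string& name) {
        if (pos < names.size() && names[pos] == name) {
            ++pos;
            return true;
        }
        return false;
    }
};

// Mirrors a rule of the form:
//   stmt[int]
//   : A B { 1 };
//   : B A { 2 };
std::optional<int> parse_stmt(TokenStream& ts) {
    const std::size_t start = ts.pos;

    // First statement: A B
    if (ts.expect("A") && ts.expect("B")) {
        return 1;  // the action `1` becomes `return 1;`
    }
    ts.pos = start;  // backtrack before trying the next statement

    // Second statement: B A
    if (ts.expect("B") && ts.expect("A")) {
        return 2;  // the action `2` becomes `return 2;`
    }
    ts.pos = start;

    return std::nullopt;  // no statement matched
}
```

The second statement is only attempted after the first has failed and the position has been restored, which is why the order of statements matters.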
stmt[int]
: A B { 1 };
stmt returns an integer. It has one statement, which matches the token A followed by the token B, returning 1 on success.
stmt[int]
: x=A { x.value.length() };
The A token is assigned to x, and you can refer to x in your action; in this case the action returns the length of the token's value.
Note: x is of type Token because A is a token.
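As a minimal sketch of what this action amounts to in C++, assume a `Token` type with a string member `value`. Only `value` is implied by the text; the `name` field and the `stmt_action` function are hypothetical and exist purely for illustration.

```cpp
#include <string>

// Hypothetical stand-in for the lexer's Token type. Only `value` is
// implied by the documentation (it is used as x.value).
struct Token {
    std::string name;   // assumed field holding the token name, e.g. "A"
    std::string value;  // the matched text
};

// Roughly what the action `{ x.value.length() }` does for stmt[int].
// length() returns an unsigned size type, so the result has to be
// converted to the rule's return type, int; the cast makes that explicit.
int stmt_action(const Token& x) {
    return static_cast<int>(x.value.length());
}
```

Because the action is emitted as a plain return statement, the conversion from the action's type to the rule's return type would happen implicitly in generated code; the explicit cast here only highlights it.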
This grammar definition recognises the basic mathematical operations.
start[int]
: e=expr EOF { e };
expr[int]
: left=expr ADD right=term { left + right };
: left=expr SUB right=term { left - right };
: e=term { e };
term[int]
: left=term MUL right=factor { left * right };
: left=term DIV right=factor { left / right };
: e=factor { e };
factor[int]
: left=item POW right=factor { pow(left, right) };
: e=item { e };
item[int]
: n=INT { std::stoi(n.value) };
: LPAREN e=expr RPAREN { e };
All token definitions are in examples/calc/calc_lexer.hpp and the entire grammar file in examples/calc/calc.gram. Using this grammar it is possible to input simple mathematical expressions like 2 * (3 + 4).
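For comparison, the same arithmetic can be evaluated by a hand-written recursive-descent parser. The sketch below is not the generated parser: it reads characters directly instead of using the tokens from calc_lexer.hpp, it rewrites the left-recursive expr and term rules as loops (a common substitute when left recursion is unavailable), and it uses an integer loop in place of pow. The class name `Calc` and the operator spellings `+ - * / ^ ( )` are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical hand-written evaluator mirroring the calc grammar.
class Calc {
public:
    explicit Calc(std::string text) : src_(std::move(text)) {}

    // start: e=expr EOF { e }
    int start() {
        const int e = expr();
        skip_ws();
        if (pos_ != src_.size()) throw std::runtime_error("expected end of input");
        return e;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    void skip_ws() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Consume character c (after whitespace) if it is next.
    bool eat(char c) {
        skip_ws();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // expr: the left-recursive ADD/SUB statements become a loop,
    // which keeps the operators left-associative.
    int expr() {
        int left = term();
        for (;;) {
            if (eat('+')) left += term();
            else if (eat('-')) left -= term();
            else return left;
        }
    }

    // term: the left-recursive MUL/DIV statements become a loop.
    int term() {
        int left = factor();
        for (;;) {
            if (eat('*')) {
                left *= factor();
            } else if (eat('/')) {
                const int right = factor();
                if (right == 0) throw std::runtime_error("division by zero");
                left /= right;
            } else {
                return left;
            }
        }
    }

    // factor: item POW factor, or just item (right-associative).
    int factor() {
        const int base = item();
        if (!eat('^')) return base;
        const int exponent = factor();
        if (exponent < 0) throw std::runtime_error("negative exponent");
        int result = 1;
        for (int i = 0; i < exponent; ++i) result *= base;
        return result;
    }

    // item: INT, or LPAREN expr RPAREN.
    int item() {
        if (eat('(')) {
            const int e = expr();
            if (!eat(')')) throw std::runtime_error("expected ')'");
            return e;
        }
        skip_ws();
        const std::size_t begin = pos_;
        while (pos_ < src_.size() &&
               std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        if (begin == pos_) throw std::runtime_error("expected integer");
        return std::stoi(src_.substr(begin, pos_ - begin));
    }
};
```

Calling `Calc("2 * (3 + 4)").start()` returns 14. With the grammar file, the left-recursive rules can be written directly, so none of the loop rewriting shown here is needed unless disable_left_recursion is set.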