Rust library to easily generate (Recursive Descent) Parser using macros.
This can be used to generate parsers for any language easily such as programming, markup, etc. All the parser generation logic (+regex building) is executed at compile time, ensuring zero runtime overhead.
While reading Crafting Interpreters and implementing my own programming language in Rust, I explored ways to make parser creation more efficient as outlined in the book. Writing parsers can be quite repetitive, especially for verbose languages, so automating parts of this process can save significant effort.
In the book, the author uses Java and demonstrates how to write code that generates Java parser code: a form of metaprogramming. Since I’m working with Rust, I’ve opted to leverage Rust’s powerful macro as well as type system to achieve a similar result. To that end, I’ve developed a library that uses macros to generate parsers.
However, I didn’t stop at replicating the simple approach of the book. And went for a solution to generate automatically the Scanner, Abstract-Syntax-Tree, and Parser logic.
Usage (Read the documentation for more information)
- Add the dependency to your
Cargo.tomlfile:
[dependencies]
autoparser = { git = "https://github.com/Swiiz/autoparser" }- Define your tokens, this will generate a
Scannerstruct and aTokenenum.
autoparser::impl_scanner! {
Whitespace @regex => "^(?<__>\\s)", // All regexes need to have a named capture group.
Minus => "-",
Plus => "+",
NumberLiteral { number: u32 } @regex => "^(?<number>(\\d)+)", // Named capture group can be used as data in Token.
}- Define your grammar, each rules will define struct representing a node in the Abstract Syntax Tree. Rules are declared in Order of precedence.
autoparser::impl_rules! {
// For performance reason you don't want to parse (.., <Token>, <Rule, ..) as the Rule. The parser can stop early at the "type level", you may either use only Rules or Token in the pattern.
AddOperator => Token::Plus,
SubOperator => Token::Minus,
// When not in enum mode, On the right side of the `=>` you can use any match pattern. The type of the provided pattern will be parsed.
// Then your pattern will be tested. You can use @, if and more... for data manipulation, see rust match-pattern docs.
Literal { number: u32 } => Token::NumberLiteral { number },
// Rules can also be an union of rules using the `enum` keyword and the `|` operator.
enum Expr => AddOperation | SubOperation | Unary,
// Rules can be recursive, however you need to use the Box<T> type to avoid infinite ast node size.
AddOperation { left: Unary, right: Box<Expr> } => (left, AddOperator {}, right),
SubOperation { left: Unary, right: Box<Expr> } => (left, SubOperator {}, right),
enum Unary => NegateOperation | Literal,
NegateOperation { literal: Literal } => (SubOperator {}, literal),
Literal { number: u32 } => Token::NumberLiteral { number },
}The Token enum, Scanner struct and each AST Node can now be used together:
let source = autoparser::Source {
name: None,
content: "1 + 2 - 3".into(),
};
let scanner = Scanner::new();
let scan = scanner
.scan(source)
.into_iter()
.filter(|t| t != &Token::Whitespace)
.collect::<Vec<_>>();
let mut tokens = autoparser::TokenStream::new(&scan);
println!("{#?}", Expr::try_parse(&mut tokens));-
Supporting -, +, *, /, parenthesis, variables and operator priority in <100 LOC.
Run the example:
cargo run --example calculatorSee the generated code documentation:
cargo doc --example calculator --no-deps --open -
Supporting table and arrays (with iterators), int, bool and string values in <100 LOC. Using Vec<T> and Option<T> utils to generate a compact AST.
Run the example:
cargo run --example jsonSee the generated code documentation:
cargo doc --example json --no-deps --open
-
impl_scanner!()
- The scanner scans the source code for the token strings and regexes. The scanner will always match static strings before trying to match regexes. Return a Vec of tokens.
- Each token is defined as a variant on the Token enum.
- trait
Parse<Self>is implemented for the newly created Token enum.
-
impl_rules!()
- Generates a struct/enum for each rule with it's data. (representing the ast node)
- trait
Parse<Token>is implemented for each node struct/enum, it first matches the rule, then the data. - Composite rules can be defined using tuples, such as (A, B, C, ...). Unlike enums, composite rules require all child rules to match for the composite rule to succeed.
Note
This is a work in progress. Error reporting needs to be improved.
Feel free to open an issue or contribute!