NatGI infers a context-free grammar from example programs, with only black-box access to the language parser (an oracle) during learning. NatGI follows the parse-tree recovery principle of Arvada/TreeVada for grammar inference, but its tree-building technique is more powerful thanks to GPT-4o, and it produces human-readable grammars with semantically meaningful nonterminal names.
**Ground Truth while Grammar:**

```
grammar whileLang;
start
    : stmt
    ;
stmt
    : stmt ';' stmt
    | 'skip'
    ;
boolexpr
    : '~' boolexpr
    | boolexpr '&' boolexpr
    | numexpr '==' numexpr
    | 'true'
    | 'false'
    ;
numexpr
    : '(' numexpr '+' numexpr ')'
    | 'L'
    | 'n'
    ;
```

**NatGIBase Inferred while Grammar:**

```
grammar whileLang;
start
    : stmt
    ;
stmt
    : stmt ';' stmt
    | 'L' '=' numexpr
    | 'L' '=' 'L'
    | 'while' boolexpr 'do' stmt
    | 'if' boolexpr 'then' stmt 'else' stmt
    | 'skip'
    ;
boolexpr
    : '~' boolexpr
    | boolexpr '&' boolexpr
    | numexpr '==' numexpr
    | 'L' '==' numexpr
    | numexpr '==' 'L'
    | 'L' '==' 'L'
    | 'true'
    | 'false'
    ;
numexpr
    : '(' 'L' '+' 'L' ')'
    | '(' 'L' '+' numexpr ')'
    | '(' numexpr '+' 'L' ')'
    | '(' numexpr '+' numexpr ')'
    | 'n'
    ;
```
## Key features
- Needs only a few example programs and an oracle; the oracle is a command that accepts a filename and returns 0 for valid inputs.
- Relies on an LLM to build parse trees of the example programs.
- Produces human-readable grammars with meaningful non-terminals.
- Generates the grammar in ANTLR4 format, which is directly compatible with popular grammar-based fuzzers (e.g., Grammarinator).
NatGI is built on Arvada/TreeVada's approach for grammar inference and aims to make inferred grammars easier to inspect and use in downstream tooling.
Requires at least Python 3.10. Install the following packages via pip to make sure everything runs:

```
$ pip3 install lark-parser tqdm openai PrettyPrintTree
```

(Optional) for using the Python target of the antlr4 parser:

```
$ pip3 install antlr4-python3-runtime==4.9.2
```
NatGI takes a directory of example programs (TRAIN_DIR) and an oracle command (ORACLE_CMD). The oracle must be invocable as:
```
$ ORACLE_CMD filename
```

This runs the oracle on the file `filename`; it must return 0 for valid examples and non-zero for invalid examples.
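For illustration, an oracle can be any executable with this contract. The script below is a hypothetical example (not shipped with NatGI) that accepts files containing valid JSON; any real parser can take the place of `json.load`, since NatGI only observes the exit code:

```python
#!/usr/bin/env python3
"""Hypothetical oracle script: exit 0 if the input file is valid JSON,
non-zero otherwise. NatGI only observes the exit code."""
import json
import sys


def is_valid(path):
    """Return True when the file at `path` parses as JSON."""
    try:
        with open(path) as f:
            json.load(f)
        return True
    except (OSError, ValueError):
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(0 if is_valid(sys.argv[1]) else 1)
```

Saved as, say, `json_oracle.py` and made executable, it could then be passed to NatGI as ORACLE_CMD.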
search.py is the entry point to the tool. To learn a grammar, run:

```
$ python3 search.py ORACLE_CMD TRAIN_DIR LOG_FILE
```

This writes training details to LOG_FILE and saves the learned grammar as a pickled dictionary in LOG_FILE.gramdict. For more configurable options, run `python3 search.py --help`.
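Because the saved grammar is an ordinary pickle, it can be inspected directly in Python. The sketch below assumes the `.gramdict` file unpickles to a dictionary mapping nonterminals to their productions; the exact layout may differ, so treat this as an illustration:

```python
import pickle


def load_grammar(path):
    """Load a pickled grammar dictionary saved by search.py (assumed layout)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def print_grammar(gram):
    """Dump each nonterminal and its productions, one rule per line."""
    for nonterminal, productions in gram.items():
        print(nonterminal, ":", productions)
```

For example, `print_grammar(load_grammar("LOG_FILE.gramdict"))` would list the learned rules.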
If you also have a held-out test set in TEST_DIR, you can evaluate the precision and recall of the mined grammar with the utility eval.py. This utility also handily prints out the unpickled grammar to a LOG_FILE.eval file. The provided LOG_FILE must match one generated by search.py, as this utility looks for LOG_FILE.gramdict.

```
$ python3 eval.py [--no-antlr4] [-n PRECISION_SET_SIZE] ORACLE_CMD TEST_DIR LOG_FILE
```

- `PRECISION_SET_SIZE` (optional) is the number of samples to draw from the learned grammar to estimate precision (default: 1000).
- `TEST_DIR` is a directory of held-out valid programs.
- The `--no-antlr4` flag skips generation of the antlr4 grammar; otherwise the antlr4 grammar is written into a `.g4` file.
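Conceptually, precision is the fraction of strings sampled from the learned grammar that the oracle accepts, and recall is the fraction of held-out test programs the learned grammar can parse. A minimal sketch of these metrics (the `oracle` and `parses` callables are stand-ins, not part of NatGI's API):

```python
def precision(samples, oracle):
    """Fraction of grammar-generated samples the oracle accepts."""
    return sum(1 for s in samples if oracle(s)) / len(samples)


def recall(test_programs, parses):
    """Fraction of held-out programs the learned grammar can parse."""
    return sum(1 for p in test_programs if parses(p)) / len(test_programs)


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0
```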
Compare how NatGI performs against zero-shot grammar inference, even with powerful reasoning models like GPT-o4.
1. Run `gpt.py` to get a text grammar:

   ```
   $ python3 gpt.py SEED_DIR SEED_NAME   # python3 gpt.py Seed_Programs/tinyc/tinyc-train tinyc-r1
   ```

   This writes a text grammar to results/gpt_grammar_{SEED_NAME}.txt.

2. Convert the text grammar into the pickled `.gramdict` format (uses ORACLE_CMD for lexical inference on the terminals):

   ```
   $ python3 grammar_from_csv.py SEED_NAME ORACLE_CMD   # python3 grammar_from_csv.py tinyc-r1 Seed_Programs/tinyc/parse_tinyc
   ```

   This produces a binary grammar file named results/gpt-grammar_{SEED_NAME}.gramdict.

3. Evaluate the saved grammar (precision/recall/F1) with the eval utility:

   ```
   $ python3 eval.py ORACLE_CMD TEST_DIR LOG_FILE   # python3 eval.py Seed_Programs/tinyc/parse_tinyc Seed_Programs/tinyc/tinyc-test tinyc-r1
   ```

   Here, LOG_FILE is the base name used for the `.gramdict` produced by step 2.
Note that grammar_from_csv.py expects the text grammar in the format instructed in the prompt. If the LLM cannot follow the specific format, or the grammar has errors (e.g., some non-terminals are used in rules but never defined), steps 2 and 3 will fail.
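One quick sanity check before steps 2 and 3 is to verify that every nonterminal referenced in the text grammar is also defined. The sketch below assumes a simple `name : body ;` rule layout with single-quoted terminals, which approximates but may not exactly match the format the prompt requests:

```python
import re


def undefined_nonterminals(grammar_text):
    """Return nonterminals that are referenced in rule bodies but never defined.

    Assumes rules of the form `name : body ;` with single-quoted terminals;
    this is an approximation of the prompted grammar format.
    """
    rules = re.findall(r"(\w+)\s*:(.*?);", grammar_text, re.DOTALL)
    defined = {name for name, _ in rules}
    used = set()
    for _, body in rules:
        body = re.sub(r"'[^']*'", " ", body)  # drop quoted terminals
        used.update(re.findall(r"\b[a-zA-Z_]\w*\b", body))
    return used - defined
```

An empty result means every referenced nonterminal has a definition; anything returned would make the conversion step fail.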
