Peglib

A PEG (Parsing Expression Grammar) parser library for Java, inspired by cpp-peglib.

Features

Grammar-driven parsing - Define parsers using PEG syntax in strings
cpp-peglib compatible syntax - Familiar grammar format for cpp-peglib users
Dual tree output - CST (lossless) for formatting/linting, AST (optimized) for compilers
Inline Java actions - Embed Java code directly in grammar rules
Trivia preservation - Whitespace and comments captured for round-trip transformations
Advanced error recovery - Continue parsing after errors with Rust-style diagnostics
Packrat memoization - Linear-time, O(n), parsing at the cost of extra memory for the memo table
Source code generation - Generate standalone parser Java files
Java 25 - Uses modern Java features (records, sealed interfaces, pattern matching)

Quick Start

Dependency

<dependency>
    <groupId>org.pragmatica-lite</groupId>
    <artifactId>peglib</artifactId>
    <version>0.1.9</version>
</dependency>

Requires pragmatica-lite:core for Result/Option types.

Basic Parsing

import org.pragmatica.peg.PegParser;

// Define grammar and create parser
var parser = PegParser.fromGrammar("""
    Number <- < [0-9]+ >
    %whitespace <- [ \\t]*
    """).unwrap();

// Parse to CST (lossless, preserves trivia)
var cst = parser.parseCst("123").unwrap();

// Parse to AST (optimized, no trivia)
var ast = parser.parseAst("123").unwrap();

Parsing with Actions

var calculator = PegParser.fromGrammar("""
    Expr   <- Term (('+' / '-') Term)*
    Term   <- Factor (('*' / '/') Factor)*
    Factor <- Number / '(' Expr ')'
    Number <- < [0-9]+ > { return sv.toInt(); }
    %whitespace <- [ ]*
    """).unwrap();

// Actions transform parsed content into semantic values
Integer result = (Integer) calculator.parse("3 + 5 * 2").unwrap();
// result = 13
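
The result is 13 rather than 16 because the grammar nests Term inside Expr, so '*' binds tighter than '+'. As a rough illustration of that structure, here is a hand-written recursive-descent evaluator that mirrors the four rules. It is a standalone sketch, not code that Peglib generates or uses, and the class name CalcSketch and its methods are made up for this example.

```java
// Hand-written stand-in for the calculator grammar above; not Peglib code.
final class CalcSketch {
    private final String input;
    private int pos = 0;

    private CalcSketch(String input) {
        this.input = input;
    }

    static int evaluate(String input) {
        CalcSketch p = new CalcSketch(input);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("unexpected input at offset " + p.pos);
        }
        return value;
    }

    // Expr <- Term (('+' / '-') Term)*
    private int expr() {
        int value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // Term <- Factor (('*' / '/') Factor)*
    private int term() {
        int value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // Factor <- Number / '(' Expr ')'
    // The two alternatives start with different characters,
    // so the order of the checks does not matter here.
    private int factor() {
        if (accept('(')) {
            int value = expr();
            if (!accept(')')) {
                throw new IllegalArgumentException("expected ')' at offset " + pos);
            }
            return value;
        }
        return number();
    }

    // Number <- < [0-9]+ >
    private int number() {
        skipSpaces();
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') pos++;
        if (start == pos) {
            throw new IllegalArgumentException("expected number at offset " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Skips spaces (the %whitespace rule), then consumes c if it is next.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < input.length() && input.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
    }
}
```

CalcSketch.evaluate("3 + 5 * 2") returns 13, and CalcSketch.evaluate("(3 + 5) * 2") returns 16, because the parenthesized Expr is consumed as a single Factor.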

Grammar Syntax

Peglib uses PEG syntax compatible with cpp-peglib:

Basic Operators

# Rule definition
RuleName <- Expression

# Sequence - match e1 then e2
e1 e2

# Ordered choice - try e1, if fails try e2
e1 / e2

# Quantifiers
e*          # Zero or more
e+          # One or more
e?          # Optional
e{3}        # Exactly 3 times
e{2,}       # At least 2 times
e{2,5}      # Between 2 and 5 times

# Lookahead predicates (don't consume input)
&e          # Positive lookahead - succeeds if e matches
!e          # Negative lookahead - succeeds if e doesn't match

# Cut - commits to current choice, prevents backtracking
^           # Cut operator
↑           # Cut operator (alternative syntax)

# Grouping
(e1 e2)     # Group expressions

# Terminals
'literal'   # String literal (single quotes)
"literal"   # String literal (double quotes)
[a-z]       # Character class
[^a-z]      # Negated character class
.           # Any character
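
Ordered choice and lookahead behave differently from their regex or CFG counterparts, so a small model may help. The sketch below implements sequence, ordered choice, and negative lookahead as plain Java functions over a string and a position. It only illustrates PEG semantics in general; the names PegOperatorsSketch and Matcher are invented here and are not part of Peglib.

```java
import java.util.OptionalInt;

// Minimal model of three PEG operators; an illustration, not Peglib's engine.
final class PegOperatorsSketch {
    /** Returns the position after the match, or empty on failure. */
    interface Matcher {
        OptionalInt match(String input, int pos);
    }

    // 'literal'
    static Matcher literal(String text) {
        return (input, pos) -> input.startsWith(text, pos)
            ? OptionalInt.of(pos + text.length())
            : OptionalInt.empty();
    }

    // e1 e2 : e2 starts where e1 stopped
    static Matcher sequence(Matcher first, Matcher second) {
        return (input, pos) -> {
            OptionalInt afterFirst = first.match(input, pos);
            return afterFirst.isPresent()
                ? second.match(input, afterFirst.getAsInt())
                : afterFirst;
        };
    }

    // e1 / e2 : e2 is tried only if e1 fails at this position
    static Matcher choice(Matcher first, Matcher second) {
        return (input, pos) -> {
            OptionalInt result = first.match(input, pos);
            return result.isPresent() ? result : second.match(input, pos);
        };
    }

    // !e : succeeds without consuming input when e fails
    static Matcher not(Matcher inner) {
        return (input, pos) -> inner.match(input, pos).isPresent()
            ? OptionalInt.empty()
            : OptionalInt.of(pos);
    }
}
```

With this model, choice(literal("a"), literal("ab")) applied to "ab" stops at position 1: the first alternative wins even though the second would match more. For the same reason, putting that choice in sequence with literal("c") fails on "abc", since a choice that has already succeeded is not revisited. Listing the longer alternative first avoids the problem.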

Extensions

# Token boundary - captures matched text as $0
< e >

# Ignore semantic value
~e

# Case-insensitive matching
'text'i
[a-z]i

# Named capture and back-reference
$name<e>    # Capture as 'name'
$name       # Back-reference to captured 'name'

Directives

# Auto-skip whitespace between tokens
%whitespace <- [ \t\r\n]*

Inline Actions

Actions are Java code blocks that transform parsed content:

Number <- < [0-9]+ > { return sv.toInt(); }
Sum <- Number '+' Number { return (Integer)$1 + (Integer)$2; }
Word <- < [a-z]+ > { return $0.toUpperCase(); }

Action API

Inside an action block, the matched values are available through a SemanticValues object named sv:

Access	Description
`sv.token()` or `$0`	Matched text (raw input)
`sv.get(0)` or `$1`	First child's semantic value
`sv.get(1)` or `$2`	Second child's semantic value
`sv.toInt()`	Parse matched text as integer
`sv.toDouble()`	Parse matched text as double
`sv.size()`	Number of child values
`sv.values()`	All child values as List

Note: $1, $2, etc. use 1-based indexing (like regex groups), while sv.get() uses 0-based.
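
The off-by-one between the two notations is easy to get wrong, so here is a tiny standalone model of it. The helper below is hypothetical (DollarIndexSketch and dollar are names invented for this example) and simply treats the child values as a plain list, assuming a Sum match whose child values are 3 and 5.

```java
import java.util.List;

// Hypothetical helper showing how $n relates to a 0-based child list.
final class DollarIndexSketch {
    /** Returns the value that $n would refer to: element n - 1 of the child list. */
    static Object dollar(List<Object> childValues, int n) {
        if (n < 1 || n > childValues.size()) {
            throw new IndexOutOfBoundsException("$" + n);
        }
        return childValues.get(n - 1);
    }
}
```

For a list holding 3 and 5, dollar(values, 1) is the same object as values.get(0), and dollar(values, 2) is the same object as values.get(1).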

Configuration

var parser = PegParser.builder(grammar)
    .packrat(true)                           // Enable memoization (default: true)
    .trivia(true)                            // Collect whitespace/comments (default: true)
    .recovery(RecoveryStrategy.ADVANCED)     // Error recovery mode
    .build()
    .unwrap();
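
The packrat option above controls memoization. As a rough picture of what memoization buys, the sketch below hand-codes the toy grammar S <- A 'x' / A 'y' with A <- 'a'+ and caches the result of A per input position. It is an illustration of the general technique only; PackratSketch and its fields are invented names, and Peglib's actual cache layout is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Toy packrat cache: one memo entry per (rule, position) pair.
final class PackratSketch {
    private record Key(String rule, int pos) {}

    private final String input;
    private final boolean memoize;
    private final Map<Key, Integer> memo = new HashMap<>();

    /** Counts how often the body of rule A actually ran. */
    int ruleEvaluations = 0;

    PackratSketch(String input, boolean memoize) {
        this.input = input;
        this.memoize = memoize;
    }

    // S <- A 'x' / A 'y'   (returns the end position, or -1 on failure)
    int parseS() {
        int afterA = ruleA(0);
        if (afterA >= 0 && peek(afterA, 'x')) return afterA + 1;
        afterA = ruleA(0); // the second alternative restarts at position 0
        if (afterA >= 0 && peek(afterA, 'y')) return afterA + 1;
        return -1;
    }

    // A <- 'a'+
    private int ruleA(int pos) {
        Key key = new Key("A", pos);
        if (memoize && memo.containsKey(key)) {
            return memo.get(key); // reuse the earlier result at this position
        }
        ruleEvaluations++;
        int end = pos;
        while (end < input.length() && input.charAt(end) == 'a') end++;
        int result = end > pos ? end : -1;
        if (memoize) memo.put(key, result);
        return result;
    }

    private boolean peek(int pos, char c) {
        return pos < input.length() && input.charAt(pos) == c;
    }
}
```

On the input "aaay", both settings return 4, but ruleEvaluations is 2 without memoization and 1 with it. The saving grows with the amount of backtracking in the grammar, and the memo table is the memory cost paid for it.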

Error Recovery

Peglib provides advanced error recovery with Rust-style diagnostic messages:

var parser = PegParser.builder(grammar)
    .recovery(RecoveryStrategy.ADVANCED)
    .build()
    .unwrap();

var result = parser.parseCstWithDiagnostics("abc, @@@, def");

if (result.hasErrors()) {
    System.out.println(result.formatDiagnostics("input.txt"));
}

Output:

error: unexpected input
  --> input.txt:1:6
   |
 1 | abc, @@@, def
   |      ^ found '@'
   |
   = help: expected [a-z]+
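
To make the layout of such a report concrete, the following sketch derives the line, column, and caret position from a character offset and prints the first five lines of the format shown above. It is a standalone approximation written for this README, not Peglib's formatter; DiagnosticSketch and format are hypothetical names, and the trailing help line is left out.

```java
// Standalone approximation of the diagnostic layout; not Peglib's formatter.
final class DiagnosticSketch {
    /** Assumes 0 <= offset < source.length(). */
    static String format(String fileName, String source, int offset, String message) {
        // Find the 1-based line number and the start offset of that line.
        int line = 1;
        int lineStart = 0;
        for (int i = 0; i < offset; i++) {
            if (source.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        int column = offset - lineStart + 1; // 1-based column
        int lineEnd = source.indexOf('\n', lineStart);
        String lineText = source.substring(lineStart, lineEnd < 0 ? source.length() : lineEnd);

        String lineNo = String.valueOf(line);
        // Padding that keeps every '|' in the same column as the numbered line.
        String gutter = " ".repeat(lineNo.length() + 1);

        return "error: " + message + "\n"
            + gutter + "--> " + fileName + ":" + line + ":" + column + "\n"
            + gutter + " |\n"
            + " " + lineNo + " | " + lineText + "\n"
            + gutter + " | " + " ".repeat(column - 1) + "^ found '" + source.charAt(offset) + "'\n";
    }
}
```

Calling DiagnosticSketch.format("input.txt", "abc, @@@, def", 5, "unexpected input") reproduces the report above up to and including the caret line: offset 5 is the first '@', which is column 6 of line 1.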

Recovery Strategies

Strategy	Behavior
`NONE`	Fail immediately on first error
`BASIC`	Report error with context, stop parsing
`ADVANCED`	Continue parsing, collect all errors, insert Error nodes

See Error Recovery Documentation for details.

Trivia Handling

CST nodes preserve whitespace and comments as trivia:

var parser = PegParser.fromGrammar("""
    Expr <- Number '+' Number
    Number <- < [0-9]+ >
    %whitespace <- [ \\t]+
    """).unwrap();

var cst = parser.parseCst("  42 + 7  ").unwrap();

// Access trivia
List<Trivia> leading = cst.leadingTrivia();   // "  " before 42
List<Trivia> trailing = cst.trailingTrivia(); // "  " after 7

Trivia types:

Trivia.Whitespace - spaces, tabs, newlines
Trivia.LineComment - // ... style
Trivia.BlockComment - /* ... */ style
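
A three-way split like this maps naturally onto a sealed interface with one record per kind. The sketch below models that shape and classifies a piece of skipped text by its prefix. The types are stand-ins defined inside TriviaSketch for illustration; their fields and the classify helper are assumptions, not Peglib's actual Trivia API.

```java
// Stand-in types for the three trivia kinds; fields are assumed, not Peglib's.
final class TriviaSketch {
    sealed interface Trivia {
        String text();
    }

    record Whitespace(String text) implements Trivia {}
    record LineComment(String text) implements Trivia {}
    record BlockComment(String text) implements Trivia {}

    /** Classifies skipped text by its shape. */
    static Trivia classify(String text) {
        if (text.startsWith("//")) {
            return new LineComment(text);
        }
        if (text.length() >= 4 && text.startsWith("/*") && text.endsWith("*/")) {
            return new BlockComment(text);
        }
        if (!text.isEmpty() && text.isBlank()) {
            return new Whitespace(text);
        }
        throw new IllegalArgumentException("not trivia: " + text);
    }
}
```

Because the interface is sealed, code that handles trivia can cover all three kinds exhaustively, which is useful when a formatter needs to treat comments and whitespace differently.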

Source Code Generation

Generate standalone parser Java files for production use:

import java.nio.file.Files;
import java.nio.file.Path;

Result<String> source = PegParser.generateParser(
    grammarText,
    "com.example.parser",  // package name
    "JsonParser"           // class name
);

// Write to file (Files.writeString throws IOException)
Files.writeString(Path.of("JsonParser.java"), source.unwrap());

Generated parsers:

Are self-contained single files
Only depend on pragmatica-lite:core
Include packrat memoization
Support trivia collection
Have type-safe RuleId for each grammar rule

Generated Parser with Advanced Diagnostics

Generate parsers with Rust-style error reporting:

import org.pragmatica.peg.generator.ErrorReporting;

// Generate CST parser with advanced diagnostics
Result<String> source = PegParser.generateCstParser(
    grammarText,
    "com.example.parser",
    "MyParser",
    ErrorReporting.ADVANCED  // Enable Rust-style diagnostics
);

ErrorReporting	Description
`BASIC`	Simple `ParseError(line, column, reason)` - minimal code
`ADVANCED`	Full diagnostics with source context, underlines, labels

When ADVANCED is enabled, the generated parser exposes a diagnostics-aware parse method:

// Parse with diagnostics
var result = parser.parseWithDiagnostics(input);

if (result.hasErrors()) {
    // Format as Rust-style diagnostics
    System.err.println(result.formatDiagnostics("input.txt"));
}

// Access individual diagnostics
for (var diag : result.diagnostics()) {
    System.out.println(diag.formatSimple()); // file:line:col: severity: message
}

Output example:

error: expected Number
  --> input.txt:1:5
   |
 1 | 3 + @invalid
   |     ^ found '@'
   |

Examples

See the examples directory:

Example	Description
CalculatorExample	Arithmetic with semantic actions
JsonParserExample	JSON CST parsing
SExpressionExample	Lisp-like syntax
CsvParserExample	CSV data format
ErrorRecoveryExample	Error recovery patterns
SourceGenerationExample	Standalone parser generation
Java25GrammarExample	Java 25 syntax parsing

CST Node Types

public sealed interface CstNode {
    record Terminal(...)    // Leaf node with text
    record NonTerminal(...) // Interior node with children
    record Token(...)       // Result of < > operator
    record Error(...)       // Unparseable region (error recovery)
}
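
A lossless tree means the original text can be rebuilt by walking the nodes in order. The sketch below shows that walk over a simplified stand-in hierarchy. The record components are assumptions made for this example (the real components are elided above), the Token variant is omitted, and whitespace is modeled as ordinary terminals rather than trivia to keep the example short.

```java
import java.util.List;

// Simplified stand-in for a CST; record components are assumed for illustration.
final class CstWalkSketch {
    sealed interface Node {}

    record Terminal(String text) implements Node {}
    record NonTerminal(String rule, List<Node> children) implements Node {}
    record ErrorNode(String skippedText) implements Node {}

    /** Rebuilds the source text by concatenating leaves left to right. */
    static String sourceText(Node node) {
        if (node instanceof Terminal t) {
            return t.text();
        }
        if (node instanceof ErrorNode e) {
            return e.skippedText();
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : ((NonTerminal) node).children()) {
            sb.append(sourceText(child));
        }
        return sb.toString();
    }
}
```

Keeping the unparseable region as its own node is what lets a walk like this still reproduce the full input after error recovery.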

Building

mvn compile    # Compile
mvn test       # Run tests (308 tests)
mvn verify     # Full verification

Requires Java 25+.

References

cpp-peglib - C++ PEG library (inspiration)
PEG Paper - Bryan Ford's original paper
Packrat Parsing - Memoization technique

License

MIT
