
siy/java-peglib


Peglib

A PEG (Parsing Expression Grammar) parser library for Java, inspired by cpp-peglib.

Features

  • Grammar-driven parsing - Define parsers using PEG syntax in strings
  • cpp-peglib compatible syntax - Familiar grammar format for cpp-peglib users
  • Dual tree output - CST (lossless) for formatting/linting, AST (optimized) for compilers
  • Inline Java actions - Embed Java code directly in grammar rules
  • Trivia preservation - Whitespace and comments captured for round-trip transformations
  • Advanced error recovery - Continue parsing after errors with Rust-style diagnostics
  • Packrat memoization - Linear-time parsing, O(n) in input length, at the cost of extra memory for the memo table
  • Source code generation - Generate standalone parser Java files
  • Java 25 - Uses latest Java features (records, sealed interfaces, pattern matching)
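The packrat idea from the feature list can be sketched in plain Java. This is not Peglib's implementation; the class `PackratSketch`, its toy grammar, and the `ruleBodyRuns` counter are all hypothetical and exist only to show how caching results per (rule, position) pair avoids re-running a rule after backtracking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Peglib internals: cache the outcome of each
// (rule, position) pair so a rule body runs at most once per position.
final class PackratSketch {
    // Memo key: which rule was tried at which input offset.
    record Key(String rule, int pos) {}

    private final String input;
    private final boolean memoize;
    private final Map<Key, Integer> memo = new HashMap<>();
    int ruleBodyRuns = 0; // how many times the Number rule body actually executed

    PackratSketch(String input, boolean memoize) {
        this.input = input;
        this.memoize = memoize;
    }

    // Number <- [0-9]+ ; returns the end offset, or -1 on failure.
    int number(int pos) {
        Key key = new Key("Number", pos);
        if (memoize) {
            Integer cached = memo.get(key);
            if (cached != null) {
                return cached; // reuse the earlier result, skip the rule body
            }
        }
        ruleBodyRuns++;
        int end = pos;
        while (end < input.length() && input.charAt(end) >= '0' && input.charAt(end) <= '9') {
            end++;
        }
        int result = end > pos ? end : -1;
        if (memoize) {
            memo.put(key, result);
        }
        return result;
    }

    // Sum <- Number '+' Number / Number
    // Both alternatives start with Number at the same position, so the
    // second alternative re-tries Number after the first one backtracks.
    int sum(int pos) {
        int afterLeft = number(pos);
        if (afterLeft >= 0 && afterLeft < input.length() && input.charAt(afterLeft) == '+') {
            int afterRight = number(afterLeft + 1);
            if (afterRight >= 0) {
                return afterRight;
            }
        }
        return number(pos); // second alternative
    }

    public static void main(String[] args) {
        PackratSketch plain = new PackratSketch("42", false);
        plain.sum(0);
        PackratSketch cached = new PackratSketch("42", true);
        cached.sum(0);
        System.out.println("without memo: " + plain.ruleBodyRuns + ", with memo: " + cached.ruleBodyRuns);
    }
}
```

On the input `42` the first alternative fails at the missing `+`, and the memoized variant answers the retry of `Number` from the cache instead of scanning the digits again.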

Quick Start

Dependency

<dependency>
    <groupId>org.pragmatica-lite</groupId>
    <artifactId>peglib</artifactId>
    <version>0.1.9</version>
</dependency>

Requires pragmatica-lite:core for Result/Option types.

Basic Parsing

import org.pragmatica.peg.PegParser;

// Define grammar and create parser
var parser = PegParser.fromGrammar("""
    Number <- < [0-9]+ >
    %whitespace <- [ \\t]*
    """).unwrap();

// Parse to CST (lossless, preserves trivia)
var cst = parser.parseCst("123").unwrap();

// Parse to AST (optimized, no trivia)
var ast = parser.parseAst("123").unwrap();

Parsing with Actions

var calculator = PegParser.fromGrammar("""
    Expr   <- Term (('+' / '-') Term)*
    Term   <- Factor (('*' / '/') Factor)*
    Factor <- Number / '(' Expr ')'
    Number <- < [0-9]+ > { return sv.toInt(); }
    %whitespace <- [ ]*
    """).unwrap();

// Actions transform parsed content into semantic values
Integer result = (Integer) calculator.parse("3 + 5 * 2").unwrap();
// result = 13
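To see why the grammar's layering yields 13 rather than 16, here is a hand-written recursive-descent evaluator that mirrors the same three rules. It is only a sketch: `CalcSketch` is a hypothetical class, it does not use Peglib, and it assumes that `Expr` and `Term` combine their operands arithmetically (the grammar above shows an action on `Number` only).

```java
// Hand-written equivalent of the calculator grammar, for illustration only.
final class CalcSketch {
    private final String src;
    private int pos = 0;

    private CalcSketch(String src) {
        this.src = src;
    }

    static int eval(String text) {
        CalcSketch p = new CalcSketch(text);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at offset " + p.pos);
        }
        return value;
    }

    // %whitespace <- [ ]*
    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    // Expr <- Term (('+' / '-') Term)*
    private int expr() {
        int value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // Term <- Factor (('*' / '/') Factor)*
    private int term() {
        int value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // Factor <- Number / '(' Expr ')'
    private int factor() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        if (pos > start) {
            return Integer.parseInt(src.substring(start, pos));
        }
        if (accept('(')) {
            int value = expr();
            if (!accept(')')) {
                throw new IllegalArgumentException("expected ')' at offset " + pos);
            }
            return value;
        }
        throw new IllegalArgumentException("expected number or '(' at offset " + pos);
    }

    public static void main(String[] args) {
        System.out.println(eval("3 + 5 * 2"));   // 13
        System.out.println(eval("(3 + 5) * 2")); // 16
    }
}
```

Because `Term` sits below `Expr`, `5 * 2` is reduced before the addition sees it; parentheses re-enter `Expr` through `Factor`.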

Grammar Syntax

Peglib uses PEG syntax compatible with cpp-peglib:

Basic Operators

# Rule definition
RuleName <- Expression

# Sequence - match e1 then e2
e1 e2

# Ordered choice - try e1, if fails try e2
e1 / e2

# Quantifiers
e*          # Zero or more
e+          # One or more
e?          # Optional
e{3}        # Exactly 3 times
e{2,}       # At least 2 times
e{2,5}      # Between 2 and 5 times

# Lookahead predicates (don't consume input)
&e          # Positive lookahead - succeeds if e matches
!e          # Negative lookahead - succeeds if e doesn't match

# Cut - commits to current choice, prevents backtracking
^           # Cut operator
↑           # Cut operator (alternative syntax)

# Grouping
(e1 e2)     # Group expressions

# Terminals
'literal'   # String literal (single quotes)
"literal"   # String literal (double quotes)
[a-z]       # Character class
[^a-z]      # Negated character class
.           # Any character
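Ordered choice and the lookahead predicates behave differently from their regex or CFG lookalikes, so a small model may help. The sketch below is not Peglib code; `PegOperatorsSketch` and its `Matcher` interface are hypothetical names, and each matcher simply returns the end offset on success or -1 on failure.

```java
// Minimal PEG-style matchers: end offset on success, -1 on failure.
final class PegOperatorsSketch {
    interface Matcher {
        int match(String input, int pos);
    }

    // 'literal'
    static Matcher literal(String text) {
        return (input, pos) -> input.startsWith(text, pos) ? pos + text.length() : -1;
    }

    // e1 e2: e2 starts where e1 stopped
    static Matcher sequence(Matcher first, Matcher second) {
        return (input, pos) -> {
            int mid = first.match(input, pos);
            return mid < 0 ? -1 : second.match(input, mid);
        };
    }

    // e1 / e2: the first alternative that succeeds wins; e2 is tried only if e1 fails
    static Matcher choice(Matcher first, Matcher second) {
        return (input, pos) -> {
            int end = first.match(input, pos);
            return end >= 0 ? end : second.match(input, pos);
        };
    }

    // &e: succeeds if e matches, consumes nothing
    static Matcher and(Matcher e) {
        return (input, pos) -> e.match(input, pos) >= 0 ? pos : -1;
    }

    // !e: succeeds if e does not match, consumes nothing
    static Matcher not(Matcher e) {
        return (input, pos) -> e.match(input, pos) >= 0 ? -1 : pos;
    }

    public static void main(String[] args) {
        // 'a' / 'ab' commits to 'a' and never tries the longer alternative.
        System.out.println(choice(literal("a"), literal("ab")).match("ab", 0)); // 1
        System.out.println(choice(literal("ab"), literal("a")).match("ab", 0)); // 2
    }
}
```

The practical consequence is that alternatives should be listed longest or most specific first, since a PEG never revisits a choice that has already succeeded.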

Extensions

# Token boundary - captures matched text as $0
< e >

# Ignore semantic value
~e

# Case-insensitive matching
'text'i
[a-z]i

# Named capture and back-reference
$name<e>    # Capture as 'name'
$name       # Back-reference to captured 'name'

Directives

# Auto-skip whitespace between tokens
%whitespace <- [ \t\r\n]*

Inline Actions

Actions are Java code blocks that transform parsed content:

Number <- < [0-9]+ > { return sv.toInt(); }
Sum <- Number '+' Number { return (Integer)$1 + (Integer)$2; }
Word <- < [a-z]+ > { return $0.toUpperCase(); }

Action API

Inside action blocks, you have access to SemanticValues sv:

Access              Description
sv.token() or $0    Matched text (raw input)
sv.get(0) or $1     First child's semantic value
sv.get(1) or $2     Second child's semantic value
sv.toInt()          Parse matched text as integer
sv.toDouble()       Parse matched text as double
sv.size()           Number of child values
sv.values()         All child values as List

Note: $1, $2, etc. use 1-based indexing (like regex groups), while sv.get() uses 0-based.
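The off-by-one between the two notations is easy to trip over, so here is the mapping spelled out. `IndexingSketch.Values` is a hypothetical stand-in written for this illustration, not Peglib's `SemanticValues` class; only the index relationship is taken from the table above.

```java
import java.util.List;

final class IndexingSketch {
    // Hypothetical stand-in for the values visible inside an action block.
    record Values(String token, List<Object> children) {
        // sv.get(i): 0-based access to child values
        Object get(int zeroBased) {
            return children.get(zeroBased);
        }

        // $0 is the matched text; $n for n >= 1 is child n - 1
        Object dollar(int n) {
            return n == 0 ? token : children.get(n - 1);
        }
    }

    public static void main(String[] args) {
        Values sv = new Values("1+2", List.of(1, 2));
        System.out.println(sv.dollar(1).equals(sv.get(0))); // true
        System.out.println(sv.dollar(2).equals(sv.get(1))); // true
    }
}
```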

Configuration

var parser = PegParser.builder(grammar)
    .packrat(true)                           // Enable memoization (default: true)
    .trivia(true)                            // Collect whitespace/comments (default: true)
    .recovery(RecoveryStrategy.ADVANCED)     // Error recovery mode
    .build()
    .unwrap();

Error Recovery

Peglib provides advanced error recovery with Rust-style diagnostic messages:

var parser = PegParser.builder(grammar)
    .recovery(RecoveryStrategy.ADVANCED)
    .build()
    .unwrap();

var result = parser.parseCstWithDiagnostics("abc, @@@, def");

if (result.hasErrors()) {
    System.out.println(result.formatDiagnostics("input.txt"));
}

Output:

error: unexpected input
  --> input.txt:1:6
   |
 1 | abc, @@@, def
   |      ^ found '@'
   |
   = help: expected [a-z]+
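The layout of that report can be reproduced from just the source text and a character offset. The sketch below is a guess at the formatting rules based on the sample output alone; `DiagnosticSketch.format` and its parameters are hypothetical and are not Peglib's formatter.

```java
final class DiagnosticSketch {
    // Renders a single-caret diagnostic for the character at `offset` (0-based).
    static String format(String file, String source, int offset, String message, String help) {
        // Derive the 1-based line and column from the 0-based offset.
        int line = 1;
        int lineStart = 0;
        for (int i = 0; i < offset; i++) {
            if (source.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        int column = offset - lineStart + 1;

        // Extract the text of the offending line.
        int lineEnd = source.indexOf('\n', lineStart);
        if (lineEnd < 0) {
            lineEnd = source.length();
        }
        String text = source.substring(lineStart, lineEnd);

        String lineNo = String.valueOf(line);
        String gutter = " ".repeat(lineNo.length() + 2); // keeps the '|' column aligned
        return "error: " + message + "\n"
            + "  --> " + file + ":" + line + ":" + column + "\n"
            + gutter + "|\n"
            + " " + lineNo + " | " + text + "\n"
            + gutter + "| " + " ".repeat(column - 1) + "^ found '" + source.charAt(offset) + "'\n"
            + gutter + "|\n"
            + gutter + "= help: " + help + "\n";
    }

    public static void main(String[] args) {
        System.out.print(format("input.txt", "abc, @@@, def", 5,
            "unexpected input", "expected [a-z]+"));
    }
}
```

Offset 5 is the first `@`, which is why the report points at column 6: columns are 1-based while offsets are 0-based.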

Recovery Strategies

Strategy    Behavior
NONE        Fail immediately on first error
BASIC       Report error with context, stop parsing
ADVANCED    Continue parsing, collect all errors, insert Error nodes

See Error Recovery Documentation for details.
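The ADVANCED behavior, continuing after a failure and leaving an error node behind, can be illustrated with a hand-written parser for the comma-separated input used above. This is a sketch under assumptions: `RecoverySketch`, its node types, and the choice of `,` as the synchronization point are hypothetical, and Peglib's actual recovery rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class RecoverySketch {
    sealed interface Node permits Item, ErrorNode {}
    record Item(String text) implements Node {}
    record ErrorNode(String skipped, int offset) implements Node {}

    private static boolean isLower(char c) {
        return c >= 'a' && c <= 'z';
    }

    // List <- Item (',' Item)*   Item <- [a-z]+   spaces are skipped
    static List<Node> parse(String input) {
        List<Node> nodes = new ArrayList<>();
        int pos = 0;
        while (true) {
            while (pos < input.length() && input.charAt(pos) == ' ') pos++;
            int start = pos;
            while (pos < input.length() && isLower(input.charAt(pos))) pos++;
            int afterItem = pos;
            while (pos < input.length() && input.charAt(pos) == ' ') pos++;
            boolean atBoundary = pos == input.length() || input.charAt(pos) == ',';
            if (afterItem > start && atBoundary) {
                nodes.add(new Item(input.substring(start, afterItem)));
            } else {
                // Recovery: skip ahead to the next ',' (the synchronization
                // point), record the skipped region, and keep parsing.
                int sync = input.indexOf(',', start);
                if (sync < 0) {
                    sync = input.length();
                }
                nodes.add(new ErrorNode(input.substring(start, sync), start));
                pos = sync;
            }
            if (pos >= input.length()) {
                return nodes;
            }
            pos++; // consume ','
        }
    }

    public static void main(String[] args) {
        parse("abc, @@@, def").forEach(System.out::println);
    }
}
```

The result keeps both valid items and places one error node between them, so later items are still reported or processed even though the middle one is unparseable.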

Trivia Handling

CST nodes preserve whitespace and comments as trivia:

var parser = PegParser.fromGrammar("""
    Expr <- Number '+' Number
    Number <- < [0-9]+ >
    %whitespace <- [ \\t]+
    """).unwrap();

var cst = parser.parseCst("  42 + 7  ").unwrap();

// Access trivia
List<Trivia> leading = cst.leadingTrivia();   // "  " before 42
List<Trivia> trailing = cst.trailingTrivia(); // "  " after 7

Trivia types:

  • Trivia.Whitespace - spaces, tabs, newlines
  • Trivia.LineComment - // ... style
  • Trivia.BlockComment - /* ... */ style
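What "lossless" means in practice is that the original text can be rebuilt from the tree. The sketch below models that property with hypothetical types; the `Trivia` variants borrow the names listed above, but their shape, the `Token` record, and the way trivia is split between leading and trailing are assumptions, not Peglib's data model.

```java
import java.util.List;

final class TriviaSketch {
    sealed interface Trivia permits Whitespace, LineComment, BlockComment {
        String text();
    }
    record Whitespace(String text) implements Trivia {}
    record LineComment(String text) implements Trivia {}
    record BlockComment(String text) implements Trivia {}

    // A token together with the trivia attached before and after it.
    record Token(List<Trivia> leading, String text, List<Trivia> trailing) {}

    // Lossless round trip: concatenating trivia and token text rebuilds the input.
    static String render(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token token : tokens) {
            token.leading().forEach(t -> out.append(t.text()));
            out.append(token.text());
            token.trailing().forEach(t -> out.append(t.text()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
            new Token(List.of(new Whitespace("  ")), "42", List.of(new Whitespace(" "))),
            new Token(List.of(), "+", List.of(new Whitespace(" "))),
            new Token(List.of(), "7", List.of(new Whitespace("  "))));
        System.out.println("[" + render(tokens) + "]"); // [  42 + 7  ]
    }
}
```

A formatter or linter can rewrite individual tokens and still emit every comment and space it did not touch, which is the reason to prefer the CST over the AST for such tools.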

Source Code Generation

Generate standalone parser Java files for production use:

import java.nio.file.Files;
import java.nio.file.Path;

Result<String> source = PegParser.generateParser(
    grammarText,
    "com.example.parser",  // package name
    "JsonParser"           // class name
);

// Write to file
Files.writeString(Path.of("JsonParser.java"), source.unwrap());

Generated parsers:

  • Are self-contained single files
  • Only depend on pragmatica-lite:core
  • Include packrat memoization
  • Support trivia collection
  • Have type-safe RuleId for each grammar rule

Generated Parser with Advanced Diagnostics

Generate parsers with Rust-style error reporting:

import org.pragmatica.peg.generator.ErrorReporting;

// Generate CST parser with advanced diagnostics
Result<String> source = PegParser.generateCstParser(
    grammarText,
    "com.example.parser",
    "MyParser",
    ErrorReporting.ADVANCED  // Enable Rust-style diagnostics
);

ErrorReporting    Description
BASIC             Simple ParseError(line, column, reason) - minimal code
ADVANCED          Full diagnostics with source context, underlines, labels

When ADVANCED is enabled, the generated parser also exposes a diagnostics API:

// Parse with diagnostics
var result = parser.parseWithDiagnostics(input);

if (result.hasErrors()) {
    // Format as Rust-style diagnostics
    System.err.println(result.formatDiagnostics("input.txt"));
}

// Access individual diagnostics
for (var diag : result.diagnostics()) {
    System.out.println(diag.formatSimple()); // file:line:col: severity: message
}

Output example:

error: expected Number
  --> input.txt:1:5
   |
 1 | 3 + @invalid
   |     ^ found '@'
   |

Examples

See the examples directory:

Example                    Description
CalculatorExample          Arithmetic with semantic actions
JsonParserExample          JSON CST parsing
SExpressionExample         Lisp-like syntax
CsvParserExample           CSV data format
ErrorRecoveryExample       Error recovery patterns
SourceGenerationExample    Standalone parser generation
Java25GrammarExample       Java 25 syntax parsing

CST Node Types

public sealed interface CstNode {
    record Terminal(...)    // Leaf node with text
    record NonTerminal(...) // Interior node with children
    record Token(...)       // Result of < > operator
    record Error(...)       // Unparseable region (error recovery)
}

Building

mvn compile    # Compile
mvn test       # Run tests (308 tests)
mvn verify     # Full verification

Requires Java 25+.


License

MIT
