Merged
51 changes: 51 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,57 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.6] - 2025-12-26

### Added

- **AST Support in Generated Parsers**
  - Generated CST parsers now include `AstNode` type and `parseAst()` method
  - Allows parsing directly to AST (without trivia) from generated parsers

- **Packrat Toggle in Generated Parsers**
  - Added `setPackratEnabled(boolean)` method to generated parsers
  - Allows disabling memoization at runtime to reduce memory usage for large inputs

- **Unlimited Action Variable Support**
  - Action code now supports unlimited `$N` positional variables (previously limited to `$1-$20`)
  - Uses regex-based substitution for flexibility

### Fixed

- **Grammar Validation**
  - Implemented `Grammar.validate()` to detect undefined rule references
  - Recursively walks all expressions and reports first undefined reference with location
  - Previously, grammars with typos in rule names would fail at parse time with cryptic errors

- **Thread Safety in Whitespace Skipping**
  - Moved `skippingWhitespace` flag from `PegEngine` (per-instance) to `ParsingContext` (per-parse)
  - Fixes potential race conditions when reusing parser instances across threads

- **Packrat Cache Key Collision Risk**
  - Changed cache key from `hashCode()` to unique sequential IDs
  - Eliminates theoretical collision bugs with different rule names having same hash
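
To illustrate the cache-key entry above: two distinct rule names can share a `hashCode()`, so a key derived from the hash can alias two memo entries. The following is a minimal sketch, assuming rule names are plain strings; `RuleIds`, `idFor`, and `cacheKey` are hypothetical names for illustration, not peglib's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not peglib's implementation: each distinct rule name
// gets a unique sequential ID instead of relying on String.hashCode().
public class RuleIds {
    private final Map<String, Integer> ids = new HashMap<>();
    private int nextId = 0;

    // Returns the ID for a rule name, allocating the next integer on first use.
    public int idFor(String ruleName) {
        Integer id = ids.get(ruleName);
        if (id == null) {
            id = nextId++;
            ids.put(ruleName, id);
        }
        return id;
    }

    // Packs (rule ID, input position) into one long usable as a memo key.
    public long cacheKey(String ruleName, int position) {
        return ((long) idFor(ruleName) << 32) | (position & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hashCode (2112).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        var ruleIds = new RuleIds();
        // Sequential IDs keep the two rules apart at the same position.
        System.out.println(ruleIds.cacheKey("Aa", 7) != ruleIds.cacheKey("BB", 7)); // true
    }
}
```

With sequential IDs, uniqueness depends only on the ID allocator, not on the distribution of string hashes.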

### Changed

- **Builder API Naming Standardized**
  - `PegParser.Builder` methods renamed for consistency: `withPackrat()` → `packrat()`, `withTrivia()` → `trivia()`, `withErrorRecovery()` → `recovery()`
  - Removed duplicate `ParserConfig.Builder` (unused)

- **Documentation Cleanup**
  - Removed undocumented `%word` directive from documentation (feature not implemented)
  - Removed unused placeholder `skipWhitespace()` method from `ParsingContext`

- **Code Simplification**
  - Consolidated 3 duplicate expression parsing switch statements into unified `parseExpressionWithMode()`
  - Extracted `buildParseError()` helper to eliminate duplicate error message construction
  - Removed unused `SemanticValues.choice` field and getter
  - Removed unused `SourceLocation.advanceColumn()`/`advanceLine()` methods
  - ~120 lines of duplicate code eliminated

- Test count: 268 → 271
- Updated pragmatica-lite dependency: 0.8.4 → 0.9.0

## [0.1.5] - 2025-12-22

### Fixed
39 changes: 21 additions & 18 deletions CLAUDE.md
@@ -18,7 +18,7 @@ Java implementation of PEG (Parsing Expression Grammar) parser inspired by [cpp-
| Tree output | Both CST and AST | CST for formatting/linting, AST for compilers |
| Whitespace/comments | Grouped as Trivia nodes | Convenient for tooling |
| Error recovery | Configurable (basic/advanced) | Flexibility for different use cases |
-| Runtime dependency | `pragmatica-lite:core` 0.8.4 | Result/Option/Promise types |
+| Runtime dependency | `pragmatica-lite:core` 0.9.0 | Result/Option/Promise types |

## Compilation Modes

@@ -39,10 +39,11 @@ src/main/java/org/pragmatica/peg/
│   └── GrammarParser.java # Recursive descent parser
├── parser/
│   ├── Parser.java # Parser interface
-│   ├── ParserConfig.java # Configuration record + builder
+│   ├── ParserConfig.java # Configuration record
│   ├── ParsingContext.java # Mutable parsing state with packrat cache
│   ├── ParseResult.java # Parse result types (sealed)
│   ├── ParseResultWithDiagnostics.java # Result with error recovery diagnostics
+│   ├── ParseMode.java # Parsing mode (standard, withActions, noWhitespace)
│   └── PegEngine.java # PEG parsing engine with action execution
├── tree/
│ ├── SourceLocation.java # Position in source (line, column, offset)
@@ -69,17 +70,18 @@ src/test/java/org/pragmatica/peg/
├── GeneratedParserTriviaTest.java # 6 tests (generated parser trivia)
├── ErrorRecoveryTest.java # 8 tests (error recovery + diagnostics)
├── grammar/
-│   └── GrammarParserTest.java # 14 tests for grammar parser
+│   └── GrammarParserTest.java # 17 tests for grammar parser
├── generator/
-│   └── ParserGeneratorTest.java # 16 tests for source generation (8 basic + 8 ErrorReporting)
+│   └── ParserGeneratorTest.java # 18 tests for source generation
└── examples/
    ├── ErrorRecoveryExample.java # 12 tests - error recovery patterns
    ├── CalculatorExample.java # 6 tests - arithmetic with actions
    ├── JsonParserExample.java # 11 tests - JSON CST parsing
    ├── SExpressionExample.java # 11 tests - Lisp-like syntax
    ├── CsvParserExample.java # 8 tests - CSV data format
-    ├── SourceGenerationExample.java # 9 tests - standalone parser
-    └── Java25GrammarExample.java # 59 tests - Java 25 syntax
+    ├── SourceGenerationExample.java # 11 tests - standalone parser
+    ├── CutOperatorRegressionTest.java # 16 tests - cut operator regression tests
+    └── Java25GrammarExample.java # 60 tests - Java 25 syntax
```

## Grammar Syntax (cpp-peglib compatible)
@@ -118,7 +120,6 @@ $name # Back-reference

# Directives
%whitespace <- [ \t\r\n]* # Auto-skip whitespace
-%word <- [a-zA-Z]+ # Word boundary detection

# Inline actions (Java)
Number <- < [0-9]+ > { return sv.toInt(); }
@@ -129,7 +130,7 @@ Sum <- Number '+' Number { return (Integer)$1 + (Integer)$2; }

### Completed
- [x] Project scaffolded with `jbct init`
-- [x] pom.xml updated for Java 25, pragmatica-lite 0.8.4
+- [x] pom.xml updated for Java 25, pragmatica-lite 0.9.0
- [x] Core types implemented
- [x] Grammar parser (bootstrap) implemented
- [x] Parsing engine with packrat memoization
@@ -139,7 +140,7 @@ Sum <- Number '+' Number { return (Integer)$1 + (Integer)$2; }
- [x] Advanced error recovery with Rust-style diagnostics
- [x] Generated parser ErrorReporting (BASIC/ADVANCED) for optional Rust-style diagnostics
- [x] Cut operator (^/↑) - commits to current choice, prevents backtracking
-- [x] 268 passing tests
+- [x] 271 passing tests

### Remaining Work
- [ ] Performance optimization
@@ -172,8 +173,8 @@ Result<Object> result = calculator.parse("3 + 5"); // Returns 8

// Configuration
var parser = PegParser.builder(grammar)
-    .withPackrat(true)
-    .withTrivia(true)
+    .packrat(true)
+    .trivia(true)
    .build()
    .unwrap();

@@ -248,7 +249,7 @@ Advanced error recovery with Rust-style diagnostic messages.
### API Usage
```java
var parser = PegParser.builder(grammar)
-    .withErrorRecovery(RecoveryStrategy.ADVANCED)
+    .recovery(RecoveryStrategy.ADVANCED)
    .build()
    .unwrap();

@@ -280,13 +281,14 @@ error: unexpected input
### Recovery Points
Parser recovers at: `,`, `;`, `}`, `)`, `]`, newline

-## Test Coverage (268 tests)
+## Test Coverage (271 tests)

-### Grammar Parser Tests (14 tests)
+### Grammar Parser Tests (17 tests)
- Simple rules, actions, sequences, choices
- Lookahead predicates, repetition operators
- Token boundaries, whitespace directive
- Case-insensitive matching, named captures
+- Grammar validation (undefined rule references)
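
The grammar-validation tests listed above target undefined rule references. The following is a minimal sketch of such a check. It assumes a simplified expression model: `Expr`, `Reference`, `firstUndefined`, and `demo` are hypothetical names, not peglib's actual `Grammar.validate()` implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class GrammarValidation {
    // Hypothetical, simplified expression model (not peglib's actual types).
    interface Expr {}
    record Literal(String text) implements Expr {}
    record Reference(String ruleName, int line, int column) implements Expr {}
    record Sequence(List<Expr> items) implements Expr {}
    record Choice(List<Expr> alternatives) implements Expr {}

    // Walks one expression tree and reports the first reference to a rule
    // that is not defined in `rules`, together with its location.
    static Optional<String> firstUndefined(Expr expr, Map<String, Expr> rules) {
        if (expr instanceof Reference ref) {
            return rules.containsKey(ref.ruleName())
                    ? Optional.empty()
                    : Optional.of("undefined rule '" + ref.ruleName()
                            + "' at " + ref.line() + ":" + ref.column());
        }
        if (expr instanceof Sequence seq) return firstIn(seq.items(), rules);
        if (expr instanceof Choice choice) return firstIn(choice.alternatives(), rules);
        return Optional.empty(); // literals reference nothing
    }

    private static Optional<String> firstIn(List<Expr> children, Map<String, Expr> rules) {
        for (Expr child : children) {
            var problem = firstUndefined(child, rules);
            if (problem.isPresent()) return problem;
        }
        return Optional.empty();
    }

    // Checks every rule body; stops at the first problem found.
    static Optional<String> validate(Map<String, Expr> rules) {
        for (Expr body : rules.values()) {
            var problem = firstUndefined(body, rules);
            if (problem.isPresent()) return problem;
        }
        return Optional.empty();
    }

    // Grammar with a typo: "Numbr" instead of "Number".
    static Optional<String> demo() {
        Map<String, Expr> rules = Map.of(
                "Sum", new Sequence(List.of(
                        new Reference("Number", 1, 8),
                        new Literal("+"),
                        new Reference("Numbr", 1, 19))),
                "Number", new Literal("0"));
        return validate(rules);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Optional[undefined rule 'Numbr' at 1:19]
    }
}
```

Running the check before parsing surfaces a typo as a located message rather than as a failure at parse time.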

### Parsing Engine Tests (29 tests)
- Literals, character classes, negated classes
@@ -304,7 +306,7 @@ Parser recovers at: `,`, `;`, `}`, `)`, `]`, newline
- List building
- No action returns CST node

-### Generator Tests (16 tests)
+### Generator Tests (18 tests)
- Simple literal generates valid Java
- Whitespace handling
- Action code inclusion
@@ -314,14 +316,15 @@ Parser recovers at: `,`, `;`, `}`, `)`, `]`, newline
- ErrorReporting.ADVANCED mode (Rust-style diagnostics)
- parseWithDiagnostics() method generation

-### Example Tests (116 tests)
+### Example Tests (135 tests)
- **ErrorRecovery** (12 tests): Recovery strategies, diagnostic formatting, CST error nodes
- **Calculator** (6 tests): Number parsing, addition, multiplication, boolean/double types
- **JSON** (11 tests): CST parsing of JSON values, objects, arrays, nested structures
- **S-Expression** (11 tests): Lisp-like syntax, nested lists, atoms, symbols
- **CSV** (8 tests): Field parsing, empty fields, spaces preserved
-- **Source Generation** (9 tests): Standalone parser generation, all operators
-- **Java25Grammar** (59 tests): Full Java 25 syntax including modules, var, patterns, text blocks
+- **Source Generation** (11 tests): Standalone parser generation, all operators
+- **CutOperatorRegression** (16 tests): Cut operator regression tests
+- **Java25Grammar** (60 tests): Full Java 25 syntax including modules, var, patterns, text blocks

### Trivia Tests (19 tests)
- **TriviaTest** (13 tests): Runtime trivia - leading, trailing, mixed, comments
15 changes: 6 additions & 9 deletions README.md
@@ -22,7 +22,7 @@ A PEG (Parsing Expression Grammar) parser library for Java, inspired by [cpp-peg
<dependency>
    <groupId>org.pragmatica-lite</groupId>
    <artifactId>peglib</artifactId>
-    <version>0.1.5</version>
+    <version>0.1.6</version>
</dependency>
```

@@ -128,9 +128,6 @@ $name # Back-reference to captured 'name'
```peg
# Auto-skip whitespace between tokens
%whitespace <- [ \t\r\n]*
-
-# Word boundary detection
-%word <- [a-zA-Z]+
```

### Inline Actions
@@ -163,9 +160,9 @@ Note: `$1`, `$2`, etc. use 1-based indexing (like regex groups), while `sv.get()

```java
var parser = PegParser.builder(grammar)
-    .withPackrat(true) // Enable memoization (default: true)
-    .withTrivia(true) // Collect whitespace/comments (default: true)
-    .withErrorRecovery(RecoveryStrategy.ADVANCED) // Error recovery mode
+    .packrat(true) // Enable memoization (default: true)
+    .trivia(true) // Collect whitespace/comments (default: true)
+    .recovery(RecoveryStrategy.ADVANCED) // Error recovery mode
    .build()
    .unwrap();
```
@@ -176,7 +173,7 @@ Peglib provides advanced error recovery with Rust-style diagnostic messages:

```java
var parser = PegParser.builder(grammar)
-    .withErrorRecovery(RecoveryStrategy.ADVANCED)
+    .recovery(RecoveryStrategy.ADVANCED)
    .build()
    .unwrap();

@@ -330,7 +327,7 @@ public sealed interface CstNode {

```bash
mvn compile # Compile
-mvn test # Run tests (268 tests)
+mvn test # Run tests (271 tests)
mvn verify # Full verification
```

20 changes: 10 additions & 10 deletions docs/ERROR_RECOVERY.md
@@ -32,7 +32,7 @@ Peglib supports three recovery strategies:

```java
var parser = PegParser.builder(grammar)
-    .withErrorRecovery(RecoveryStrategy.NONE)
+    .recovery(RecoveryStrategy.NONE)
    .build()
    .unwrap();
```
@@ -45,7 +45,7 @@ var parser = PegParser.builder(grammar)

```java
var parser = PegParser.builder(grammar)
-    .withErrorRecovery(RecoveryStrategy.BASIC)
+    .recovery(RecoveryStrategy.BASIC)
    .build()
    .unwrap();
```
@@ -58,7 +58,7 @@ var parser = PegParser.builder(grammar)

```java
var parser = PegParser.builder(grammar)
-    .withErrorRecovery(RecoveryStrategy.ADVANCED)
+    .recovery(RecoveryStrategy.ADVANCED)
    .build()
    .unwrap();
```
@@ -74,7 +74,7 @@ var parser = PegParser.builder(grammar)

```java
var parser = PegParser.builder(grammar)
-    .withErrorRecovery(RecoveryStrategy.ADVANCED)
+    .recovery(RecoveryStrategy.ADVANCED)
    .build()
    .unwrap();

@@ -406,13 +406,13 @@ List<QuickFix> suggestFixes(Diagnostic d) {

```java
// For validation (fast fail)
-.withErrorRecovery(RecoveryStrategy.NONE)
+.recovery(RecoveryStrategy.NONE)

// For CLI tools (single error)
-.withErrorRecovery(RecoveryStrategy.BASIC)
+.recovery(RecoveryStrategy.BASIC)

// For IDEs/editors (all errors)
-.withErrorRecovery(RecoveryStrategy.ADVANCED)
+.recovery(RecoveryStrategy.ADVANCED)
```

### 2. Provide Helpful Messages
@@ -483,7 +483,7 @@ var grammar = """
""";

var parser = PegParser.builder(grammar)
-    .withErrorRecovery(RecoveryStrategy.ADVANCED)
+    .recovery(RecoveryStrategy.ADVANCED)
    .build()
    .unwrap();

@@ -525,7 +525,7 @@ var grammar = """
""";

var parser = PegParser.builder(grammar)
-    .withErrorRecovery(RecoveryStrategy.ADVANCED)
+    .recovery(RecoveryStrategy.ADVANCED)
    .build()
    .unwrap();

Expand Down Expand Up @@ -553,7 +553,7 @@ var grammar = """
""";

var parser = PegParser.builder(grammar)
.withErrorRecovery(RecoveryStrategy.ADVANCED)
.recovery(RecoveryStrategy.ADVANCED)
.build()
.unwrap();

4 changes: 2 additions & 2 deletions pom.xml
@@ -6,7 +6,7 @@

    <groupId>org.pragmatica-lite</groupId>
    <artifactId>peglib</artifactId>
-    <version>0.1.5</version>
+    <version>0.1.6</version>
<packaging>jar</packaging>

<name>Peglib</name>
@@ -38,7 +38,7 @@
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.release>25</maven.compiler.release>
-        <pragmatica-lite.version>0.8.4</pragmatica-lite.version>
+        <pragmatica-lite.version>0.9.0</pragmatica-lite.version>
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the pom.xml file to understand the context
git ls-files "*pom.xml*" | head -5

Repository: siy/java-peglib

Length of output: 66


🏁 Script executed:

# Search for pragmatica-lite in pom.xml and surrounding lines
rg "pragmatica-lite" -A 5 -B 5

Repository: siy/java-peglib

Length of output: 19372


🏁 Script executed:

# Check Maven Central for pragmatica-lite 0.9.0 availability
curl -s "https://search.maven.org/solrsearch/select?q=g:org.pragmatica-lite+AND+a:core+AND+v:0.9.0&rows=1&wt=json" | jq '.'

Repository: siy/java-peglib

Length of output: 501


🏁 Script executed:

# Check for repository configurations in pom.xml
cat pom.xml | head -100

Repository: siy/java-peglib

Length of output: 3379


🏁 Script executed:

# Check if pragmatica-lite 0.8.4 exists (previous version)
curl -s "https://search.maven.org/solrsearch/select?q=g:org.pragmatica-lite+AND+a:core&rows=10&wt=json" | jq '.response.docs[] | {v, timestamp}'

Repository: siy/java-peglib

Length of output: 41


🏁 Script executed:

# Check if there are any repository configurations
rg "repositories|url" pom.xml -A 2

Repository: siy/java-peglib

Length of output: 408


pragmatica-lite:core 0.9.0 is not available in Maven Central.

The dependency declared on line 41 cannot be resolved. Before releasing 0.1.6, either publish pragmatica-lite:core 0.9.0 to Maven Central or adjust the version to an available release (e.g., 0.8.4).

🤖 Prompt for AI Agents
In pom.xml around line 41, the pragmatica-lite.version property is set to 0.9.0
which is not available on Maven Central; either change that property to a
published version (e.g., 0.8.4) in the pom.xml and run a full build (mvn -U
clean verify) to ensure dependency resolution, or publish pragmatica-lite:core
0.9.0 to Maven Central following our release process and then leave the pom
unchanged.

</properties>

<dependencies>
6 changes: 3 additions & 3 deletions src/main/java/org/pragmatica/peg/PegParser.java
@@ -131,17 +131,17 @@ private Builder(String grammarText) {
             this.grammarText = grammarText;
         }

-        public Builder withPackrat(boolean enabled) {
+        public Builder packrat(boolean enabled) {
             this.packratEnabled = enabled;
             return this;
         }

-        public Builder withErrorRecovery(RecoveryStrategy strategy) {
+        public Builder recovery(RecoveryStrategy strategy) {
             this.recoveryStrategy = strategy;
             return this;
         }

-        public Builder withTrivia(boolean capture) {
+        public Builder trivia(boolean capture) {
             this.captureTrivia = capture;
             return this;
         }
12 changes: 8 additions & 4 deletions src/main/java/org/pragmatica/peg/action/ActionCompiler.java
@@ -16,6 +16,7 @@
 import java.util.List;
 import java.util.Map;
 import java.util.concurrent.atomic.AtomicInteger;
+import java.util.regex.Pattern;

 /**
  * Compiles inline Java actions from grammar rules.
@@ -91,14 +92,17 @@ public Result<Action> compileActionCode(String ruleName, String actionCode, Sour
         return compileAndLoad(fullClassName, sourceCode, location);
     }

+    private static final Pattern POSITIONAL_VAR = Pattern.compile("\\$(\\d+)");
+
     private String transformActionCode(String code) {
         // Replace $0 with sv.token()
         var result = code.replace("$0", "sv.token()");

-        // Replace $1, $2, ... with sv.get(0), sv.get(1), ...
-        for (int i = 1; i <= 20; i++) {
-            result = result.replace("$" + i, "sv.get(" + (i - 1) + ")");
-        }
+        // Replace $N (N > 0) with sv.get(N-1) using regex for unlimited support
+        result = POSITIONAL_VAR.matcher(result).replaceAll(match -> {
+            int n = Integer.parseInt(match.group(1));
+            return n == 0 ? "sv.token()" : "sv.get(" + (n - 1) + ")";
+        });

         return result;
     }
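
The effect of this hunk can be tried in isolation. The sketch below is a standalone reconstruction for illustration only: the class name `PositionalVars` and both method names are hypothetical, and the initial `code.replace("$0", ...)` step from the diff is omitted because the regex branch already covers `$0`. Under these assumptions, the loop form rewrites `$12` as `$1` followed by `2`, while the regex form matches each `$N` as a whole token.

```java
import java.util.regex.Pattern;

public class PositionalVars {
    private static final Pattern POSITIONAL_VAR = Pattern.compile("\\$(\\d+)");

    // Regex-based rewrite (mirrors the added lines): every $N is matched as a
    // single token, so multi-digit indices are handled without an upper bound.
    static String transformWithRegex(String code) {
        return POSITIONAL_VAR.matcher(code).replaceAll(match -> {
            int n = Integer.parseInt(match.group(1));
            return n == 0 ? "sv.token()" : "sv.get(" + (n - 1) + ")";
        });
    }

    // Loop-based rewrite (mirrors the removed lines): plain substring
    // replacement in ascending order, capped at $20.
    static String transformWithLoop(String code) {
        var result = code;
        for (int i = 1; i <= 20; i++) {
            result = result.replace("$" + i, "sv.get(" + (i - 1) + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: sv.get(0) + sv.get(11) + sv.get(24)
        System.out.println(transformWithRegex("$1 + $12 + $25"));
        // prints: sv.get(0) + sv.get(0)2 + sv.get(1)5
        System.out.println(transformWithLoop("$1 + $12 + $25"));
    }
}
```

The replacement strings contain no `$` or `\`, so `Matcher.replaceAll` does not reinterpret them as group references.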