diff --git a/README.md b/README.md
index da6a4173..88443a46 100644
--- a/README.md
+++ b/README.md
@@ -126,7 +126,7 @@ Plotnik extends Tree-sitter's query syntax with:
 - **Named expressions** for composition and reuse
 - **Recursion** for arbitrarily nested structures
 - **Type annotations** for precise output shapes
-- **Tagged alternations** for discriminated unions
+- **Alternations**: untagged for simplicity, tagged for precision (discriminated unions)
 
 ## Use cases
 
@@ -161,12 +161,12 @@ This produces:
 
 ```typescript
 type Statement =
-  | { tag: "Assign"; target: string; value: Expression }
-  | { tag: "Call"; func: string; args: Expression[] };
+  | { $tag: "Assign"; $data: { target: string; value: Expression } }
+  | { $tag: "Call"; $data: { func: string; args: Expression[] } };
 
 type Expression =
-  | { tag: "Ident"; name: string }
-  | { tag: "Num"; value: string };
+  | { $tag: "Ident"; $data: { name: string } }
+  | { $tag: "Num"; $data: { value: string } };
 
 type TopDefinitions = {
   statements: [Statement, ...Statement[]];
@@ -177,12 +177,12 @@ Then process the results:
 
 ```typescript
 for (const stmt of result.statements) {
-  switch (stmt.tag) {
+  switch (stmt.$tag) {
     case "Assign":
-      console.log(`Assignment to ${stmt.target}`);
+      console.log(`Assignment to ${stmt.$data.target}`);
       break;
     case "Call":
-      console.log(`Call to ${stmt.func} with ${stmt.args.length} args`);
+      console.log(`Call to ${stmt.$data.func} with ${stmt.$data.args.length} args`);
       break;
   }
 }
diff --git a/docs/REFERENCE.md b/docs/REFERENCE.md
index da3c05a4..47ad6719 100644
--- a/docs/REFERENCE.md
+++ b/docs/REFERENCE.md
@@ -492,6 +492,9 @@ interface Section {
 
 Match one of several alternatives with `[...]`:
 
+- **Untagged** (no labels): Simpler output, fields merge. Use when you only need the captured data.
+- **Tagged** (with labels): Precise discriminated union. Use when you need to know which branch matched.
+
 ```
 [
   (identifier)
@@ -589,10 +592,12 @@ Labels create a discriminated union:
 ] @stmt :: Stmt
 ```
 
-Output type (discriminant is always `tag`):
+Output type (discriminant is always `$tag`, payload in `$data`):
 
 ```typescript
-type Stmt = { tag: "Assign"; left: Node } | { tag: "Call"; func: Node };
+type Stmt =
+  | { $tag: "Assign"; $data: { left: Node } }
+  | { $tag: "Call"; $data: { func: Node } };
 ```
 
 In Rust, tagged alternations become enums:
@@ -754,8 +759,8 @@ Output type:
 
 ```typescript
 type MemberChain =
-  | { tag: "Base"; name: Node }
-  | { tag: "Access"; object: MemberChain; property: Node };
+  | { $tag: "Base"; $data: { name: Node } }
+  | { $tag: "Access"; $data: { object: MemberChain; property: Node } };
 ```
 
 ---
@@ -787,14 +792,14 @@ Output types:
 
 ```typescript
 type Statement =
-  | { tag: "Assign"; target: string; value: Expression }
-  | { tag: "Call"; func: string; args: Expression[] }
-  | { tag: "Return"; value?: Expression };
+  | { $tag: "Assign"; $data: { target: string; value: Expression } }
+  | { $tag: "Call"; $data: { func: string; args: Expression[] } }
+  | { $tag: "Return"; $data: { value?: Expression } };
 
 type Expression =
-  | { tag: "Ident"; name: string }
-  | { tag: "Num"; value: string }
-  | { tag: "Str"; value: string };
+  | { $tag: "Ident"; $data: { name: string } }
+  | { $tag: "Num"; $data: { value: string } }
+  | { $tag: "Str"; $data: { value: string } };
 
 type Root = {
   statements: [Statement, ...Statement[]];
diff --git a/docs/adr/ADR-0003-query-intermediate-representation.md b/docs/adr/ADR-0003-query-intermediate-representation.md
index b21abda5..230c98c4 100644
--- a/docs/adr/ADR-0003-query-intermediate-representation.md
+++ b/docs/adr/ADR-0003-query-intermediate-representation.md
@@ -351,7 +351,19 @@ EndObject
 EndVariant
 ```
 
-The resulting `Value::Variant` preserves the tag distinct from the payload, preventing name collisions. When serialized to JSON, it flattens to match the documented data model: `{ tag: "A", ...payload }`.
+The resulting `Value::Variant` preserves the tag distinct from the payload, preventing name collisions.
+
+**JSON serialization** always uses `$data` wrapper for uniformity:
+
+```json
+{ "$tag": "A", "$data": { "x": 1, "y": 2 } }
+{ "$tag": "B", "$data": [1, 2, 3] }
+{ "$tag": "C", "$data": "foo" }
+```
+
+The `$tag` and `$data` keys avoid collisions with user-defined captures. Uniform structure simplifies parsing (always access `.$data`) and eliminates conditional flatten-vs-wrap logic.
+
+This mirrors Rust's serde adjacently-tagged representation and remains fully readable for LLMs. No query validation restriction—all payload types are valid.
 
 **Constraint: branches must produce objects.** Top-level quantifiers in tagged branches are disallowed:
 
@@ -430,7 +442,7 @@ struct Interpreter<'a> {
 
 After initial construction, epsilon transitions can be eliminated by computing epsilon closures. The `pre_effects`/`post_effects` split is essential for correctness here.
 
-**Why the split matters**: A match transition overwrites `current` with the matched node. Effects from *preceding* epsilon transitions (like `PushElement`) need the *previous* `current` value. Without the split, merging them into a single post-match list would use the wrong value.
+**Why the split matters**: A match transition overwrites `current` with the matched node. Effects from _preceding_ epsilon transitions (like `PushElement`) need the _previous_ `current` value. Without the split, merging them into a single post-match list would use the wrong value.
 
 ```
 Before (raw graph):
@@ -446,6 +458,7 @@ T3': Match(B) + [PushElement] // PushElement runs after Match(
 ```
 
 **Accumulation rules**:
+
 - Effects from incoming epsilon paths → accumulate into `pre_effects`
 - Effects from outgoing epsilon paths → accumulate into `post_effects`
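
As a reading aid for the `$tag`/`$data` change made across this diff, here is a minimal TypeScript sketch of consuming the new shape. The `Stmt` union follows the REFERENCE.md hunk, with `Node` replaced by a hypothetical `SyntaxNode` stand-in (reduced to a `text` field); `describeStmt` is likewise an illustrative helper and not part of Plotnik's API.

```typescript
// Hypothetical stand-in for the `Node` type used in the docs; the real
// node type is not shown in this diff.
type SyntaxNode = { text: string };

// Adjacently tagged union: discriminant in `$tag`, payload in `$data`.
type Stmt =
  | { $tag: "Assign"; $data: { left: SyntaxNode } }
  | { $tag: "Call"; $data: { func: SyntaxNode } };

// Narrow on `$tag` first, then read the branch's fields through `$data`.
function describeStmt(stmt: Stmt): string {
  switch (stmt.$tag) {
    case "Assign":
      return `assign to ${stmt.$data.left.text}`;
    case "Call":
      return `call ${stmt.$data.func.text}`;
  }
}

const sample: Stmt = { $tag: "Assign", $data: { left: { text: "x" } } };
console.log(describeStmt(sample)); // prints "assign to x"
```

Because every payload sits under `$data`, the access pattern is the same for each branch, and a capture that happens to be named `tag` would not clash with the discriminant.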