diff --git a/AGENTS.md b/AGENTS.md
index a97b4776..b0c6273c 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -23,6 +23,7 @@
   - [ADR-0007: Type Metadata Format](docs/adr/ADR-0007-type-metadata-format.md)
   - [ADR-0008: Tree Navigation](docs/adr/ADR-0008-tree-navigation.md)
   - [ADR-0009: Type System](docs/adr/ADR-0009-type-system.md)
+  - [ADR-0010: Type System v2](docs/adr/ADR-0010-type-system-v2.md)
 - **Template**:
   ```markdown
diff --git a/docs/adr/ADR-0005-transition-graph-format.md b/docs/adr/ADR-0005-transition-graph-format.md
index eb8e19a8..08abc6ff 100644
--- a/docs/adr/ADR-0005-transition-graph-format.md
+++ b/docs/adr/ADR-0005-transition-graph-format.md
@@ -170,7 +170,8 @@ enum EffectOp {
     EndArray,
     StartObject,
     EndObject,
-    Field(DataFieldId),
+    SetField(DataFieldId),
+    PushField(DataFieldId),
     StartVariant(VariantTagId),
     EndVariant,
     ToString,
@@ -259,11 +260,11 @@ Query: `{ (identifier) @name (number) @value } @pair`
 ```
 T0: ε + StartObject → [T1]
 T1: Match(identifier) → [T2]
-T2: ε + Field("name") → [T3]
+T2: ε + SetField("name") → [T3]
 T3: Match(number) → [T4]
-T4: ε + Field("value") → [T5]
+T4: ε + SetField("value") → [T5]
 T5: ε + EndObject → [T6]
-T6: ε + Field("pair") → [...]
+T6: ε + SetField("pair") → [...]
 ```
 
 ### Example: Tagged Alternation
@@ -274,10 +275,10 @@ Query: `[ A: (true) @val B: (false) @val ]`
 T0: ε (branch) → [T1, T4]
 T1: ε + StartVariant("A") → [T2]
 T2: Match(true) → [T3]
-T3: ε + Field("val") + EndVariant → [T7]
+T3: ε + SetField("val") + EndVariant → [T7]
 T4: ε + StartVariant("B") → [T5]
 T5: Match(false) → [T6]
-T6: ε + Field("val") + EndVariant → [T7]
+T6: ε + SetField("val") + EndVariant → [T7]
 ```
 
 ### Epsilon Elimination
@@ -303,10 +304,10 @@ Example of safe elimination:
 Before:
 T1: Match(A) [CaptureNode] → [T2]
 T2: ε [PushElement] → [T3]
-T3: Match(B) [CaptureNode, Field("b")] → [...]
+T3: Match(B) [CaptureNode, SetField("b")] → [...]
 
 After:
-T3': Match(B) [PushElement, CaptureNode, Field("b")] → [...]
+T3': Match(B) [PushElement, CaptureNode, SetField("b")] → [...]
 ```
 
 `PushElement` consumes T1's captured value before T3 overwrites `current`.
diff --git a/docs/adr/ADR-0006-dynamic-query-execution.md b/docs/adr/ADR-0006-dynamic-query-execution.md
index 58a6488d..a755d4a4 100644
--- a/docs/adr/ADR-0006-dynamic-query-execution.md
+++ b/docs/adr/ADR-0006-dynamic-query-execution.md
@@ -80,13 +80,14 @@ enum Container<'a> {
 | `PushElement` | move `current` into top array |
 | `EndArray` | pop array into `current` |
 | `StartObject` | push `Object({})` onto stack |
-| `Field(id)` | move `current` into top object field |
+| `SetField(id)` | set field `id` to `current` |
+| `PushField(id)` | append `current` to array at field `id` |
 | `EndObject` | pop object into `current` |
 | `StartVariant(tag)` | push `Variant(tag)` onto stack |
 | `EndVariant` | pop, wrap `current`, set as current |
 | `ToString` | replace `current` Node with text |
 
-`ClearCurrent` is emitted on skip paths for optional captures (`expr? @name`). When the optional is skipped, `ClearCurrent` ensures `current = None` before `Field(id)` executes, producing the correct `None` value for the optional field.
+`ClearCurrent` is emitted on skip paths for optional captures (`expr? @name`). When the optional is skipped, `ClearCurrent` ensures `current = None` before `SetField(id)` executes, producing the correct `None` value for the optional field.
 
 Invalid state = IR bug → panic.
diff --git a/docs/adr/ADR-0009-type-system.md b/docs/adr/ADR-0009-type-system.md
index 57533763..e6c85b61 100644
--- a/docs/adr/ADR-0009-type-system.md
+++ b/docs/adr/ADR-0009-type-system.md
@@ -1,6 +1,6 @@
 # ADR-0009: Type System
 
-- **Status**: Accepted
+- **Status**: Superseded by [ADR-0010](ADR-0010-type-system-v2.md)
 - **Date**: 2025-01-14
 
 ## Context
diff --git a/docs/adr/ADR-0010-type-system-v2.md b/docs/adr/ADR-0010-type-system-v2.md
new file mode 100644
index 00000000..3990d65e
--- /dev/null
+++ b/docs/adr/ADR-0010-type-system-v2.md
@@ -0,0 +1,119 @@
+# ADR-0010: Type System v2 (Transparent Graph Model)
+
+- **Status**: Accepted
+- **Date**: 2025-01-14
+- **Supersedes**: ADR-0009
+
+## Context
+
+The previous type system (ADR-0009) relied on implicit behaviors like "Quantifier-Induced Scope" (QIS) and "Single-Capture Unwrap" to reduce verbosity. While well-intentioned, these rules created "Wrapper Hell," where extracting logic into a reusable definition inadvertently changed the output structure.
+
+We need a model that supports **Mixin-like composition** (logic reuse without structural nesting) while maintaining strict type safety and data integrity.
+
+## Decision
+
+We adopt the **Transparent Graph Model**.
+
+### 1. Universal Bubbling ("Let It Bubble")
+
+Captures (`@name`) always bubble up to the nearest **Explicit Scope Boundary**.
+
+- **Private Definitions (`Def =`) are Transparent.** They act as macros or fragments.
+- **Uncaptured Containers (`{...}`, `[...]`) are Transparent.**
+- **References (`(Def)`) are Transparent.**
+
+This enables compositional patterns where a definition contributes fields to its caller's struct.
+
+### 2. Explicit Scope Boundaries
+
+A new data structure (Scope) is created **only** by explicit intent.
+
+1. **Public Roots:** `pub Def = ...` (The API Contract).
+2. **Explicit Wrappers:**
+   - `{...} @name` (Nested Group).
+   - `[...] @name` (Nested Union).
+   - `[ L: ... ] @name` (Tagged Union).
+
+**Payload Rule**:
+
+- **0 Captures**: `Void` (Logic-only matcher).
+- **1..N Captures**: `Struct { field_1, ..., field_N }`.
+- **No Implicit Unwrap**: A single capture `(node) @x` produces `{ x: Node }`. It is never unwrapped to `Node`.
+  - _Benefit:_ Adding a second capture is non-breaking (`res.x` remains valid).
+
+### 3. Parallel Arrays (Columnar Output)
+
+Quantifiers (`*`, `+`) do **not** create implicit "Row Structs." Instead, they change the cardinality of the bubbled fields to `Array`.
+
+**Example**: `( (A) @a (B) @b )*`
+**Output**: `{ a: Array, b: Array }` (Struct of Arrays).
+
+This optimizes for the common case of data extraction (where SoA is often preferred) and avoids the complexity of implicit row creation.
+
+### 4. Row Integrity (Safety Check)
+
+To prevent **Data Desynchronization** (where `a[i]` no longer corresponds to `b[i]`), the Inference Pass enforces **Row Integrity**.
+
+**Rule**: A quantified scope cannot mix **Synchronized** and **Desynchronized** fields.
+
+- **Synchronized**: Field is strictly required (`1`) in the loop body.
+- **Desynchronized**: Field is optional (`?`), repeated (`*`, `+`), or in an alternation.
+
+| Pattern | Fields | Status | Result |
+| :--------------------- | :------------- | :----------- | :-------------- |
+| `(A) @a (B) @b` | `a: 1`, `b: 1` | **Aligned** | ✅ OK (Columns) |
+| `[ (A) @a \| (B) @b ]` | `a: ?`, `b: ?` | **Disjoint** | ✅ OK (Buckets) |
+| `(A) @a (B)? @b` | `a: 1`, `b: ?` | **Mixed** | ❌ **Error** |
+
+**Error Message**: _"Field `@b` is optional while `@a` is required. Parallel arrays will not align. Wrap in `{...} @row` to enforce structure."_
+
+### 5. Definition Roles
+
+| Feature | `Def` (Private) | `pub Def` (Public) |
+| :----------------- | :---------------------------- | :---------------------- |
+| **Concept** | **Fragment / Mixin** | **API Contract / Root** |
+| **Graph Behavior** | Inlined (Copy-Paste) | Entrypoint |
+| **Scoping** | Transparent (Captures bubble) | **Scope Boundary** |
+| **Output Type** | Merges into parent | Named Interface |
+
+## Mental Model Migration
+
+| Scenario | Old Way (Opaque) | New Way (Transparent) |
+| :---------------- | :------------------------------------------- | -------------------------------------------------- |
+| **Extract Def** | Broken `res.x`. Must rewrite as `res.def.x`. | Safe. `res.x` remains `res.x`. |
+| **List of Items** | Implicit `RowStruct`. Hard to desync. | Explicit `Array, Array`. Enforced integrity. |
+| **Collision** | Silent (Data Loss). | Compiler Error ("Duplicate Capture"). |
+| **Fix Collision** | Manual re-capture. | Wrap: `{ (Def) } @alias`. |
+
+## Edge Cases
+
+### Recursive Definitions
+
+Since private definitions inline their contents, a recursive private definition cannot be inlined: the expansion would never terminate.
+
+**Solution**:
+
+- Recursive definitions must be `pub` (creating a stable API boundary) OR wrapped in a capture at the call site `(Recurse) @next`.
+- _Note: This is a natural constraint. Recursion implies a tree structure, so the output type must naturally reflect that tree structure._
+
+### Collision Handling
+
+`A(B) = (node (B) (B))`
+
+- **Issue**: `B` captures `@id`. Using it twice causes "Duplicate Capture".
+- **Solution**: User must disambiguate: `(node (B) @left (B) @right)`.
+- **Benefit**: The output shape `{ left: {id}, right: {id} }` matches the semantic intent.
+
+## Consequences
+
+**Positive**:
+
+- **Refactoring Safety**: Extracting logic into a `Def` never changes the output shape.
+- **Performance**: Parallel arrays (SoA) are cache-friendly and often what is needed for analysis.
+- **Robustness**: The Row Integrity check prevents silent data corruption.
+- **Simplicity**: No magic rules (QIS, Implicit Unwrap).
+
+**Negative**:
+
+- **Verbosity**: Must explicitly wrap `{...} @row` for list-of-structs.
+- **Strictness**: "Mixed" optionality in loops is now a hard error, requiring explicit handling.
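
The Row Integrity rule in section 4 of ADR-0010 can be illustrated with a small Rust sketch. It is not part of this patch and does not reflect the actual Inference Pass: `Cardinality`, `Alignment`, and `check_row_integrity` are hypothetical names, and the error wording is adapted from the ADR's sample message. The sketch assumes each field bubbled out of a quantified body has already been assigned a cardinality, and that captures inside an alternation are modeled as `Optional`.

```rust
/// Cardinality of a bubbled field within one iteration of a quantified body.
/// Hypothetical type for illustration; not taken from the patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cardinality {
    One,      // `1`: strictly required (Synchronized)
    Optional, // `?`, or a capture inside an alternation branch
    Star,     // `*`
    Plus,     // `+`
}

/// Successful outcomes, named after the "Status" column of the ADR table.
#[derive(Debug, PartialEq, Eq)]
enum Alignment {
    Aligned,  // every field is required: arrays line up as columns
    Disjoint, // no field is required: arrays are independent buckets
}

/// Rejects a quantified scope that mixes Synchronized and Desynchronized fields.
fn check_row_integrity(fields: &[(&str, Cardinality)]) -> Result<Alignment, String> {
    let first_sync = fields.iter().find(|(_, c)| *c == Cardinality::One);
    let first_desync = fields.iter().find(|(_, c)| *c != Cardinality::One);

    match (first_sync, first_desync) {
        // Mixed: at least one required and one non-required field.
        (Some((required, _)), Some((loose, _))) => Err(format!(
            "Field `@{}` is optional or repeated while `@{}` is required. \
             Parallel arrays will not align. Wrap in `{{...}} @row` to enforce structure.",
            loose, required
        )),
        // Only desynchronized fields.
        (None, Some(_)) => Ok(Alignment::Disjoint),
        // Only required fields (or no fields at all).
        _ => Ok(Alignment::Aligned),
    }
}

fn main() {
    // The three rows of the ADR table, plus a repeated field.
    let cases: [&[(&str, Cardinality)]; 4] = [
        &[("a", Cardinality::One), ("b", Cardinality::One)],
        &[("a", Cardinality::Optional), ("b", Cardinality::Optional)],
        &[("a", Cardinality::One), ("b", Cardinality::Optional)],
        &[("a", Cardinality::Plus), ("b", Cardinality::Star)],
    ];
    for fields in cases.iter() {
        println!("{:?} => {:?}", fields, check_row_integrity(fields));
    }
}
```

Under these assumptions, the first two table rows are accepted and the third is rejected. A scope with no captures is treated as trivially aligned here, which is a choice made for the sketch, not something the ADR specifies.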