Merged
1 change: 1 addition & 0 deletions AGENTS.md
@@ -21,6 +21,7 @@
- [ADR-0005: Transition Graph Format](docs/adr/ADR-0005-transition-graph-format.md)
- [ADR-0006: Dynamic Query Execution](docs/adr/ADR-0006-dynamic-query-execution.md)
- [ADR-0007: Type Metadata Format](docs/adr/ADR-0007-type-metadata-format.md)
- [ADR-0008: Tree Navigation](docs/adr/ADR-0008-tree-navigation.md)
- **Template**:

```markdown
22 changes: 19 additions & 3 deletions docs/adr/ADR-0004-query-ir-binary-format.md
@@ -1,7 +1,7 @@
# ADR-0004: Query IR Binary Format

- **Status**: Accepted
- **Date**: 2025-12-12
- **Date**: 2024-12-12
- **Supersedes**: Parts of ADR-0003

## Context
@@ -23,6 +23,7 @@ struct QueryIR {
type_defs_offset: u32,
type_members_offset: u32,
entrypoints_offset: u32,
ignored_kinds_offset: u32, // 0 = no ignored kinds
}
```

@@ -36,12 +37,22 @@ const BUFFER_ALIGN: usize = 64; // cache-line alignment for transitions
struct QueryIRBuffer {
ptr: *mut u8,
len: usize,
owned: bool, // true if allocated, false if mmap'd
}
```

Allocated via `Layout::from_size_align(len, BUFFER_ALIGN)`. Standard `Box<[u8]>` won't work—it assumes 1-byte alignment and corrupts `dealloc`. The 64-byte alignment ensures transitions never straddle cache lines.

**Deallocation**: `QueryIRBuffer` must implement `Drop` to reconstruct the exact `Layout` (size + 64-byte alignment) and call `std::alloc::dealloc`. Using `Box::from_raw` or similar would assume align=1 and cause undefined behavior.
**Ownership semantics**:

| `owned` | Source | `Drop` action |
| ------- | ------------------- | ------------------------------------------------ |
| `true` | `std::alloc::alloc` | Reconstruct `Layout`, call `std::alloc::dealloc` |
| `false` | `mmap` / external | No-op (caller manages lifetime) |

For mmap'd queries, the OS maps file pages directly into the process address space. Because the header is 64 bytes, buffer data starts at a 64-byte-aligned offset. A `QueryIRBuffer` with `owned: false` is a view that does not take ownership; the backing file mapping must outlive the `QueryIR`.

**Deallocation**: When `owned: true`, `Drop` must reconstruct the exact `Layout` (size + 64-byte alignment) and call `std::alloc::dealloc`. Using `Box::from_raw` or similar would assume align=1 and cause undefined behavior.
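
The ownership table and the deallocation rule can be sketched as follows. This is a minimal illustration, not the project's implementation: the constructor names `allocate` and `from_external` and the accessor `as_slice` are hypothetical, and only the struct fields and `BUFFER_ALIGN` come from the ADR.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

const BUFFER_ALIGN: usize = 64;

struct QueryIRBuffer {
    ptr: *mut u8,
    len: usize,
    owned: bool, // true if allocated, false if mmap'd / external
}

impl QueryIRBuffer {
    /// Hypothetical constructor: zeroed, 64-byte-aligned, owned allocation.
    fn allocate(len: usize) -> Self {
        assert!(len > 0, "the global allocator does not accept zero-sized layouts");
        let layout = Layout::from_size_align(len, BUFFER_ALIGN).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size (checked above).
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Self { ptr, len, owned: true }
    }

    /// Hypothetical constructor: non-owning view over external memory.
    ///
    /// SAFETY: the caller guarantees that `ptr` is valid for `len` bytes,
    /// is 64-byte aligned, and outlives the returned view.
    unsafe fn from_external(ptr: *mut u8, len: usize) -> Self {
        debug_assert_eq!(ptr as usize % BUFFER_ALIGN, 0);
        Self { ptr, len, owned: false }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: both constructors require `ptr` to be valid for `len` bytes.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for QueryIRBuffer {
    fn drop(&mut self) {
        if self.owned {
            // Rebuild the exact layout used at allocation time.
            let layout = Layout::from_size_align(self.len, BUFFER_ALIGN).expect("invalid layout");
            // SAFETY: `ptr` came from the global allocator with this same layout.
            unsafe { dealloc(self.ptr, layout) };
        }
        // owned == false: no-op, the caller manages the mapping's lifetime.
    }
}
```

Keeping `len` in the struct is what lets `Drop` rebuild the layout without extra bookkeeping.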

### Segments

@@ -56,6 +67,7 @@ Allocated via `Layout::from_size_align(len, BUFFER_ALIGN)`. Standard `Box<[u8]>`
| Type Defs | `[TypeDef; T]` | `type_defs_offset` | 4 |
| Type Members | `[TypeMember; U]` | `type_members_offset` | 2 |
| Entrypoints | `[Entrypoint; V]` | `entrypoints_offset` | 4 |
| Ignored Kinds | `[NodeTypeId; W]` | `ignored_kinds_offset` | 2 |

Each offset is aligned: `(offset + align - 1) & !(align - 1)`.
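
As a small worked example of the rounding formula above, the helper below rounds an offset up to the next multiple of a power-of-two alignment. The function name `align_up` is an assumption for illustration; the ADR only gives the expression.

```rust
/// Rounds `offset` up to the next multiple of `align`.
/// `align` must be a power of two, and `offset + align - 1` must not overflow `u32`.
fn align_up(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}
```

For instance, an offset of 13 aligned to 2 becomes 14, and an already aligned offset such as 64 with alignment 64 is returned unchanged.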

Expand Down Expand Up @@ -103,7 +115,7 @@ struct Entrypoint {
### Serialization

```
Header (48 bytes):
Header (64 bytes):
magic: [u8; 4] b"PLNK"
version: u32 format version + ABI hash
checksum: u32 CRC32(offsets || buffer_data)
@@ -116,10 +128,14 @@ Header (48 bytes):
type_defs_offset: u32
type_members_offset: u32
entrypoints_offset: u32
ignored_kinds_offset: u32
_pad: [u8; 12] reserved, zero-filled

Buffer Data (buffer_len bytes)
```

The header is 64 bytes so that buffer data starts at a 64-byte-aligned file offset. Since `mmap` returns a page-aligned base address, this enables true zero-copy usage: transitions at offset 0 within the buffer are correctly aligned in memory.

Little-endian always. UTF-8 strings. Version mismatch or checksum failure → recompile.

### Construction
109 changes: 54 additions & 55 deletions docs/adr/ADR-0005-transition-graph-format.md
@@ -1,7 +1,7 @@
# ADR-0005: Transition Graph Format

- **Status**: Accepted
- **Date**: 2025-12-12
- **Date**: 2024-12-12
- **Supersedes**: Parts of ADR-0003

## Context
Expand Down Expand Up @@ -38,22 +38,22 @@ struct Slice<T> {
```rust
#[repr(C, align(64))]
struct Transition {
// --- 40 bytes metadata ---
// --- 32 bytes metadata ---
matcher: Matcher, // 16
pre_anchored: bool, // 1
post_anchored: bool, // 1
pre_nav: PreNav, // 2 (see ADR-0008)
_pad1: [u8; 2], // 2
pre_effects: Slice<EffectOp>, // 8
post_effects: Slice<EffectOp>, // 8
effects: Slice<EffectOp>, // 8
ref_marker: RefTransition, // 4

// --- 24 bytes control flow ---
// --- 32 bytes control flow ---
successor_count: u32, // 4
successor_data: [u32; 5], // 20
successor_data: [u32; 7], // 28
}
// 64 bytes, align 64 (cache-line aligned)
```

Navigation is fully determined by `pre_nav`—no runtime dispatch based on previous matcher. See [ADR-0008](ADR-0008-tree-navigation.md) for `PreNav` definition and semantics.

Single `ref_marker` slot—sequences like `Enter(A) → Enter(B)` remain as epsilon chains.

### Inline Successors (SSO-style)
@@ -62,18 +62,18 @@ Successors use a small-size optimization to avoid indirection for the common cas

| `successor_count` | Layout |
| ----------------- | ----------------------------------------------------------------------------------- |
| 0–5 | `successor_data[0..count]` contains `TransitionId` values directly |
| > 5 | `successor_data[0]` is index into `successors` segment, `successor_count` is length |
| 0–7 | `successor_data[0..count]` contains `TransitionId` values directly |
| > 7 | `successor_data[0]` is index into `successors` segment, `successor_count` is length |

Why 5 slots: 24 available bytes / 4 bytes per `TransitionId` = 6 slots, minus 1 for the count field leaves 5.
Why 7 slots: 32 available bytes / 4 bytes per `TransitionId` = 8 slots, minus 1 for the count field leaves 7.

Coverage:

- Linear sequences: 1 successor
- Simple branches, quantifiers: 2 successors
- Most alternations: 2–5 branches
- Most alternations: 2–7 branches

Only massive alternations (6+ branches) spill to the external buffer.
Only massive alternations (8+ branches) spill to the external buffer.
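
The inline/spilled rule for the 7-slot layout can be sketched like this. It is a hedged illustration: `ControlFlow` is a hypothetical stand-in for the control-flow half of `Transition`, and the free function `successors` and its `spilled` parameter (the external `successors` segment) are assumptions rather than the project's API.

```rust
type TransitionId = u32;

const INLINE_SUCCESSORS: usize = 7;

/// Hypothetical stand-in for the 32-byte control-flow half of `Transition`.
#[repr(C)]
struct ControlFlow {
    successor_count: u32,                      // 4 bytes
    successor_data: [u32; INLINE_SUCCESSORS],  // 28 bytes
}

/// Returns the logical successor list, hiding inline vs. spilled storage.
fn successors<'a>(cf: &'a ControlFlow, spilled: &'a [TransitionId]) -> &'a [TransitionId] {
    let count = cf.successor_count as usize;
    if count <= INLINE_SUCCESSORS {
        // 0..=7: ids are stored directly in the transition.
        &cf.successor_data[..count]
    } else {
        // > 7: slot 0 is the start index into the external segment.
        let start = cf.successor_data[0] as usize;
        &spilled[start..start + count]
    }
}
```

Callers receive a plain slice in both cases, so the only branch on storage location lives in this one function.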

Cache benefits:

@@ -98,14 +98,14 @@ enum Matcher {
negated_fields: Slice<NodeFieldId>, // 8
},
Wildcard,
Down, // cursor to first child
Up, // cursor to parent
}
// 16 bytes, align 4
```

`Option<NodeFieldId>` uses 0 for `None` (niche optimization).

Navigation (descend/ascend) is handled by `PreNav`, not matchers. Matchers are purely for node matching.

### RefTransition

```rust
@@ -118,6 +118,8 @@ enum RefTransition {
// 4 bytes, align 2
```

Layout: 1-byte discriminant + 1-byte padding + 2-byte `RefId` payload = 4 bytes. Alignment is 2 (from `RefId: u16`). This fills the 4-byte `ref_marker` slot of the 64-byte `Transition` struct.
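
The stated size and alignment can be checked with a small sketch. The `#[repr(C, u8)]` attribute and the exact variant list are assumptions here, since the diff elides the enum body; they are chosen to be consistent with the 1-byte discriminant and the `None`/`Enter`/`Exit` markers described in this ADR.

```rust
use std::mem::{align_of, size_of};

type RefId = u16;

// Assumed representation: C layout with a 1-byte tag.
#[allow(dead_code)]
#[repr(C, u8)]
enum RefTransition {
    None,
    Enter(RefId),
    Exit(RefId),
}

/// Returns (size, alignment) of `RefTransition` in bytes.
fn ref_transition_layout() -> (usize, usize) {
    (size_of::<RefTransition>(), align_of::<RefTransition>())
}
```

Under these assumptions the tag sits at byte 0, one padding byte follows, and the `u16` payload occupies bytes 2 and 3.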

Explicit `None` ensures stable binary layout (`Option<Enum>` niche is unspecified).

### Enter/Exit Semantics
@@ -126,10 +128,12 @@ Explicit `None` ensures stable binary layout (`Option<Enum>` niche is unspecifie

**Solution**: Store return transitions at `Enter` time (in the call frame), retrieve at `Exit` time. O(1) exit, no filtering.

For `Enter(ref_id)` transitions, `successor_data` has special structure:
For `Enter(ref_id)` transitions, the **logical** successor list (accessed via `TransitionView::successors()`) has special structure:

- `successor_data[0]`: definition entry point (where to jump)
- `successor_data[1..count]`: return transitions (stored in call frame)
- `successors()[0]`: definition entry point (where to jump)
- `successors()[1..]`: return transitions (stored in call frame)

This structure applies to the view, not raw `successor_data` memory. The SSO optimization (inline vs spilled storage) is orthogonal—the view abstracts it away. An `Enter` with 8+ returns spills to the external segment like any other transition; the interpreter accesses the logical list uniformly.

For `Exit(ref_id)` transitions, successors are **ignored**. Return transitions come from the call frame pushed at `Enter`. See [ADR-0006](ADR-0006-dynamic-query-execution.md) for execution details.

@@ -149,6 +153,7 @@ T11: ε + Exit(Func) successors=[] (ignored, returns from frame)
```rust
#[repr(C, u16)]
enum EffectOp {
CaptureNode, // store matched node as current value
StartArray,
PushElement,
EndArray,
@@ -162,19 +167,9 @@ enum EffectOp {
// 4 bytes, align 2
```

No `CaptureNode`—implicit on successful match.

### Effect Placement
`CaptureNode` is explicit—graph construction places it at the correct position relative to container effects.

| Effect | Placement | Why |
| -------------- | --------- | -------------------------- |
| `StartArray` | Pre | Container before elements |
| `StartObject` | Pre | Container before fields |
| `StartVariant` | Pre | Tag before payload |
| `PushElement` | Post | Consumes matched node |
| `Field` | Post | Consumes matched node |
| `End*` | Post | Finalizes after last match |
| `ToString` | Post | Converts matched node |
**Invariant**: The interpreter clears the `matched_node` slot on `Enter` and on backtrack restore. This prevents stale captures if a graph construction bug produces `Epsilon → CaptureNode` without a preceding `Match`. In a well-formed graph, `CaptureNode` always follows a successful match that populates the slot.

### View Types

@@ -189,13 +184,15 @@ struct MatcherView<'a> {
raw: &'a Matcher,
}

enum MatcherKind { Epsilon, Node, Anonymous, Wildcard, Down, Up }
enum MatcherKind { Epsilon, Node, Anonymous, Wildcard }
```

Views resolve `Slice<T>` to `&[T]`. `TransitionView::successors()` returns `&[TransitionId]`, hiding the inline/spilled distinction—callers see a uniform slice regardless of storage location. Engine code never touches offsets or `successor_data` directly.

### Quantifiers

Examples in this section show graph structure and effects. Navigation (`pre_nav`) is omitted for brevity—see [ADR-0008](ADR-0008-tree-navigation.md) for full transition examples with navigation.

**Greedy `*`**:

```
Expand Down Expand Up @@ -229,22 +226,22 @@ Query: `(parameters (identifier)* @params)`
Before elimination:

```
T0: ε + StartArray → [T1]
T1: ε (branch) → [T2, T4]
T2: Match(identifier) → [T3]
T3: ε + PushElement → [T1]
T4: ε + EndArray → [T5]
T5: ε + Field("params") → [...]
T0: ε [StartArray] → [T1]
T1: ε (branch) → [T2, T4]
T2: Match(identifier) → [T3]
T3: ε [CaptureNode, PushElement] → [T1]
T4: ε [EndArray] → [T5]
T5: ε [Field("params")] → [...]
```

After:

```
T2': pre:[StartArray] Match(identifier) post:[PushElement] → [T2', T4']
T4': post:[EndArray, Field("params")] → [...]
T2': Match(identifier) [StartArray, CaptureNode, PushElement] → [T2', T4']
T4': ε [EndArray, Field("params")] → [...]
```

First iteration gets `StartArray` from T0's path. Loop iterations skip it.
First iteration gets `StartArray` from T0's path. Loop iterations skip it. Note T4' remains epsilon—effects cannot merge into T2' without breaking semantics.

### Example: Object

Expand Down Expand Up @@ -276,32 +273,34 @@ T6: ε + Field("val") + EndVariant → [T7]

### Epsilon Elimination

Partial—full elimination impossible due to single `ref_marker`.
Partial—full elimination impossible due to single `ref_marker` and effect ordering constraints.

**Execution order** (all transitions, including epsilon):

1. Emit `pre_effects`
2. Execute matcher (epsilon always succeeds)
3. On success: emit implicit `CaptureNode`, emit `post_effects`
1. Execute `pre_nav` and matcher
2. On success: emit `effects` in order

With explicit `CaptureNode`, effect order is unambiguous. When eliminating epsilon chains, concatenate effect lists in traversal order.

**When epsilon nodes must remain**:

An epsilon transition with `pre: [StartObject]` and `post: [EndObject]` legitimately creates an empty object. To avoid accidental empty structures in graph rewrites, move effects to the destination's `pre` or source's `post` as appropriate.
1. **Ref markers**: A transition can hold at most one `Enter`/`Exit`. Sequences like `Enter(A) → Enter(B)` need epsilon.
2. **Branch points**: An epsilon with multiple successors cannot merge into predecessors without duplicating effects.
3. **Effect ordering conflicts**: When incoming and outgoing effects cannot be safely reordered.

Why pre/post split matters:
Example of safe elimination:

```
Before:
T1: Match(A) → [T2] // current = A
T2: ε + PushElement → [T3] // push A ✓
T3: Match(B) → [...] // current = B
T1: Match(A) [CaptureNode] → [T2]
T2: ε [PushElement] → [T3]
T3: Match(B) [CaptureNode, Field("b")] → [...]

After (correct):
T3': pre:[PushElement] Match(B) // push A, then match B ✓

Wrong (no split):
T3': Match(B) post:[PushElement] // match B, push B ✗
After:
T3': Match(B) [PushElement, CaptureNode, Field("b")] → [...]
```

Incoming epsilon effects → `pre_effects`. Outgoing → `post_effects`.
`PushElement` consumes T1's captured value before T3 overwrites `current`.
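
The "concatenate effect lists in traversal order" rule from the safe-elimination example can be sketched as below. This is a simplified illustration: the `EffectOp` here has only three variants, `Field` carries a string instead of whatever id the real format uses, and `merge_epsilon_effects` is a hypothetical helper name.

```rust
// Simplified stand-in for the ADR's `EffectOp`; the payload type is an assumption.
#[derive(Clone, Debug, PartialEq)]
enum EffectOp {
    CaptureNode,
    PushElement,
    Field(&'static str),
}

/// Folds an eliminated epsilon's effects into its successor.
/// The epsilon is traversed first, so its effects come first in the merged list.
fn merge_epsilon_effects(epsilon: &[EffectOp], successor: &[EffectOp]) -> Vec<EffectOp> {
    let mut merged = Vec::with_capacity(epsilon.len() + successor.len());
    merged.extend_from_slice(epsilon);
    merged.extend_from_slice(successor);
    merged
}
```

Applied to T2 and T3 from the example, this yields `[PushElement, CaptureNode, Field("b")]`, matching T3'. The sketch covers only the ordering rule; it does not check the conditions under which an epsilon must remain.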

## Consequences
