Merged
25 changes: 13 additions & 12 deletions docs/adr/ADR-0004-query-ir-binary-format.md
@@ -1,20 +1,20 @@
# ADR-0004: Query IR Binary Format
# ADR-0004: Compiled Query Binary Format

- **Status**: Accepted
- **Date**: 2024-12-12
- **Supersedes**: Parts of ADR-0003

## Context

The Query IR lives in a single contiguous allocation—cache-friendly, zero fragmentation, portable to WASM. This ADR defines the binary layout. Graph structures are in [ADR-0005](ADR-0005-transition-graph-format.md). Type metadata is in [ADR-0007](ADR-0007-type-metadata-format.md).
The compiled query lives in a single contiguous allocation—cache-friendly, zero fragmentation, portable to WASM. This ADR defines the binary layout. Graph structures are in [ADR-0005](ADR-0005-transition-graph-format.md). Type metadata is in [ADR-0007](ADR-0007-type-metadata-format.md).

## Decision

### Container

```rust
struct QueryIR {
ir_buffer: QueryIRBuffer,
struct CompiledQuery {
buffer: CompiledQueryBuffer,
successors_offset: u32,
effects_offset: u32,
negated_fields_offset: u32,
@@ -23,18 +23,18 @@ struct QueryIR {
type_defs_offset: u32,
type_members_offset: u32,
entrypoints_offset: u32,
ignored_kinds_offset: u32, // 0 = no ignored kinds
trivia_kinds_offset: u32, // 0 = no trivia kinds
}
```

Transitions start at offset 0. Default entrypoint is always at offset 0.
Transitions start at buffer offset 0. The default entrypoint is **Transition 0** (the root of the graph). The `entrypoints` table provides named exports for multi-definition queries; it does not affect the default entrypoint.
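
A minimal sketch of how a caller might pick a start transition under this rule. The `Entrypoint` shape here (a resolved `name` plus a `target`) and the function name `start_transition` are assumptions for illustration; the ADR only shows entries of the form `{name=Func, target=Tr0, type=Ty3}`.

```rust
type TransitionId = u32;

/// Hypothetical, simplified entrypoint record with the name already resolved.
struct Entrypoint<'a> {
    name: &'a str,
    target: TransitionId,
}

/// Picks the start transition: a named export if one is requested,
/// otherwise Transition 0 without consulting the table.
fn start_transition(
    entrypoints: &[Entrypoint<'_>],
    name: Option<&str>,
) -> Option<TransitionId> {
    match name {
        None => Some(0),
        Some(n) => entrypoints.iter().find(|e| e.name == n).map(|e| e.target),
    }
}
```

Calling `start_transition(&table, None)` yields `Some(0)` even for an empty table, which mirrors the statement that the table does not affect the default entrypoint.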

### QueryIRBuffer
### CompiledQueryBuffer

```rust
const BUFFER_ALIGN: usize = 64; // cache-line alignment for transitions

struct QueryIRBuffer {
struct CompiledQueryBuffer {
ptr: *mut u8,
len: usize,
owned: bool, // true if allocated, false if mmap'd
@@ -50,7 +50,7 @@ Allocated via `Layout::from_size_align(len, BUFFER_ALIGN)`. Standard `Box<[u8]>`
| `true` | `std::alloc::alloc` | Reconstruct `Layout`, call `std::alloc::dealloc` |
| `false` | `mmap` / external | No-op (caller manages lifetime) |

For mmap'd queries, the OS maps file pages directly into address space. The 64-byte header ensures buffer data starts aligned. `QueryIRBuffer` with `owned: false` provides a view without taking ownership—the backing file mapping must outlive the `QueryIR`.
For mmap'd queries, the OS maps file pages directly into address space. The 64-byte header ensures buffer data starts aligned. `CompiledQueryBuffer` with `owned: false` provides a view without taking ownership—the backing file mapping must outlive the `CompiledQuery`.

**Deallocation**: When `owned: true`, `Drop` must reconstruct the exact `Layout` (size + 64-byte alignment) and call `std::alloc::dealloc`. Using `Box::from_raw` or similar would assume align=1 and cause undefined behavior.
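
The deallocation rule can be sketched as follows. This is an illustration under assumptions, not the project's implementation: it uses only the three fields visible in this diff, and the constructor name `allocate` is hypothetical.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

const BUFFER_ALIGN: usize = 64;

struct CompiledQueryBuffer {
    ptr: *mut u8,
    len: usize,
    owned: bool,
}

impl CompiledQueryBuffer {
    /// Hypothetical constructor: allocates `len` bytes with 64-byte alignment.
    fn allocate(len: usize) -> Self {
        assert!(len > 0, "`alloc` requires a non-zero size");
        let layout = Layout::from_size_align(len, BUFFER_ALIGN).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size.
        let ptr = unsafe { alloc(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Self { ptr, len, owned: true }
    }
}

impl Drop for CompiledQueryBuffer {
    fn drop(&mut self) {
        if self.owned {
            // Rebuild the same size and alignment that `allocate` used.
            let layout =
                Layout::from_size_align(self.len, BUFFER_ALIGN).expect("invalid layout");
            // SAFETY: `ptr` was returned by `alloc` with this exact layout.
            unsafe { dealloc(self.ptr, layout) };
        }
        // owned == false: whoever created the mapping releases it.
    }
}
```

Because the layout is rebuilt from `len` and `BUFFER_ALIGN`, `len` must stay unchanged for the lifetime of an owned buffer.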

@@ -67,7 +67,7 @@ For mmap'd queries, the OS maps file pages directly into address space. The 64-b
| Type Defs | `[TypeDef; T]` | `type_defs_offset` | 4 |
| Type Members | `[TypeMember; U]` | `type_members_offset` | 2 |
| Entrypoints | `[Entrypoint; V]` | `entrypoints_offset` | 4 |
| Ignored Kinds | `[NodeTypeId; W]` | `ignored_kinds_offset` | 2 |
| Trivia Kinds | `[NodeTypeId; W]` | `trivia_kinds_offset` | 2 |

Each offset is aligned: `(offset + align - 1) & !(align - 1)`.
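
As a small worked example of that formula, the helper below rounds an offset up to the next multiple of `align`. The function name `align_up` is an assumption. The mask trick is only valid when `align` is a power of two, which holds for the 2-byte and 4-byte alignments shown in the table rows above.

```rust
/// Rounds `offset` up to the next multiple of `align`.
/// `align` must be a power of two for the mask to be correct.
fn align_up(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}
```

For instance, `align_up(13, 4)` is 16 and `align_up(16, 4)` stays 16, so an already aligned offset is left unchanged.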

@@ -82,7 +82,7 @@ type StringId = u16;

#[repr(C)]
struct StringRef {
offset: u32, // into string_bytes
offset: u32, // byte offset into string_bytes (NOT element index)
len: u16,
_pad: u16,
}
@@ -128,7 +128,7 @@ Header (64 bytes):
type_defs_offset: u32
type_members_offset: u32
entrypoints_offset: u32
ignored_kinds_offset: u32
trivia_kinds_offset: u32
_pad: [u8; 12] reserved, zero-filled

Buffer Data (buffer_len bytes)
@@ -169,6 +169,7 @@ Buffer layout:
0x0300 Type Defs [Record{...}, Enum{...}, ...]
0x0340 Type Members [{name,Str}, {Ident,Ty5}, ...]
0x0380 Entrypoints [{name=Func, target=Tr0, type=Ty3}, ...]
0x03A0 Trivia Kinds [comment, ...]
```

`"name"` stored once, used by both `@name` captures.
50 changes: 25 additions & 25 deletions docs/adr/ADR-0005-transition-graph-format.md
@@ -27,32 +27,34 @@ Relative range within a segment:
```rust
#[repr(C)]
struct Slice<T> {
start: u32,
len: u32,
start_index: u32, // element index into segment array (NOT byte offset)
len: u16, // 65k elements per slice is sufficient
_phantom: PhantomData<T>,
}
// 6 bytes, align 4
```

`start_index` is an **element index**, not a byte offset. This naming distinguishes it from byte offsets like `StringRef.offset` and `CompiledQuery.*_offset`. The distinction matters for typed array access.
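
To make the element-index convention concrete, here is a hedged sketch of resolving a `Slice<T>` against a segment that has already been viewed as a typed array. The `resolve` method is an assumption, not something this ADR defines.

```rust
use std::marker::PhantomData;

#[repr(C)]
struct Slice<T> {
    start_index: u32,
    len: u16,
    _phantom: PhantomData<T>,
}

impl<T> Slice<T> {
    /// Indexes the typed segment directly: `start_index` counts elements,
    /// so no multiplication by `size_of::<T>()` is needed here.
    fn resolve<'a>(&self, segment: &'a [T]) -> &'a [T] {
        let start = self.start_index as usize;
        &segment[start..start + self.len as usize]
    }
}
```

A byte offset such as `StringRef.offset` would instead index a `&[u8]`. One point worth checking when adopting this layout: with plain `#[repr(C)]`, Rust rounds a struct's size up to a multiple of its alignment, so `size_of::<Slice<T>>()` reports 8 here. A 6-byte footprint would need a packed representation or flattened fields.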

### Transition

```rust
#[repr(C, align(64))]
struct Transition {
// --- 32 bytes metadata ---
matcher: Matcher, // 16
pre_nav: PreNav, // 2 (see ADR-0008)
_pad1: [u8; 2], // 2
effects: Slice<EffectOp>, // 8
ref_marker: RefTransition, // 4
matcher: Matcher, // 16 (offset 0)
ref_marker: RefTransition, // 4 (offset 16)
successor_count: u32, // 4 (offset 20)
effects: Slice<EffectOp>, // 6 (offset 24; when there are no effects, start_index and len are zero)
nav: Nav, // 2 (offset 30, see ADR-0008)

// --- 32 bytes control flow ---
successor_count: u32, // 4
successor_data: [u32; 7], // 28
successor_data: [u32; 8], // 32 (offset 32)
}
// 64 bytes, align 64 (cache-line aligned)
```

Navigation is fully determined by `pre_nav`—no runtime dispatch based on previous matcher. See [ADR-0008](ADR-0008-tree-navigation.md) for `PreNav` definition and semantics.
Navigation is fully determined by `nav`—no runtime dispatch based on previous matcher. See [ADR-0008](ADR-0008-tree-navigation.md) for `Nav` definition and semantics.

Single `ref_marker` slot—sequences like `Enter(A) → Enter(B)` remain as epsilon chains.

@@ -62,18 +64,18 @@ Successors use a small-size optimization to avoid indirection for the common cas

| `successor_count` | Layout |
| ----------------- | ----------------------------------------------------------------------------------- |
| 0–7 | `successor_data[0..count]` contains `TransitionId` values directly |
| > 7 | `successor_data[0]` is index into `successors` segment, `successor_count` is length |
| 0–8 | `successor_data[0..count]` contains `TransitionId` values directly |
| > 8 | `successor_data[0]` is index into `successors` segment, `successor_count` is length |

Why 7 slots: 32 available bytes / 4 bytes per `TransitionId` = 8 slots, minus 1 for the count field leaves 7.
Why 8 slots: Moving `successor_count` into the metadata block frees 32 bytes for `successor_data`, giving 32 / 4 = 8 inline slots.

Coverage:

- Linear sequences: 1 successor
- Simple branches, quantifiers: 2 successors
- Most alternations: 2–7 branches
- Most alternations: 2–8 branches

Only massive alternations (8+ branches) spill to the external buffer.
Only massive alternations (9+ branches) spill to the external buffer.
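
The inline-versus-spilled rule for the 8-slot layout can be sketched as below. `SuccessorFields` is a hypothetical stand-in that carries only the two relevant `Transition` fields, and the free function `successors` is an assumption about how a view might resolve them.

```rust
type TransitionId = u32;

const INLINE_SLOTS: usize = 8;

/// Hypothetical stand-in holding only the fields this sketch needs.
struct SuccessorFields {
    successor_count: u32,
    successor_data: [u32; INLINE_SLOTS],
}

/// Returns the successor list, reading either the inline slots
/// or the external `successors` segment.
fn successors<'a>(
    t: &'a SuccessorFields,
    segment: &'a [TransitionId],
) -> &'a [TransitionId] {
    let count = t.successor_count as usize;
    if count <= INLINE_SLOTS {
        // Common case: ids are stored directly in the transition.
        &t.successor_data[..count]
    } else {
        // Spilled case: slot 0 is an index into the segment.
        let start = t.successor_data[0] as usize;
        &segment[start..start + count]
    }
}
```

The branch on `successor_count` is the only extra work in the common case; the segment is touched only when a transition has nine or more successors.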

Cache benefits:

@@ -104,7 +106,7 @@ enum Matcher {

`Option<NodeFieldId>` uses 0 for `None` (niche optimization).

Navigation (descend/ascend) is handled by `PreNav`, not matchers. Matchers are purely for node matching.
Navigation (descend/ascend) is handled by `Nav`, not matchers. Matchers are purely for node matching.

### RefTransition

@@ -167,20 +169,18 @@ enum EffectOp {
// 4 bytes, align 2
```

`CaptureNode` is explicit—graph construction places it at the correct position relative to container effects.

**Invariant**: The interpreter clears `matched_node` slot on `Enter` and backtrack restore. This prevents stale captures if a graph construction bug produces `Epsilon → CaptureNode` without a preceding `Match`. With proper graphs, `CaptureNode` always follows a successful match that populates the slot.
**Graph construction invariant**: `CaptureNode` may only appear in the effects list of a transition where `matcher` is `Node`, `Anonymous`, or `Wildcard`. Placing `CaptureNode` on an `Epsilon` transition is illegal—graph construction must enforce this at build time.
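
A build-time check for this invariant might look like the sketch below. The types are simplified assumptions: `MatcherKind` drops the matcher payloads, `EffectOp` lists only payload-free variants, and `BuildTransition` and `check_capture_invariant` are hypothetical names.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum MatcherKind {
    Epsilon,
    Node,
    Anonymous,
    Wildcard,
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum EffectOp {
    CaptureNode,
    PushElement,
    StartArray,
    EndArray,
}

/// Hypothetical builder-side transition, before serialization.
struct BuildTransition {
    matcher: MatcherKind,
    effects: Vec<EffectOp>,
}

/// Returns the index of the first epsilon transition that carries
/// `CaptureNode`, or `Ok(())` if the invariant holds everywhere.
fn check_capture_invariant(transitions: &[BuildTransition]) -> Result<(), usize> {
    for (id, t) in transitions.iter().enumerate() {
        let captures = t.effects.contains(&EffectOp::CaptureNode);
        if captures && t.matcher == MatcherKind::Epsilon {
            return Err(id);
        }
    }
    Ok(())
}
```

Running a check like this once after graph construction keeps the interpreter free of defensive handling for captures that have no matched node.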

### View Types

```rust
struct TransitionView<'a> {
query_ir: &'a QueryIR,
query: &'a CompiledQuery,
raw: &'a Transition,
}

struct MatcherView<'a> {
query_ir: &'a QueryIR,
query: &'a CompiledQuery,
raw: &'a Matcher,
}

@@ -191,7 +191,7 @@ Views resolve `Slice<T>` to `&[T]`. `TransitionView::successors()` returns `&[Tr

### Quantifiers

Examples in this section show graph structure and effects. Navigation (`pre_nav`) is omitted for brevity—see [ADR-0008](ADR-0008-tree-navigation.md) for full transition examples with navigation.
Examples in this section show graph structure and effects. Navigation (`nav`) is omitted for brevity—see [ADR-0008](ADR-0008-tree-navigation.md) for full transition examples with navigation.

**Greedy `*`**:

@@ -228,8 +228,8 @@ Before elimination:
```
T0: ε [StartArray] → [T1]
T1: ε (branch) → [T2, T4]
T2: Match(identifier) → [T3]
T3: ε [CaptureNode, PushElement] → [T1]
T2: Match(identifier) [CaptureNode] → [T3]
T3: ε [PushElement] → [T1]
T4: ε [EndArray] → [T5]
T5: ε [Field("params")] → [...]
```
@@ -277,7 +277,7 @@ Partial—full elimination impossible due to single `ref_marker` and effect orde

**Execution order** (all transitions, including epsilon):

1. Execute `pre_nav` and matcher
1. Execute `nav` and matcher
2. On success: emit `effects` in order

With explicit `CaptureNode`, effect order is unambiguous. When eliminating epsilon chains, concatenate effect lists in traversal order.
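
The concatenation rule can be illustrated with a small sketch. The helper `merge_effects` and the two-variant `EffectOp` are assumptions for illustration; whether a particular epsilon transition may be folded at all depends on the elimination constraints described elsewhere in this ADR.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum EffectOp {
    CaptureNode,
    PushElement,
}

/// Effects for a transition after folding its epsilon successor into it:
/// the predecessor's effects first, then the successor's, order preserved.
fn merge_effects(first: &[EffectOp], then: &[EffectOp]) -> Vec<EffectOp> {
    let mut merged = Vec::with_capacity(first.len() + then.len());
    merged.extend_from_slice(first);
    merged.extend_from_slice(then);
    merged
}
```

Applied to a `Match(identifier) [CaptureNode]` transition followed by `ε [PushElement]`, this yields `[CaptureNode, PushElement]`, so the capture is still emitted before the element is pushed.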