Merged
AGENTS.md: 3 changes (2 additions, 1 deletion)
````diff
@@ -20,6 +20,7 @@
 - [ADR-0004: Query IR Binary Format](docs/adr/ADR-0004-query-ir-binary-format.md)
 - [ADR-0005: Transition Graph Format](docs/adr/ADR-0005-transition-graph-format.md)
 - [ADR-0006: Dynamic Query Execution](docs/adr/ADR-0006-dynamic-query-execution.md)
+- [ADR-0007: Type Metadata Format](docs/adr/ADR-0007-type-metadata-format.md)
 - **Template**:
 
 ```markdown
````
````diff
@@ -48,7 +49,7 @@
 ADRs must be succint and straight to the point.
 They must contain examples with high information density and pedagogical value.
 These are docs people usually don't want to read, but when they do, they find it quite fascinating.
-Avoid imperative code, describe structure definitions, their purpose and how to use them properly.
+Don't write imperative code, describe structure definitions, their purpose and how to use them properly (and how to NOT use).
 
 # Plotnik Query Language
 
````
docs/adr/ADR-0004-query-ir-binary-format.md: 68 changes (46 additions, 22 deletions)
````diff
@@ -6,7 +6,7 @@
 
 ## Context
 
-The Query IR lives in a single contiguous allocation—cache-friendly, zero fragmentation, portable to WASM. This ADR defines the binary layout. Graph structures are in [ADR-0005](ADR-0005-transition-graph-format.md).
+The Query IR lives in a single contiguous allocation—cache-friendly, zero fragmentation, portable to WASM. This ADR defines the binary layout. Graph structures are in [ADR-0005](ADR-0005-transition-graph-format.md). Type metadata is in [ADR-0007](ADR-0007-type-metadata-format.md).
 
 ## Decision
 
````
````diff
@@ -20,7 +20,8 @@ struct QueryIR {
     negated_fields_offset: u32,
     string_refs_offset: u32,
     string_bytes_offset: u32,
-    type_info_offset: u32,
+    type_defs_offset: u32,
+    type_members_offset: u32,
     entrypoints_offset: u32,
 }
 ```
````
````diff
@@ -40,6 +41,8 @@ struct QueryIRBuffer {
 
 Allocated via `Layout::from_size_align(len, BUFFER_ALIGN)`. Standard `Box<[u8]>` won't work—it assumes 1-byte alignment and corrupts `dealloc`. The 64-byte alignment ensures transitions never straddle cache lines.
 
+**Deallocation**: `QueryIRBuffer` must implement `Drop` to reconstruct the exact `Layout` (size + 64-byte alignment) and call `std::alloc::dealloc`. Using `Box::from_raw` or similar would assume align=1 and cause undefined behavior.
+
 ### Segments
 
 | Segment | Type | Offset | Align |
````
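The allocation and `Drop` requirements in this hunk can be sketched as below. This is a minimal illustration under assumptions, not the project's code: the field names `ptr` and `len` and the helpers `new_zeroed`, `as_slice`, and `as_mut_slice` are hypothetical; only `BUFFER_ALIGN` (64 bytes), `Layout::from_size_align`, and `std::alloc::dealloc` come from the ADR text.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// 64-byte alignment from the ADR: transitions never straddle cache lines.
const BUFFER_ALIGN: usize = 64;

/// Hypothetical owner of the single IR allocation.
pub struct QueryIRBuffer {
    ptr: NonNull<u8>,
    len: usize,
}

impl QueryIRBuffer {
    /// Allocates `len` zeroed bytes aligned to `BUFFER_ALIGN`.
    pub fn new_zeroed(len: usize) -> Self {
        // GlobalAlloc forbids zero-sized allocations.
        assert!(len > 0, "empty IR buffer");
        let layout = Layout::from_size_align(len, BUFFER_ALIGN).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        QueryIRBuffer { ptr, len }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` owns `len` initialized (zeroed) bytes.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: unique access through `&mut self`.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for QueryIRBuffer {
    fn drop(&mut self) {
        // Rebuild the exact Layout used at allocation time. Freeing with a
        // different alignment (e.g. align = 1 via Box::from_raw) is UB.
        let layout = Layout::from_size_align(self.len, BUFFER_ALIGN).expect("layout was valid");
        // SAFETY: `ptr` came from `alloc_zeroed` with this same layout.
        unsafe { dealloc(self.ptr.as_ptr(), layout) };
    }
}
```

The point of the `Drop` impl is symmetry: `dealloc` receives the same size and 64-byte alignment that `alloc_zeroed` received.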
````diff
@@ -50,49 +53,68 @@ Allocated via `Layout::from_size_align(len, BUFFER_ALIGN)`. Standard `Box<[u8]>`
 | Negated Fields | `[NodeFieldId; Q]` | `negated_fields_offset` | 2 |
 | String Refs | `[StringRef; R]` | `string_refs_offset` | 4 |
 | String Bytes | `[u8; S]` | `string_bytes_offset` | 1 |
-| Type Info | `[TypeInfo; U]` | `type_info_offset` | 4 |
-| Entrypoints | `[Entrypoint; T]` | `entrypoints_offset` | 4 |
+| Type Defs | `[TypeDef; T]` | `type_defs_offset` | 4 |
+| Type Members | `[TypeMember; U]` | `type_members_offset` | 2 |
+| Entrypoints | `[Entrypoint; V]` | `entrypoints_offset` | 4 |
 
 Each offset is aligned: `(offset + align - 1) & !(align - 1)`.
 
-### Stringsi
+For `Transition`, `EffectOp` see [ADR-0005](ADR-0005-transition-graph-format.md). For `TypeDef`, `TypeMember` see [ADR-0007](ADR-0007-type-metadata-format.md).
+
+### Strings
 
-Single pool for all strings (field names, variant tags, entrypoint names):
+Single pool for all strings (field names, variant tags, entrypoint names, type names):
 
 ```rust
+type StringId = u16;
+
 #[repr(C)]
 struct StringRef {
     offset: u32, // into string_bytes
     len: u16,
     _pad: u16,
 }
+// 8 bytes, align 4
 
-#[repr(C)]
-struct Entrypoint {
-    name_id: u16, // into string_refs
-    _pad: u16,
-    target: TransitionId,
-}
+type DataFieldId = StringId; // field names in effects
+type VariantTagId = StringId; // variant tags in effects
+
+type TypeId = u16; // see ADR-0007 for semantics
 ```
 
-`DataFieldId(u16)` and `VariantTagId(u16)` index into `string_refs`. Distinct types, same table.
+`StringId` indexes into `string_refs`. `DataFieldId` and `VariantTagId` are aliases for type safety. `TypeId` indexes into type_defs (with reserved primitives 0-2).
 
 Strings are interned during construction—identical strings share storage and ID.
 
+### Entrypoints
+
+```rust
+#[repr(C)]
+struct Entrypoint {
+    name_id: StringId, // 2
+    _pad: u16, // 2
+    target: TransitionId, // 4
+    result_type: TypeId, // 2 - see ADR-0007
+    _pad2: u16, // 2
+}
+// 12 bytes, align 4
+```
+
 ### Serialization
 
 ```
-Header (44 bytes):
-  magic: [u8; 4] b"PLNK"
-  version: u32 format version + ABI hash
-  checksum: u32 CRC32(offsets || buffer_data)
+Header (48 bytes):
+  magic: [u8; 4] b"PLNK"
+  version: u32 format version + ABI hash
+  checksum: u32 CRC32(offsets || buffer_data)
   buffer_len: u32
   successors_offset: u32
   effects_offset: u32
   negated_fields_offset: u32
   string_refs_offset: u32
   string_bytes_offset: u32
-  type_info_offset: u32
+  type_defs_offset: u32
+  type_members_offset: u32
   entrypoints_offset: u32
 
 Buffer Data (buffer_len bytes)
````
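A minimal sketch of the interning rule ("identical strings share storage and ID"), plus compile-time checks of the record sizes stated in this hunk. `StringPool`, `intern`, and `resolve` are hypothetical builder-side names; `StringRef` and `Entrypoint` follow the definitions above, and `TransitionId = u32` is taken from ADR-0005.

```rust
use std::collections::HashMap;

type StringId = u16;
type TransitionId = u32; // from ADR-0005
type TypeId = u16;

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StringRef {
    offset: u32, // into string_bytes
    len: u16,
    _pad: u16,
}

#[repr(C)]
struct Entrypoint {
    name_id: StringId,
    _pad: u16,
    target: TransitionId,
    result_type: TypeId,
    _pad2: u16,
}

// The sizes claimed in the ADR comments, checked at compile time.
const _: () = assert!(std::mem::size_of::<StringRef>() == 8);
const _: () = assert!(std::mem::align_of::<StringRef>() == 4);
const _: () = assert!(std::mem::size_of::<Entrypoint>() == 12);
const _: () = assert!(std::mem::align_of::<Entrypoint>() == 4);

/// Hypothetical interner used during the Analysis pass.
#[derive(Default)]
struct StringPool {
    bytes: Vec<u8>,       // becomes the String Bytes segment
    refs: Vec<StringRef>, // becomes the String Refs segment
    ids: HashMap<String, StringId>,
}

impl StringPool {
    fn intern(&mut self, s: &str) -> StringId {
        if let Some(&id) = self.ids.get(s) {
            return id; // identical string: same storage, same ID
        }
        assert!(self.refs.len() <= u16::MAX as usize, "StringId space exhausted");
        assert!(s.len() <= u16::MAX as usize, "string too long for StringRef.len");
        assert!(self.bytes.len() <= u32::MAX as usize, "pool too large for u32 offsets");
        let id = self.refs.len() as StringId;
        self.refs.push(StringRef { offset: self.bytes.len() as u32, len: s.len() as u16, _pad: 0 });
        self.bytes.extend_from_slice(s.as_bytes());
        self.ids.insert(s.to_owned(), id);
        id
    }

    fn resolve(&self, id: StringId) -> &str {
        let r = self.refs[id as usize];
        let start = r.offset as usize;
        std::str::from_utf8(&self.bytes[start..start + r.len as usize]).expect("pool is UTF-8")
    }
}
```

Interning `name`, `value`, and `Ident` in that order yields refs `{0,4}`, `{4,5}`, `{9,5}`, the same first entries shown for String Refs in the buffer layout example.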
````diff
@@ -104,7 +126,7 @@ Little-endian always. UTF-8 strings. Version mismatch or checksum failure → re
 
 Three passes:
 
-1. **Analysis**: Count elements, intern strings
+1. **Analysis**: Count elements, intern strings, infer types
 2. **Layout**: Compute aligned offsets, allocate once
 3. **Emission**: Write via `ptr::write`
 
````
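The Layout pass can be sketched as a running cursor that rounds each segment's start up with the alignment formula from the Segments section. The `SegmentSpec` type, the `layout_segments` helper, and rounding the total length up to 64 bytes are illustrative assumptions; the element sizes used in the usage example (8-byte `StringRef`, 12-byte `Entrypoint`) come from the struct comments in this ADR.

```rust
/// Rounds `offset` up to the next multiple of `align` (must be a power of two).
const fn align_up(offset: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Hypothetical description of one segment to place in the buffer.
struct SegmentSpec {
    name: &'static str,
    elem_size: usize,
    count: usize,
    align: usize,
}

/// Returns each segment's aligned start offset and the total buffer length,
/// rounded up to the 64-byte buffer alignment (an assumption of this sketch).
fn layout_segments(specs: &[SegmentSpec], start: usize) -> (Vec<(&'static str, usize)>, usize) {
    let mut cursor = start;
    let mut offsets = Vec::with_capacity(specs.len());
    for s in specs {
        cursor = align_up(cursor, s.align); // pad before the segment, never inside it
        offsets.push((s.name, cursor));
        cursor += s.elem_size * s.count;
    }
    (offsets, align_up(cursor, 64))
}
```

With 6 string refs, 25 string bytes (`namevalueIdentNumFuncExpr`), and 2 entrypoints packed from offset 0, the entrypoints start at 76 rather than 73, because 73 is not a multiple of 4. Only one allocation happens afterwards, sized by the returned total.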
````diff
@@ -128,15 +150,16 @@ Buffer layout:
 0x0280 Negated Fields []
 0x0280 String Refs [{0,4}, {4,5}, {9,5}, ...]
 0x02C0 String Bytes "namevalueIdentNumFuncExpr"
-0x0300 Type Info [...]
-0x0340 Entrypoints [{4, T0}, {5, T3}]
+0x0300 Type Defs [Record{...}, Enum{...}, ...]
+0x0340 Type Members [{name,Str}, {Ident,Ty5}, ...]
+0x0380 Entrypoints [{name=Func, target=Tr0, type=Ty3}, ...]
 ```
 
 `"name"` stored once, used by both `@name` captures.
 
 ## Consequences
 
-**Positive**: Cache-efficient, O(1) string lookup, zero-copy access, simple validation.
+**Positive**: Cache-efficient, O(1) string lookup, zero-copy access, simple validation. Self-contained binaries enable query caching by input hash.
 
 **Negative**: Format changes require rebuild. No version migration.
 
````
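A sketch of producing and validating the 48-byte header from the Serialization section: 4 + 4 + 4 + 4 bytes for magic, version, checksum, and `buffer_len`, plus eight 4-byte offsets. Assumptions not stated in the ADR: the checksum is the common IEEE CRC-32 (reflected polynomial `0xEDB88320`), computed bitwise here to stay dependency-free; "offsets" means the eight little-endian offset fields in header order; `serialize`, `deserialize`, and their error strings are hypothetical.

```rust
const MAGIC: [u8; 4] = *b"PLNK";
const HEADER_LEN: usize = 48; // 16 bytes of fixed fields + 8 offsets * 4 bytes

/// Bitwise CRC-32 (IEEE, reflected). Slow but dependency-free.
fn crc32(chunks: &[&[u8]]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for chunk in chunks {
        for &byte in *chunk {
            crc ^= byte as u32;
            for _ in 0..8 {
                let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set
                crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
            }
        }
    }
    !crc
}

/// Writes header + buffer, little-endian. `offsets` are in header order:
/// successors, effects, negated_fields, string_refs, string_bytes,
/// type_defs, type_members, entrypoints.
fn serialize(version: u32, offsets: [u32; 8], buffer: &[u8]) -> Vec<u8> {
    let mut offset_bytes = Vec::with_capacity(32);
    for off in offsets.iter() {
        offset_bytes.extend_from_slice(&off.to_le_bytes());
    }
    let checksum = crc32(&[offset_bytes.as_slice(), buffer]);
    let mut out = Vec::with_capacity(HEADER_LEN + buffer.len());
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(&checksum.to_le_bytes());
    out.extend_from_slice(&(buffer.len() as u32).to_le_bytes());
    out.extend_from_slice(&offset_bytes);
    out.extend_from_slice(buffer);
    out
}

/// Rejects bad magic, version mismatch, truncation, or checksum failure.
fn deserialize(bytes: &[u8], expected_version: u32) -> Result<([u32; 8], &[u8]), &'static str> {
    if bytes.len() < HEADER_LEN {
        return Err("truncated header");
    }
    let u32_at = |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
    if bytes[0..4] != MAGIC {
        return Err("bad magic");
    }
    if u32_at(4) != expected_version {
        return Err("version mismatch");
    }
    let checksum = u32_at(8);
    let buffer_len = u32_at(12) as usize;
    let buffer = bytes.get(HEADER_LEN..HEADER_LEN + buffer_len).ok_or("truncated buffer")?;
    if crc32(&[&bytes[16..HEADER_LEN], buffer]) != checksum {
        return Err("checksum mismatch");
    }
    let mut offsets = [0u32; 8];
    for (k, off) in offsets.iter_mut().enumerate() {
        *off = u32_at(16 + 4 * k);
    }
    Ok((offsets, buffer))
}
```

Because the checksum covers both the offsets and the buffer bytes, a flipped bit anywhere outside the first 16 header bytes is caught before any segment is read.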
````diff
@@ -146,3 +169,4 @@ Buffer layout:
 
 - [ADR-0005: Transition Graph Format](ADR-0005-transition-graph-format.md)
 - [ADR-0006: Dynamic Query Execution](ADR-0006-dynamic-query-execution.md)
+- [ADR-0007: Type Metadata Format](ADR-0007-type-metadata-format.md)
````
docs/adr/ADR-0005-transition-graph-format.md: 20 changes (14 additions, 6 deletions)
````diff
@@ -16,9 +16,8 @@ Edge-centric IR: transitions carry all semantics (matching, effects, successors)
 type TransitionId = u32;
 type NodeTypeId = u16; // from tree-sitter, do not change
 type NodeFieldId = NonZeroU16; // from tree-sitter, Option uses 0 for None
-type DataFieldId = u16;
-type VariantTagId = u16;
 type RefId = u16;
+// StringId, DataFieldId, VariantTagId: see ADR-0004
 ```
 
 ### Slice
````
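`NodeFieldId = NonZeroU16` leans on Rust's niche optimization: `Option<NonZeroU16>` encodes `None` as 0, so it stays 2 bytes and lines up with tree-sitter's use of field ID 0 for "no field". A small check of that property; `field_from_raw` and `field_to_raw` are hypothetical helpers.

```rust
use std::num::NonZeroU16;

type NodeFieldId = NonZeroU16;

/// Raw tree-sitter field id (0 = no field) to the IR representation.
fn field_from_raw(raw: u16) -> Option<NodeFieldId> {
    NonZeroU16::new(raw)
}

/// Inverse conversion: `None` maps back to 0.
fn field_to_raw(field: Option<NodeFieldId>) -> u16 {
    field.map_or(0, NonZeroU16::get)
}

// The niche makes `Option` free: still 2 bytes, same as the raw u16.
const _: () = assert!(std::mem::size_of::<Option<NodeFieldId>>() == 2);
```

This is why the Negated Fields segment can store `[NodeFieldId; Q]` at 2-byte alignment with no separate presence flag.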
````diff
@@ -61,10 +60,10 @@ Single `ref_marker` slot—sequences like `Enter(A) → Enter(B)` remain as epsi
 
 Successors use a small-size optimization to avoid indirection for the common case:
 
-| `successor_count` | Layout |
-| ----------------- | ------------------------------------------------------------------------------------ |
-| 0–5 | `successor_data[0..count]` contains `TransitionId` values directly |
-| > 5 | `successor_data[0]` is offset into `successors` segment, `successor_count` is length |
+| `successor_count` | Layout |
+| ----------------- | ----------------------------------------------------------------------------------- |
+| 0–5 | `successor_data[0..count]` contains `TransitionId` values directly |
+| > 5 | `successor_data[0]` is index into `successors` segment, `successor_count` is length |
 
 Why 5 slots: 24 available bytes / 4 bytes per `TransitionId` = 6 slots, minus 1 for the count field leaves 5.
 
````
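A sketch of successor resolution under the inline/spilled rule in the table above. Assumptions: the 24 bytes hold `successor_count: u32` followed by `successor_data: [TransitionId; 5]` (matching the "why 5 slots" arithmetic), and the `successors` segment is passed in as a `&[TransitionId]`; `Successors` and `resolve` are hypothetical names.

```rust
type TransitionId = u32;
const INLINE_SLOTS: usize = 5;

/// Hypothetical successor fields of a transition: 4 + 5 * 4 = 24 bytes.
#[repr(C)]
struct Successors {
    successor_count: u32,
    successor_data: [TransitionId; INLINE_SLOTS],
}

const _: () = assert!(std::mem::size_of::<Successors>() == 24);

impl Successors {
    /// Returns the successor list without allocating: either the inline
    /// slots or a sub-slice of the shared `successors` segment.
    fn resolve<'a>(&'a self, segment: &'a [TransitionId]) -> &'a [TransitionId] {
        let count = self.successor_count as usize;
        if count <= INLINE_SLOTS {
            &self.successor_data[..count]
        } else {
            // Spilled: slot 0 holds the start index into `successors`.
            let start = self.successor_data[0] as usize;
            &segment[start..start + count]
        }
    }
}
```

Both branches return a borrowed slice, so iterating successors never allocates, whichever representation a transition uses.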
````diff
@@ -281,6 +280,14 @@ T6: ε + Field("val") + EndVariant → [T7]
 
 Partial—full elimination impossible due to single `ref_marker`.
 
+**Execution order** (all transitions, including epsilon):
+
+1. Emit `pre_effects`
+2. Execute matcher (epsilon always succeeds)
+3. On success: emit implicit `CaptureNode`, emit `post_effects`
+
+An epsilon transition with `pre: [StartObject]` and `post: [EndObject]` legitimately creates an empty object. To avoid accidental empty structures in graph rewrites, move effects to the destination's `pre` or source's `post` as appropriate.
+
 Why pre/post split matters:
 
 ```
````
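The execution order above can be sketched as a single step function. The `Effect`, `Matcher`, and `Transition` types are simplified stand-ins: only `StartObject`, `EndObject`, and `CaptureNode` are named in the ADR, the `CaptureNode` payload and the `(id, kind)` node tuple are hypothetical, and truncating the log on a failed match is an assumption modeled on ADR-0006's `effect_watermark`. This sketch also reads "implicit `CaptureNode`" as applying only when a node was actually matched, which is what lets the epsilon example produce an empty object.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    StartObject,
    EndObject,
    CaptureNode(u32), // hypothetical payload: id of the matched node
}

enum Matcher {
    Epsilon,
    Node { kind: u16 },
}

struct Transition {
    pre_effects: Vec<Effect>,
    matcher: Matcher,
    post_effects: Vec<Effect>,
}

/// Hypothetical input node: (node id, node kind).
type Node = (u32, u16);

/// 1. pre_effects, 2. matcher, 3. on success CaptureNode + post_effects.
/// On failure the log is rolled back to its length before step 1 (assumed).
fn step(t: &Transition, node: Option<Node>, log: &mut Vec<Effect>) -> bool {
    let watermark = log.len();
    log.extend_from_slice(&t.pre_effects);
    let captured = match (&t.matcher, node) {
        (Matcher::Epsilon, _) => None, // always succeeds, consumes no node
        (Matcher::Node { kind }, Some((id, k))) if k == *kind => Some(id),
        (Matcher::Node { .. }, _) => {
            log.truncate(watermark);
            return false;
        }
    };
    if let Some(id) = captured {
        log.push(Effect::CaptureNode(id));
    }
    log.extend_from_slice(&t.post_effects);
    true
}
```

Running the epsilon transition with `pre: [StartObject]` and `post: [EndObject]` leaves exactly `[StartObject, EndObject]` in the log: the legitimate empty object the ADR describes.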
````diff
@@ -308,3 +315,4 @@ Incoming epsilon effects → `pre_effects`. Outgoing → `post_effects`.
 
 - [ADR-0004: Query IR Binary Format](ADR-0004-query-ir-binary-format.md)
 - [ADR-0006: Dynamic Query Execution](ADR-0006-dynamic-query-execution.md)
+- [ADR-0007: Type Metadata Format](ADR-0007-type-metadata-format.md)
````
docs/adr/ADR-0006-dynamic-query-execution.md: 16 changes (14 additions, 2 deletions)
````diff
@@ -99,10 +99,12 @@ struct BacktrackPoint {
     cursor_checkpoint: u32, // tree-sitter descendant_index
     effect_watermark: u32,
     recursion_frame: Option<u32>, // saved frame index
-    alternatives: Slice<TransitionId>,
+    alternatives: Slice<TransitionId>, // view into IR successors, not owned
 }
 ```
 
+`alternatives` references the IR's successor data (inline or spilled)—no runtime allocation per backtrack point.
+
 | Operation | Action |
 | --------- | ------------------------------------------------------ |
 | Save | `cursor_checkpoint = cursor.descendant_index()` — O(1) |
````
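A minimal sketch of Save/Restore where `alternatives` is a borrowed view rather than an owned list. Assumptions: `Slice<TransitionId>` is modeled as `&'ir [TransitionId]` borrowed from the IR, the tree-sitter cursor is replaced by a hypothetical `Cursor` that only tracks a descendant index, and `Backtracker`, `save`, and `restore` are illustrative names.

```rust
type TransitionId = u32;

/// Stand-in for a tree-sitter cursor: only the descendant index matters here.
struct Cursor {
    descendant_index: u32,
}

struct BacktrackPoint<'ir> {
    cursor_checkpoint: u32,
    effect_watermark: u32,
    recursion_frame: Option<u32>,
    alternatives: &'ir [TransitionId], // borrowed from the IR, never copied
}

struct Backtracker<'ir, E> {
    points: Vec<BacktrackPoint<'ir>>,
    effects: Vec<E>,
}

impl<'ir, E> Backtracker<'ir, E> {
    /// Save: O(1), records positions only.
    fn save(&mut self, cursor: &Cursor, frame: Option<u32>, alternatives: &'ir [TransitionId]) {
        self.points.push(BacktrackPoint {
            cursor_checkpoint: cursor.descendant_index,
            effect_watermark: self.effects.len() as u32,
            recursion_frame: frame,
            alternatives,
        });
    }

    /// Restore: rewind cursor and effect log, then hand out the next untried
    /// alternative together with the frame index to make current.
    fn restore(&mut self, cursor: &mut Cursor) -> Option<(TransitionId, Option<u32>)> {
        while let Some(point) = self.points.last_mut() {
            if let Some((&next, rest)) = point.alternatives.split_first() {
                point.alternatives = rest; // advance the view in place
                cursor.descendant_index = point.cursor_checkpoint;
                self.effects.truncate(point.effect_watermark as usize);
                let frame = point.recursion_frame;
                if rest.is_empty() {
                    self.points.pop(); // last alternative handed out
                }
                return Some((next, frame));
            }
            self.points.pop(); // exhausted point
        }
        None
    }
}
```

Advancing `alternatives` by re-slicing keeps each backtrack point a fixed-size value that points into IR memory, which is the property the added sentence in this hunk relies on.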
````diff
@@ -129,7 +131,16 @@ struct CallFrame {
 }
 ```
 
-**Append-only invariant**: Frames are never removed. On `Exit`, set `current` to parent index. Backtracking restores `current`; the original frame is still accessible via its index.
+**Append-only invariant**: Frames persist for backtracking correctness. On `Exit`, set `current` to parent index. Backtracking restores `current`; the original frame is still accessible via its index.
+
+**Frame pruning**: After `Exit`, frames at the stack top may be reclaimed if:
+
+1. Not the current frame (already exited)
+2. Not referenced by any live backtrack point
+
+This bounds memory by `max(recursion_depth, backtrack_depth)` rather than total call count. Without pruning, `(Rule)*` over N items allocates N frames; with pruning, it remains O(1) for non-backtracking iteration.
+
+The `BacktrackPoint.recursion_frame` field establishes a "high-water mark"—the minimum frame index that must be preserved. Frames above this mark with no active reference can be popped.
 
 | Operation | Action |
 | ----------------- | ------------------------------------------------------------------------------ |
````
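A sketch of the pruning rule: pop from the top while the top frame is neither current nor at or below the highest `recursion_frame` held by a live backtrack point. Assumptions: `CallFrame` is reduced to its parent link (its other fields are elided in this diff), live backtrack points are represented only by their `recursion_frame` values, and `FrameStack`, `enter`, `exit`, and `prune` are hypothetical names. Since a parent is always pushed before its children, popping only indices above `current` never removes an ancestor of the current frame.

```rust
/// Reduced call frame: only the parent link matters for pruning.
struct CallFrame {
    parent: Option<u32>,
}

struct FrameStack {
    frames: Vec<CallFrame>,
    current: Option<u32>,
}

impl FrameStack {
    fn new() -> Self {
        FrameStack { frames: Vec::new(), current: None }
    }

    /// Enter: append a frame whose parent is the current frame.
    fn enter(&mut self) {
        self.frames.push(CallFrame { parent: self.current });
        self.current = Some((self.frames.len() - 1) as u32);
    }

    /// Exit: the frame stays in place; `current` moves to the parent.
    fn exit(&mut self) {
        let idx = self.current.expect("exit without enter") as usize;
        self.current = self.frames[idx].parent;
    }

    /// Pops reclaimable frames from the top. `live_marks` holds the
    /// `recursion_frame` of every live backtrack point.
    fn prune(&mut self, live_marks: &[Option<u32>]) {
        let keep_mark = live_marks.iter().flatten().copied().max();
        while let Some(top) = self.frames.len().checked_sub(1) {
            let top = top as u32;
            let is_current = self.current == Some(top);
            let pinned = keep_mark.map_or(false, |m| top <= m);
            if is_current || pinned {
                break;
            }
            self.frames.pop();
        }
    }
}
```

In a loop of enter, exit, prune with no live backtrack points, the frame vector never grows past one entry, which is the O(1) behavior claimed for `(Rule)*`.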
````diff
@@ -196,3 +207,4 @@ Details deferred.
 
 - [ADR-0004: Query IR Binary Format](ADR-0004-query-ir-binary-format.md)
 - [ADR-0005: Transition Graph Format](ADR-0005-transition-graph-format.md)
+- [ADR-0007: Type Metadata Format](ADR-0007-type-metadata-format.md)
````