diff --git a/docs/wip/bytecode-linking.md b/docs/wip/bytecode-linking.md
deleted file mode 100644
index 1d4e5b37..00000000
--- a/docs/wip/bytecode-linking.md
+++ /dev/null
@@ -1,204 +0,0 @@
-# Bytecode Linking Design
-
-## Scope
-
-**Design** the bytecode format to support future post-compilation linking, but **implement only**:
-
-- Emit linked bytecode (current behavior)
-- Emit unlinked bytecode (new: StringIds in instructions)
-- CLI dumps both for debugging
-- Runtime executes only linked bytecode, rejects unlinked
-
-**NOT implementing**: Actual relinking logic. The format supports it; we don't use it yet.
-
-## Chosen Design: Header Flag + StringId References
-
-**Key insight**: `StringId` and `NodeTypeId` are both `u16`. Instructions always store a `u16` in bytes 2-3 (node_type) and 4-5 (node_field). A header flag indicates how to interpret them.
-
-### StringId(0) Reservation
-
-Since `0` means "no constraint" (wildcard) in instruction bytes, `StringId(0)` can never be referenced by instructions. Reserve it as an easter egg:
-
-```
-strings[0] = "Beauty will save the world" // Dostoevsky, The Idiot
-strings[1] = first actual string
-strings[2] = second actual string
-...
-```
-
-Actual string references use 1-based indices. The easter egg sits at index 0, visible to anyone who hexdumps the bytecode.
-
-### Unlinked Bytecode
-
-```
-Header: linked = false
-Match instruction bytes 2-3: StringId (index into string table)
-Match instruction bytes 4-5: StringId (index into string table)
-node_types section: empty (reserved)
-node_fields section: empty (reserved)
-```
-
-### Linked Bytecode (emitted via `LinkedQuery::emit()`)
-
-```
-Header: linked = true
-Match instruction bytes 2-3: NodeTypeId (grammar ID)
-Match instruction bytes 4-5: NodeFieldId (grammar ID)
-node_types section: [(NodeTypeId, StringId), ...] for verification
-node_fields section: [(NodeFieldId, StringId), ...] for verification
-```
-
-### Runtime Behavior
-
-- If `linked = false`: reject execution, require linking first
-- If `linked = true`: execute directly, optionally verify symbol tables match loaded grammar
-
----
-
-## Current Architecture
-
-### Compilation Flow
-
-```
-Query → Parse → Analyze → [Link against grammar] → Compile → Emit bytecode
-                                    ↓
-                          node_type_ids: HashMap
-                          node_field_ids: HashMap
-```
-
-### Key Structures
-
-**MatchIR** (`bytecode/ir.rs:74-93`):
-
-```rust
-pub struct MatchIR {
-    pub node_type: Option, // Already numeric (tree-sitter ID)
-    pub node_field: Option, // Already numeric
-    pub successors: Vec