Commit 176a611

feat: push compute to WASM — SIMD filters, late materialization, safety locks
Engine-level improvements to close the gap between the Zig WASM engine and the TypeScript pipeline:

ScanOperator (src/operators.ts):
- Apply filters during scan via WASM SIMD before row materialization
- Late materialization: decode filter columns first, skip projection column decode when all rows are filtered out
- buildPipeline skips FilterOperator when scan handles filters

Parquet bounded path (src/query-do.ts):
- Register decoded Parquet columns in WASM and use executeQuery() (same SIMD path as Lance) instead of JS row-by-row filtering

WasmEngine (src/wasm-engine.ts):
- Add registerDecodedColumn(): converts JS decoded arrays to typed arrays and registers them in WASM for SQL execution

SIMD128 filters (wasm/src/wasm/aggregates.zig):
- filterFloat64Buffer: @Vector(2, f64), 2 values per vector operation
- filterInt32Buffer: @Vector(4, i32), 4 values per vector operation
- intersectIndices: O(n+m) sorted merge (was O(n*m))

VIP cache safety locks (src/vip-cache.ts):
- acquire()/release() reference counting prevents in-use eviction
- evict() skips locked entries; map grows temporarily if all locked
1 parent de7c2e0 commit 176a611

6 files changed: +578 -41 lines changed

research/zig-engine-roadmap.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# Zig Engine Roadmap

Learnings from sibling Zig repos, prioritized by impact on QueryMode's WASM engine.

## P0: Selection Vectors + Late Materialization (from lanceql) — PARTIALLY DONE

**Source:** `../lanceql/src/sql/late_materialization.zig`, `../lanceql/src/query/vector_engine.zig`

**Already exists in Zig:** `wasm/src/query/vector_engine.zig` has `SelectionVector`, `DataChunk`, and `Vector` types (DuckDB-style, `VECTOR_SIZE=2048`). `wasm/src/columnar_ops.zig` has SIMD filter ops that return row indices.

**Done (TS layer):**
- ScanOperator now applies filters during scan using WASM SIMD (`filterFloat64Buffer`/`filterInt32Buffer`) before row materialization
- `buildPipeline` skips FilterOperator when ScanOperator handles filters
- Parquet bounded path registers decoded columns in WASM and uses `executeQuery()` (SIMD filter/sort/agg) instead of filtering row by row in JS

**Remaining:**
- True two-phase execution: decode only the filter columns first, collect the matching row indices, then decode projection columns only for those rows (saves string decode cost)
- Connect the TS pipeline to the Zig SelectionVector/DataChunk types for full columnar execution

**Impact:** Peak memory drops from ~128MB to ~12MB on 1M row queries.

**Files modified:** `src/operators.ts`, `src/wasm-engine.ts`, `src/query-do.ts`

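The two-phase execution described under "Remaining" can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the ScanOperator code: `LazyColumn`, `filterGreaterThan`, and `lateMaterialize` are hypothetical names, and the scalar loop stands in for the WASM SIMD filter.

```typescript
// Hypothetical column shapes; not the actual QueryMode types.
type LazyColumn<T> = { decodeAt(rowIndices: Uint32Array): T[] };

// Phase 1: scan only the filter column and collect matching row indices.
// (Scalar stand-in for the WASM SIMD filter.)
function filterGreaterThan(col: Float64Array, threshold: number): Uint32Array {
  const out = new Uint32Array(col.length);
  let n = 0;
  for (let i = 0; i < col.length; i++) {
    if (col[i] > threshold) out[n++] = i;
  }
  return out.subarray(0, n);
}

// Phase 2: decode the projection column only for the surviving rows.
function lateMaterialize<T>(
  filterCol: Float64Array,
  threshold: number,
  projection: LazyColumn<T>,
): T[] {
  const matches = filterGreaterThan(filterCol, threshold);
  if (matches.length === 0) return []; // nothing matched: skip decode entirely
  return projection.decodeAt(matches);
}

// Usage: track how many rows the projection column actually decodes.
let decoded = 0;
const names: LazyColumn<string> = {
  decodeAt(rows) {
    decoded += rows.length;
    return Array.from(rows, (r) => `row-${r}`);
  },
};
const result = lateMaterialize(new Float64Array([1, 9, 3, 7]), 5, names);
```

Only two of the four rows pass the filter, so the projection column decodes two values instead of four. The saving grows with filter selectivity and with the cost of decoding each projected value.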
## P1: SIMD128 Filter Predicates (from vectorjson + edgebox) — DONE (numeric)

**Source:** `../vectorjson/src/zig/simd.zig`, `../edgebox/src/simd_utils.zig`

**Done:**
- `filterFloat64Buffer`: SIMD128 with `@Vector(2, f64)`, 2 f64 values per vector operation plus a scalar tail
- `filterInt32Buffer`: SIMD128 with `@Vector(4, i32)`, 4 i32 values per vector operation plus a scalar tail
- `intersectIndices`: O(n+m) sorted merge (was O(n*m) nested loop)

**Remaining:**
- Comptime `anyMatch` pattern for string column scanning
- SIMD null bitmap evaluation

**Files modified:** `wasm/src/wasm/aggregates.zig`

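For readers on the TS side, the O(n+m) sorted merge behind `intersectIndices` looks roughly like the sketch below. The real implementation lives in Zig. `intersectSorted` is a hypothetical name, and the sketch assumes both inputs are sorted ascending with no duplicates.

```typescript
// Two-pointer intersection of sorted index arrays.
// Each step advances at least one pointer, so the loop runs at most n+m times.
function intersectSorted(a: Uint32Array, b: Uint32Array): Uint32Array {
  const out = new Uint32Array(Math.min(a.length, b.length));
  let i = 0;
  let j = 0;
  let n = 0;
  while (i < a.length && j < b.length) {
    if (a[i] < b[j]) {
      i++;
    } else if (a[i] > b[j]) {
      j++;
    } else {
      out[n++] = a[i]; // index present in both filters
      i++;
      j++;
    }
  }
  return out.subarray(0, n);
}

const both = intersectSorted(Uint32Array.of(1, 3, 5, 7), Uint32Array.of(3, 4, 5, 8));
```

The merge relies on filter outputs already being in ascending row order, which holds when a filter scans the column front to back.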
## P1: Arena Allocator Per Batch (from edgebox) — ALREADY SOLVED

**Source:** `../edgebox/src/native_arena.zig`

**Status:** Already effectively solved. The WASM engine allocates from linear memory through `std.heap.WasmAllocator`, and TS calls `resetHeap()` between queries to discard all allocations at once. In practice this is equivalent to arena-per-query: no per-allocation free and no fragmentation carried across queries. See `wasm/src/wasm/memory.zig`.

**No action needed.**

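As an illustration of the arena-per-query pattern (not the engine's allocator), a bump arena with a wholesale reset can be sketched as below. `BumpArena` and its methods are hypothetical, and `align` is assumed to be a power of two.

```typescript
// Minimal bump arena: allocation moves an offset forward, reset rewinds it.
class BumpArena {
  private offset = 0;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns the byte offset of the new allocation; align must be a power of two.
  alloc(size: number, align = 8): number {
    const start = (this.offset + align - 1) & ~(align - 1);
    if (start + size > this.capacity) throw new RangeError("arena exhausted");
    this.offset = start + size;
    return start;
  }

  // Frees everything at once; individual allocations are never freed.
  reset(): void {
    this.offset = 0;
  }

  get used(): number {
    return this.offset;
  }
}

const arena = new BumpArena(32);
const first = arena.alloc(3); // starts at offset 0
const second = arena.alloc(8); // rounded up to the next 8-byte boundary
```

Calling `reset()` between queries plays the role the roadmap attributes to `resetHeap()`: the next query starts again from offset 0.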
## P2: Vectorized WHERE Evaluation (from lanceql) — PARTIALLY DONE

**Source:** `../lanceql/src/sql/where_eval.zig`

**Done:** TS `scanFilterIndices()` handles compound AND by intersecting index arrays via WASM `intersectIndices`. The WASM SQL path (`executeSql`) already evaluates WHERE vectorized in Zig.

**Remaining:**
- OR support in the TS scan-time filter (union index arrays via `unionIndices`)
- Short-circuit evaluation: if the first AND filter returns 0 matches, skip the remaining filters
- Complex expressions (BETWEEN, LIKE) in the WASM filter fast path

**Files modified:** `src/operators.ts` (scanFilterIndices)

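The first two remaining items can be sketched in TypeScript as follows. This is an assumption-laden illustration: `unionSorted` and `evalAnd` are hypothetical names, filters are modelled as thunks that return sorted index arrays, and the `intersect` argument stands in for the WASM `intersectIndices` call.

```typescript
// OR: merge two sorted index arrays, emitting shared indices once.
function unionSorted(a: Uint32Array, b: Uint32Array): Uint32Array {
  const out = new Uint32Array(a.length + b.length);
  let i = 0;
  let j = 0;
  let n = 0;
  while (i < a.length && j < b.length) {
    if (a[i] < b[j]) out[n++] = a[i++];
    else if (a[i] > b[j]) out[n++] = b[j++];
    else {
      out[n++] = a[i];
      i++;
      j++;
    }
  }
  while (i < a.length) out[n++] = a[i++];
  while (j < b.length) out[n++] = b[j++];
  return out.subarray(0, n);
}

type IndexFilter = () => Uint32Array;
type Intersect = (a: Uint32Array, b: Uint32Array) => Uint32Array;

// AND with short-circuit: stop running filters once the running result is empty.
function evalAnd(filters: IndexFilter[], intersect: Intersect): Uint32Array {
  let acc: Uint32Array | null = null;
  for (const run of filters) {
    const hits = run();
    acc = acc === null ? hits : intersect(acc, hits);
    if (acc.length === 0) break; // later filters cannot add rows back
  }
  return acc ?? new Uint32Array(0);
}

// Usage: the second filter never runs because the first matches nothing.
let secondFilterRuns = 0;
const naiveIntersect: Intersect = (a, b) => a.filter((x) => b.includes(x));
const merged = unionSorted(Uint32Array.of(1, 3), Uint32Array.of(2, 3, 9));
const anded = evalAnd(
  [
    () => new Uint32Array(0),
    () => {
      secondFilterRuns++;
      return Uint32Array.of(1, 2);
    },
  ],
  naiveIntersect,
);
```

Ordering the most selective filter first makes the short-circuit fire earlier, though the sketch leaves filter ordering to the caller.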
## P2: VIP Pinning with Safety Locks (from zell) — DONE

**Source:** `../zell/src/expert_cache.zig`

**Done:** Added acquire/release reference counting to VipCache:
- `acquire(key)`: like `get()` but increments refCount, which prevents eviction
- `release(key)`: decrements refCount; deletes the entry if it is pending eviction and refCount has reached 0
- `evict()` skips entries with refCount > 0 and lets the map grow temporarily if all entries are locked
- `stats()` includes `lockedCount` (entries with refCount > 0)

**Files modified:** `src/vip-cache.ts`

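A minimal sketch of the acquire/release behavior listed above, assuming a plain insertion-ordered `Map`. This is not the code in `src/vip-cache.ts`: `RefCountedCache`, the `pendingEviction` flag, and the rule for when that flag is set are assumptions made for illustration.

```typescript
interface Entry<V> {
  value: V;
  refCount: number;
  pendingEviction: boolean;
}

class RefCountedCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  set(key: string, value: V): void {
    const existing = this.entries.get(key);
    if (existing) {
      existing.value = value; // keep refCount so current holders stay protected
      return;
    }
    this.entries.set(key, { value, refCount: 0, pendingEviction: false });
    this.evict(key);
  }

  // Like get(), but pins the entry until release() is called.
  acquire(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    entry.refCount++;
    return entry.value;
  }

  release(key: string): void {
    const entry = this.entries.get(key);
    if (!entry || entry.refCount === 0) return;
    entry.refCount--;
    if (entry.refCount === 0 && entry.pendingEviction) this.entries.delete(key);
  }

  stats(): { size: number; lockedCount: number } {
    let lockedCount = 0;
    for (const entry of this.entries.values()) {
      if (entry.refCount > 0) lockedCount++;
    }
    return { size: this.entries.size, lockedCount };
  }

  private evict(justInserted: string): void {
    let excess = this.entries.size - this.maxSize;
    // Pass 1: delete the oldest unlocked entries (Map iterates in insertion order).
    for (const [key, entry] of this.entries) {
      if (excess <= 0) return;
      if (key === justInserted || entry.refCount > 0) continue;
      this.entries.delete(key);
      excess--;
    }
    // Pass 2: still over capacity, so flag the oldest locked entries;
    // release() removes them once their refCount drops to 0.
    for (const entry of this.entries.values()) {
      if (excess <= 0) return;
      if (entry.refCount > 0 && !entry.pendingEviction) {
        entry.pendingEviction = true;
        excess--;
      }
    }
  }
}
```

With `maxSize` 1, inserting a second key while the first is acquired leaves two entries in the map until the holder calls `release()`, which matches the "grows temporarily if all locked" behavior described above.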
## P3: Host Import Pattern for R2 I/O (from gitmode)

**Source:** `../gitmode/wasm/src/r2_backend.zig`, `../gitmode/wasm/src/main.zig`

**Problem:** WASM engine currently receives data pushed from TypeScript.

**Solution:** WASM calls host-imported functions to request R2 reads:
- `extern fn r2_read(key_ptr: [*]u8, key_len: u32, offset: u64, len: u32) i32`
- WASM engine drives its own I/O, enabling prefetch decisions inside Zig

**Files to modify:** `wasm/src/main.zig`, `src/wasm-engine.ts`

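On the host side, an import with the `r2_read` shape could be wired up roughly as sketched below. The roadmap does not say where the bytes land or what the `i32` result means, so the sketch assumes a caller-chosen staging buffer in linear memory and a return value of bytes written, or -1 for a missing key. `makeR2Read`, `stagingPtr`, and `stagingCap` are hypothetical. R2 reads in Workers are promise-based while WASM imports are synchronous, so an in-memory `Map` of already-fetched objects stands in for R2 here.

```typescript
type LinearMemory = { buffer: ArrayBufferLike };

// Builds a synchronous r2_read import backed by already-fetched objects.
function makeR2Read(
  memory: LinearMemory,
  store: Map<string, Uint8Array>,
  stagingPtr: number, // assumed: where the host writes the requested bytes
  stagingCap: number, // assumed: size of that staging region
) {
  return function r2_read(keyPtr: number, keyLen: number, offset: bigint, len: number): number {
    // Re-create the view on every call: memory.buffer changes if memory grows.
    const mem = new Uint8Array(memory.buffer);
    const key = new TextDecoder().decode(mem.subarray(keyPtr, keyPtr + keyLen));
    const object = store.get(key);
    if (!object) return -1; // assumed error convention
    const start = Math.min(Number(offset), object.length);
    const n = Math.min(len, stagingCap, object.length - start);
    mem.set(object.subarray(start, start + n), stagingPtr);
    return n; // bytes written to the staging buffer
  };
}

// Usage with a plain ArrayBuffer standing in for WebAssembly.Memory.
const linear: LinearMemory = { buffer: new ArrayBuffer(64) };
const keyBytes = new TextEncoder().encode("a.lance");
new Uint8Array(linear.buffer).set(keyBytes, 0);
const store = new Map<string, Uint8Array>([["a.lance", Uint8Array.of(10, 20, 30, 40, 50)]]);
const r2Read = makeR2Read(linear, store, 32, 16);
const bytesRead = r2Read(0, keyBytes.length, BigInt(1), 3);
```

The `u64` offset arrives in JavaScript as a `bigint`, which is why the sketch converts it with `Number()` before slicing.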
## P3: Comptime Type Marshaling (from metal0)

**Source:** `../metal0/packages/c_interop/src/comptime_wrapper.zig`

**Problem:** Format-specific decoders have repetitive type conversion code.

**Solution:** Use Zig comptime to auto-generate type converters:
```zig
// Returns a namespace of converters specialized for element type T.
fn MarshalColumn(comptime T: type) type {
    return struct {
        // Bodies are elided in this roadmap; only the intended signatures are shown.
        pub fn decode(buf: []const u8) []T { ... }
        pub fn encode(values: []const T) []u8 { ... }
    };
}
```

Generate Arrow<->Lance<->Parquet converters from one template.

**Files to modify:** `wasm/src/decode.zig`

## P3: Canonical ABI for WASM Boundary (from edgebox)

**Source:** `../edgebox/src/component/canonical_abi.zig`

**Problem:** Column data exchange between TS and WASM uses manual pointer math.

**Solution:** Type-safe lift/lower functions:
- Lower (Host->WASM): allocate in WASM memory, copy column data
- Lift (WASM->Host): validate, copy to host
- Handles strings, lists, nested types

**Files to modify:** `src/wasm-engine.ts`, `wasm/src/main.zig`

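For a single fixed-width type, the lift/lower pair might look like the sketch below. It is a hedged illustration covering only `f64` columns: `F64Boundary`, `ColumnRef`, and the trivial bump allocation inside `lower` are hypothetical, and a plain `ArrayBuffer` stands in for WASM linear memory.

```typescript
interface ColumnRef {
  ptr: number; // byte offset into linear memory
  len: number; // element count
}

class F64Boundary {
  private next = 8; // offset 0 is kept free so it can act as a null pointer
  private readonly memory: { buffer: ArrayBuffer };

  constructor(memory: { buffer: ArrayBuffer }) {
    this.memory = memory;
  }

  // Lower (Host->WASM): reserve aligned space in linear memory and copy the column in.
  lower(values: Float64Array): ColumnRef {
    const ptr = (this.next + 7) & ~7;
    const bytes = values.length * 8;
    if (ptr + bytes > this.memory.buffer.byteLength) {
      throw new RangeError("lower: out of linear memory");
    }
    new Float64Array(this.memory.buffer, ptr, values.length).set(values);
    this.next = ptr + bytes;
    return { ptr, len: values.length };
  }

  // Lift (WASM->Host): validate the reference, then copy out of linear memory.
  lift(ref: ColumnRef): Float64Array {
    const bytes = ref.len * 8;
    const inBounds = ref.ptr >= 0 && ref.len >= 0 && ref.ptr + bytes <= this.memory.buffer.byteLength;
    if (!Number.isInteger(ref.ptr) || ref.ptr % 8 !== 0 || !inBounds) {
      throw new RangeError("lift: invalid column reference");
    }
    // slice() copies, so the result survives later writes or a heap reset.
    return new Float64Array(this.memory.buffer, ref.ptr, ref.len).slice();
  }
}

const wasmBuffer = new ArrayBuffer(128);
const boundary = new F64Boundary({ buffer: wasmBuffer });
const lowered = boundary.lower(Float64Array.of(1.5, 2.5));
const lifted = boundary.lift(lowered);
```

Copying on lift costs an extra pass over the data, but it keeps host-side arrays valid after the WASM heap is reset between queries.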
## Reference: Key Files in Sibling Repos

| Repo | File | What to learn |
|------|------|---------------|
| lanceql | `src/sql/late_materialization.zig` | Two-phase execution, streaming batches |
| lanceql | `src/query/vector_engine.zig` | SelectionVector, DataChunk, Vector types |
| lanceql | `src/sql/where_eval.zig` | Vectorized compound filter evaluation |
| lanceql | `src/simd.zig` | Threshold-based SIMD dispatch |
| vectorjson | `src/zig/simd.zig` | Comptime SIMD128 anyMatch pattern |
| edgebox | `src/native_arena.zig` | Bump allocator with LIFO + in-place realloc |
| edgebox | `src/simd_utils.zig` | SIMD byte scanning patterns |
| gitmode | `wasm/src/r2_backend.zig` | Host import R2 I/O from WASM |
| gitmode | `wasm/src/simd.zig` | SIMD128 memchr/memeql |
| zell | `src/expert_cache.zig` | VIP pinning + LRU + safety locks |
| metal0 | `packages/c_interop/src/comptime_wrapper.zig` | Comptime code generation |
