Commit 06d6a3c

refactor: dedup concatQMCBBatches via concatColumnarBatches (-79 lines) + docs for SQL semantics and error handling
concatQMCBBatches now delegates to concatColumnarBatches + encodeColumnarBatch instead of reimplementing the same per-dtype concat logic. All 470 tests pass. SQL docs: NULL three-valued logic, type coercion rules, operator precedence, comments. Error handling docs: QueryModeError codes, catching patterns, error wrapping.
1 parent 611d1ac commit 06d6a3c

File tree

4 files changed: +160 −79 lines changed
docs/astro.config.mjs

Lines changed: 1 addition & 0 deletions

```diff
@@ -29,6 +29,7 @@ export default defineConfig({
       { label: "Columnar Format", slug: "columnar-format" },
       { label: "Lazy Evaluation", slug: "lazy-evaluation" },
       { label: "Performance", slug: "performance" },
+      { label: "Error Handling", slug: "error-handling" },
       { label: "Write Path", slug: "write-path" },
       { label: "Deployment", slug: "deployment" },
     ],
```
Lines changed: 75 additions & 0 deletions

@@ -0,0 +1,75 @@

---
title: Error Handling
description: Structured errors with codes, causes, and actionable messages.
---

All errors thrown by QueryMode are `QueryModeError` instances with a structured `code` field. Low-level errors (ENOENT, parse failures, OOM) are wrapped automatically with context.

## Error codes

| Code | When | Example message |
|------|------|-----------------|
| `TABLE_NOT_FOUND` | File doesn't exist, R2 key missing | "Table not found: events.lance" |
| `COLUMN_NOT_FOUND` | Column name not in schema | "Column not found: foo" |
| `INVALID_FORMAT` | File can't be parsed as any supported format | "Invalid table format: data.xyz" |
| `SCHEMA_MISMATCH` | Column exists but type doesn't match operation | "Column not found in events: age" |
| `INVALID_FILTER` | Bad filter op or value type | "Invalid filter: unknown op 'regex'" |
| `INVALID_AGGREGATE` | Bad aggregate function or missing column | "Invalid aggregate: sum requires numeric column" |
| `MEMORY_EXCEEDED` | Operator exceeds memory budget | "Memory budget exceeded querying events" |
| `NETWORK_TIMEOUT` | R2 or RPC call timed out | "Network timeout on events: R2 read timed out" |
| `QUERY_TIMEOUT` | Total query time exceeded | "Query timeout on events" |
| `QUERY_FAILED` | Catch-all for unclassified errors | "Query failed on events: ..." |
## Catching errors

```typescript
import { QueryModeError } from "querymode"

try {
  await qm.table("missing.lance").collect()
} catch (err) {
  if (err instanceof QueryModeError) {
    switch (err.code) {
      case "TABLE_NOT_FOUND":
        console.log("Table doesn't exist:", err.message)
        break
      case "MEMORY_EXCEEDED":
        console.log("Try adding filters or reducing projections")
        break
      default:
        console.log(`${err.code}: ${err.message}`)
    }
    // Original error is preserved
    if (err.cause) console.log("Caused by:", err.cause)
  }
}
```

## Error wrapping

`QueryModeError.from()` wraps any error with context:

```typescript
try {
  await riskyOperation()
} catch (err) {
  throw QueryModeError.from(err, { table: "events", operation: "scan" })
  // Automatically classifies: ENOENT → TABLE_NOT_FOUND,
  // "footer" in message → INVALID_FORMAT, "OOM" → MEMORY_EXCEEDED, etc.
}
```

Already-wrapped `QueryModeError` instances pass through unchanged.
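The classification rules described above could look roughly like this. This is a hedged sketch, not QueryMode's actual implementation: the class name `SketchError`, the message template, and the exact match strings beyond those named in the docs are all assumptions.

```typescript
// Hypothetical sketch of the QueryModeError.from() classification pattern.
// Only the codes and the ENOENT/"footer"/"OOM" triggers come from the docs
// above; everything else here is assumed for illustration.
type ErrorCode = "TABLE_NOT_FOUND" | "INVALID_FORMAT" | "MEMORY_EXCEEDED" | "QUERY_FAILED"

class SketchError extends Error {
  code: ErrorCode
  cause?: unknown

  constructor(code: ErrorCode, message: string, cause?: unknown) {
    super(message)
    this.code = code
    if (cause !== undefined) this.cause = cause
  }

  static from(err: unknown, ctx: { table: string; operation: string }): SketchError {
    // Already-wrapped errors pass through unchanged
    if (err instanceof SketchError) return err
    const msg = err instanceof Error ? err.message : String(err)
    let code: ErrorCode = "QUERY_FAILED" // catch-all for unclassified errors
    if (msg.includes("ENOENT")) code = "TABLE_NOT_FOUND"
    else if (msg.includes("footer")) code = "INVALID_FORMAT"
    else if (msg.includes("OOM")) code = "MEMORY_EXCEEDED"
    return new SketchError(code, `${ctx.operation} failed on ${ctx.table}: ${msg}`, err)
  }
}
```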
## SQL parse errors

SQL syntax errors throw a standard `Error` (not a `QueryModeError`) with the parse position:

```typescript
try {
  await qm.sql("SELECT FROM").collect()
} catch (err) {
  // "Expected column or expression at position 7"
  console.log(err.message)
}
```

docs/src/content/docs/sql.mdx

Lines changed: 82 additions & 0 deletions

@@ -123,6 +123,88 @@ SELECT * FROM images WHERE embedding NEAR [0.1, 0.2, 0.3] TOPK 10

The `NEAR` operator performs vector similarity search on the specified column. `TOPK` limits results to the K nearest neighbors. Uses IVF-PQ index when available, falls back to flat SIMD scan.

## NULL semantics

QueryMode follows SQL three-valued logic. Expressions involving NULL propagate NULL rather than returning true or false:

```sql
-- NULL comparisons → NULL (row excluded from results)
SELECT * FROM users WHERE age = NULL   -- no rows (use IS NULL instead)
SELECT * FROM users WHERE NULL > 5     -- no rows

-- AND with NULL
SELECT * FROM t WHERE NULL AND true    -- NULL (excluded)
SELECT * FROM t WHERE NULL AND false   -- false (excluded)

-- OR with NULL
SELECT * FROM t WHERE NULL OR true     -- true (included)
SELECT * FROM t WHERE NULL OR false    -- NULL (excluded)

-- NOT IN with NULL elements
SELECT * FROM t WHERE id NOT IN (1, 2, NULL)   -- NULL for all rows (per SQL standard)

-- BETWEEN with NULL
SELECT * FROM t WHERE NULL BETWEEN 1 AND 10    -- NULL (excluded)

-- IS NULL / IS NOT NULL (never return NULL)
SELECT * FROM t WHERE email IS NULL    -- works correctly
```

Aggregates also follow SQL NULL rules:

| Expression | Result |
|-----------|--------|
| `SUM(col)` where all values are NULL | NULL |
| `COUNT(col)` where all values are NULL | 0 |
| `COUNT(*)` | counts all rows, NULL or not |
| `MIN(col)` / `MAX(col)` on empty group | NULL |
| `AVG(col)` where all values are NULL | NULL |
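The propagation rules above can be modeled with a nullable boolean, where `null` stands for SQL NULL ("unknown"). This is an illustrative sketch, not QueryMode's actual evaluator:

```typescript
// Three-valued logic: null stands for SQL NULL ("unknown").
type Bool3 = boolean | null

// AND is false if either side is false; otherwise null if either is null.
function and3(a: Bool3, b: Bool3): Bool3 {
  if (a === false || b === false) return false
  if (a === null || b === null) return null
  return true
}

// OR is true if either side is true; otherwise null if either is null.
function or3(a: Bool3, b: Bool3): Bool3 {
  if (a === true || b === true) return true
  if (a === null || b === null) return null
  return false
}

// A WHERE clause keeps a row only when the predicate is exactly true,
// which is why NULL results exclude the row.
const keepsRow = (p: Bool3) => p === true
```

Note that `NULL AND false` is `false` (not `NULL`) because a definite `false` on either side decides the conjunction regardless of the unknown; symmetrically, `NULL OR true` is `true`.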
## Type coercion

Comparisons between different types follow these rules:

| Left type | Right type | Behavior |
|-----------|------------|----------|
| number | number | Direct comparison |
| bigint | number | number promoted to bigint (when integer) |
| string | number | Numeric comparison if string is numeric, else string comparison |
| any | NULL | Result is NULL |

`CAST` converts between types explicitly:

```sql
SELECT CAST(age AS text) AS age_str    -- number → string
SELECT CAST('42' AS int) AS age        -- string → number
SELECT CAST(id AS bigint) AS big_id    -- number → bigint
```
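The coercion table above could be expressed as a comparison helper along these lines. This is a sketch under the stated rules only; the function name and return convention (`-1`/`0`/`1`, or `null` for SQL NULL) are assumptions, not the engine's actual code:

```typescript
// Hypothetical comparator implementing the coercion table above.
// Returns -1/0/1, or null when either operand is NULL.
function compare3(a: any, b: any): number | null {
  if (a === null || b === null) return null // any vs NULL → NULL

  // bigint vs number: promote the number when it is an integer
  if (typeof a === "bigint" && typeof b === "number" && Number.isInteger(b)) b = BigInt(b)
  else if (typeof b === "bigint" && typeof a === "number" && Number.isInteger(a)) a = BigInt(a)

  // string vs number: numeric comparison if the string parses as a number,
  // else fall back to string comparison
  if (typeof a === "string" && typeof b === "number") {
    const n = Number(a)
    if (!Number.isNaN(n)) a = n
    else b = String(b)
  } else if (typeof b === "string" && typeof a === "number") {
    const n = Number(b)
    if (!Number.isNaN(n)) b = n
    else a = String(a)
  }

  return a < b ? -1 : a > b ? 1 : 0
}
```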
## Operator precedence

From highest to lowest:

1. Parentheses `()`
2. Unary `NOT`, `-`
3. Multiplication `*`, division `/`, modulo `%`
4. Addition `+`, subtraction `-`
5. Comparison `=`, `!=`, `<>`, `<`, `>`, `<=`, `>=`
6. `IS NULL`, `IS NOT NULL`, `BETWEEN`, `IN`, `LIKE`
7. `AND`
8. `OR`

```sql
-- AND binds tighter than OR
WHERE a = 1 OR b = 2 AND c = 3
-- is parsed as: WHERE a = 1 OR (b = 2 AND c = 3)
```
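One common way to encode a table like this is a numeric binding-power map, as a precedence-climbing parser would use. A sketch under the ordering above; the map and its name are hypothetical, not the parser's internals:

```typescript
// Higher number = binds tighter. Levels mirror the list above
// (parentheses and unary operators are handled separately by a parser).
const PRECEDENCE: Record<string, number> = {
  "OR": 1,
  "AND": 2,
  "IS NULL": 3, "IS NOT NULL": 3, "BETWEEN": 3, "IN": 3, "LIKE": 3,
  "=": 4, "!=": 4, "<>": 4, "<": 4, ">": 4, "<=": 4, ">=": 4,
  "+": 5, "-": 5,
  "*": 6, "/": 6, "%": 6,
}

// AND binds tighter than OR, so `a = 1 OR b = 2 AND c = 3`
// groups as `a = 1 OR (b = 2 AND c = 3)`.
const andBindsTighter = PRECEDENCE["AND"] > PRECEDENCE["OR"]
```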
## Comments

```sql
-- Line comment (ignored)
SELECT * FROM users /* block comment */ WHERE age > 25
```

## How it works

src/columnar.ts

Lines changed: 2 additions & 79 deletions

```diff
@@ -420,85 +420,8 @@ export function concatQMCBBatches(batches: ArrayBuffer[]): ArrayBuffer | null {
   const decoded = batches.map(b => decodeColumnarBatch(b)).filter((b): b is ColumnarBatch => b !== null);
   if (decoded.length === 0) return null;

-  const totalRows = decoded.reduce((s, b) => s + b.rowCount, 0);
-  const numCols = decoded[0].columns.length;
-
-  const columns: ColumnarColumn[] = [];
-
-  for (let ci = 0; ci < numCols; ci++) {
-    const dtype = decoded[0].columns[ci].dtype;
-    const name = decoded[0].columns[ci].name;
-    const bpe = bytesPerElement(dtype);
-
-    if (bpe > 0) {
-      // Fixed-width numeric: memcpy concat
-      const buf = new ArrayBuffer(totalRows * bpe);
-      const out = new Uint8Array(buf);
-      let offset = 0;
-      for (const batch of decoded) {
-        const src = new Uint8Array(batch.columns[ci].data);
-        out.set(src, offset);
-        offset += src.length;
-      }
-      columns.push({ name, dtype, data: buf, rowCount: totalRows });
-    } else if (dtype === DTYPE_BOOL) {
-      const buf = new ArrayBuffer(Math.ceil(totalRows / 8));
-      const out = new Uint8Array(buf);
-      let row = 0;
-      for (const batch of decoded) {
-        const src = new Uint8Array(batch.columns[ci].data);
-        for (let r = 0; r < batch.rowCount; r++) {
-          if (src[r >> 3] & (1 << (r & 7))) out[row >> 3] |= 1 << (row & 7);
-          row++;
-        }
-      }
-      columns.push({ name, dtype, data: buf, rowCount: totalRows });
-    } else if (dtype === DTYPE_UTF8) {
-      let totalStrLen = 0;
-      for (const batch of decoded) {
-        const col = batch.columns[ci];
-        totalStrLen += col.offsets ? col.offsets[batch.rowCount] : col.data.byteLength;
-      }
-
-      const offsets = new Uint32Array(totalRows + 1);
-      const strBuf = new Uint8Array(totalStrLen);
-      let strOffset = 0;
-      let row = 0;
-
-      for (const batch of decoded) {
-        const col = batch.columns[ci];
-        const srcOffsets = col.offsets!;
-        const srcData = new Uint8Array(col.data);
-        for (let r = 0; r < batch.rowCount; r++) {
-          offsets[row] = strOffset;
-          const start = srcOffsets[r];
-          const end = srcOffsets[r + 1];
-          if (end > start) {
-            strBuf.set(srcData.subarray(start, end), strOffset);
-            strOffset += end - start;
-          }
-          row++;
-        }
-      }
-      offsets[totalRows] = strOffset;
-      columns.push({ name, dtype, data: (strBuf.buffer as ArrayBuffer).slice(0, strOffset), rowCount: totalRows, offsets });
-    } else if (dtype === DTYPE_F32VEC) {
-      const dim = decoded[0].columns[ci].vectorDim || 0;
-      const buf = new ArrayBuffer(totalRows * dim * 4);
-      const out = new Uint8Array(buf);
-      let offset = 0;
-      for (const batch of decoded) {
-        const src = new Uint8Array(batch.columns[ci].data);
-        out.set(src, offset);
-        offset += src.length;
-      }
-      columns.push({ name, dtype, data: buf, rowCount: totalRows, vectorDim: dim });
-    } else {
-      columns.push({ name, dtype, data: new ArrayBuffer(0), rowCount: totalRows });
-    }
-  }
-
-  return encodeColumnarBatch({ columns, rowCount: totalRows });
+  const merged = concatColumnarBatches(decoded);
+  return merged ? encodeColumnarBatch(merged) : null;
 }

 // ============================================================================
```
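The deleted `DTYPE_BOOL` branch above is the subtle part of the old code: bit-packed booleans can't be concatenated with a plain `memcpy` because each batch's packing restarts at bit 0. Isolated as a standalone helper, the idea looks like this (a sketch extracted from the deleted code, not the library's API):

```typescript
// Concatenate bit-packed boolean columns (LSB-first within each byte),
// re-aligning bits since each batch's packing restarts at bit 0.
function concatPackedBools(parts: { bits: Uint8Array; rowCount: number }[]): Uint8Array {
  const totalRows = parts.reduce((s, p) => s + p.rowCount, 0)
  const out = new Uint8Array(Math.ceil(totalRows / 8))
  let row = 0
  for (const { bits, rowCount } of parts) {
    for (let r = 0; r < rowCount; r++) {
      // Read bit r of the source batch, write it at the global row position
      if (bits[r >> 3] & (1 << (r & 7))) out[row >> 3] |= 1 << (row & 7)
      row++
    }
  }
  return out
}
```

Fixed-width numeric and `f32` vector columns, by contrast, are byte-aligned, which is why the deleted code (and presumably `concatColumnarBatches`) can concatenate them with straight `Uint8Array.set` copies.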
