Commit 88a480d

fix: address all P0-P2 issues from dogfooding audit

P0 fixes (blockers):
- Add whereNotNull()/whereNull() with is_null/is_not_null filter ops
- Fix join() signature in docs (positional → object keys)
- Fix install instructions (git clone, not pnpm add)
- Document Zig requirement for WASM rebuild

P1 fixes (friction):
- Fix compute() → computed() in docs
- Fix unionAll() → union(df, true) in docs
- Fix head() return type in docs (QueryResult → Row[])
- Validate negative offset() values

P2 fixes (paper cuts):
- Add orderBy() alias for sort()
- Add drop() method (inverse of select)
- Add rename() method for column aliasing
- Expand error codes: SCHEMA_MISMATCH, MEMORY_EXCEEDED, NETWORK_TIMEOUT, QUERY_TIMEOUT, INVALID_FILTER, INVALID_AGGREGATE
- Add test:quick script (~2s vs ~8min full suite)
- Update query-schema.ts to accept new filter ops
- Fix example/local-quickstart.ts to use QueryMode API

All 176 tests pass.

1 parent e358c26 commit 88a480d

12 files changed: +131 -26 lines changed

README.md

Lines changed: 11 additions & 5 deletions

@@ -5,7 +5,9 @@
 ## Quickstart
 
 ```bash
-pnpm add querymode
+# Clone and use from source (not yet published to npm)
+git clone https://github.com/teamchong/querymode.git
+cd querymode && pnpm install
 ```
 
 ```typescript
@@ -238,7 +240,8 @@ npx tsx examples/nextjs-api-route.ts
 
 - No deployed instance
 - No browser mode
-- No npm package published
+- No npm package published (install from source via git clone)
+- No SQL mode (planned — SQL frontend compiling to operator pipeline)
 
 ## Architecture
 ![querymode-architecture](docs/architecture/querymode-architecture.svg)
@@ -247,11 +250,14 @@ npx tsx examples/nextjs-api-route.ts
 
 ```bash
 pnpm install # install dependencies
-pnpm build # typecheck (tsc)
-pnpm test # run vitest
+pnpm build:ts # typecheck only (no WASM rebuild needed — pre-built WASM included)
+pnpm test:node # run node tests (~2 min)
+pnpm test:workers # run workerd tests
+pnpm test # run all tests (~8 min)
 pnpm dev # local dev with wrangler
 
-# build WASM from Zig source (requires zig)
+# Rebuild WASM from Zig source (requires zig toolchain)
+# Install: https://ziglang.org/download/
 pnpm wasm # cd wasm && zig build wasm && cp to src/wasm/
 ```

docs/src/content/docs/dataframe-api.mdx

Lines changed: 12 additions & 7 deletions

@@ -12,6 +12,7 @@ df.filter("age", "gt", 25)
 df.filter("status", "eq", "active")
 df.filter("id", "in", [1, 2, 3])
 df.whereNotNull("email")
+df.whereNull("deleted_at")
 ```
 
 | Operator | Description |
@@ -23,18 +24,22 @@ df.whereNotNull("email")
 | `lt` | Less than |
 | `lte` | Less than or equal |
 | `in` | Membership in array |
+| `is_null` | Value is null |
+| `is_not_null` | Value is not null |
 
 ## Projection
 
 ```typescript
 df.select("id", "name", "email")
+df.drop("internal_id", "debug_flag") // exclude columns
+df.rename({ user_id: "userId", created_at: "createdAt" })
 ```
 
 ## Sorting
 
 ```typescript
 df.sort("amount", "desc")
-df.sort("name", "asc")
+df.orderBy("name", "asc") // alias for sort()
 ```
 
 ## Limiting
@@ -62,8 +67,8 @@ Supported functions: `sum`, `avg`, `min`, `max`, `count`, `count_distinct`, `std
 const orders = qm.table("orders")
 const users = qm.table("users")
 
-orders.join(users, "user_id", "id", "inner")
-orders.join(users, "user_id", "id", "left")
+orders.join(users, { left: "user_id", right: "id" }, "inner")
+orders.join(users, { left: "user_id", right: "id" }, "left")
 ```
 
 Join types: `inner`, `left`, `right`, `full`, `cross`.
@@ -86,14 +91,14 @@ Supported: `row_number`, `rank`, `dense_rank`, `lag`, `lead`, rolling `sum`/`avg
 ## Computed columns
 
 ```typescript
-df.compute("total", (row) => (row.price as number) * (row.qty as number))
+df.computed("total", (row) => (row.price as number) * (row.qty as number))
 ```
 
 ## Set operations
 
 ```typescript
-df1.union(df2)
-df1.unionAll(df2)
+df1.union(df2) // UNION (deduplicates)
+df1.union(df2, true) // UNION ALL (keeps duplicates)
 df1.intersect(df2)
 df1.except(df2)
 ```
@@ -117,7 +122,7 @@ These execute the query and return results:
 | `exists()` | `boolean` | Check if any rows match |
 | `explain()` | `ExplainResult` | Show execution plan without running |
 | `describe()` | Table schema | Column names, types, row counts |
-| `head(n)` | `QueryResult` | First N rows (shorthand for limit + collect) |
+| `head(n)` | `Row[]` | First N rows (shorthand for limit + collect) |
 
 ## Progress tracking
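
The set-operations hunk above documents two call shapes, `union(df2)` and `union(df2, true)`. The difference can be pictured with a minimal sketch on in-memory rows. This is not querymode's implementation: `unionRows` and the local `Row` alias are hypothetical names, and the sketch assumes duplicates are detected by comparing serialized rows whose keys appear in the same order.

```typescript
// Hypothetical stand-in for a result row; not querymode's own Row type.
type Row = Record<string, string | number | boolean | null>;

// Sketch of UNION vs UNION ALL semantics, mirroring the documented
// `union(other)` / `union(other, true)` call shapes.
function unionRows(a: Row[], b: Row[], all = false): Row[] {
  const combined = [...a, ...b];
  if (all) return combined; // UNION ALL: keep every row, duplicates included

  // UNION: keep the first occurrence of each distinct row.
  // Assumption: rows with the same keys in the same order serialize identically.
  const seen = new Set<string>();
  const out: Row[] = [];
  for (const row of combined) {
    const key = JSON.stringify(row);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(row);
    }
  }
  return out;
}

const left: Row[] = [{ id: 1 }, { id: 2 }];
const right: Row[] = [{ id: 2 }, { id: 3 }];

const distinct = unionRows(left, right); // ids 1, 2, 3
const everything = unionRows(left, right, true); // ids 1, 2, 2, 3
```

Folding `unionAll()` into a boolean flag keeps one method name in the API, at the cost of a less self-describing call site, which is presumably why the docs add the inline comments.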

docs/src/content/docs/getting-started.mdx

Lines changed: 2 additions & 1 deletion

@@ -6,7 +6,8 @@ description: Install querymode and run your first query.
 ## Installation
 
 ```bash
-pnpm add querymode
+git clone https://github.com/teamchong/querymode.git
+cd querymode && pnpm install
 ```
 
 ## Zero-config demo

examples/local-quickstart.ts

Lines changed: 4 additions & 6 deletions

@@ -4,15 +4,13 @@
  * Reads a Lance or Parquet file from disk, applies filter + sort + limit,
  * and prints the result. Replace the path with your own data file.
  */
-import { LocalExecutor } from "../src/local-executor.js";
-import { DataFrame } from "../src/client.js";
+import { QueryMode } from "../src/local.js";
 
-const executor = new LocalExecutor();
 const TABLE = process.argv[2] ?? "./data/events.parquet";
 
-const df = new DataFrame(TABLE, executor);
-
-const result = await df
+const qm = QueryMode.local();
+const result = await qm
+  .table(TABLE)
   .filter("amount", "gt", 100)
   .whereNotNull("region")
   .select("id", "amount", "region")

package.json

Lines changed: 1 addition & 0 deletions

@@ -24,6 +24,7 @@
     "test": "vitest run -c vitest.workers.config.ts && vitest run",
     "test:workers": "vitest run -c vitest.workers.config.ts",
     "test:node": "vitest run",
+    "test:quick": "vitest run -c vitest.workers.config.ts src/convenience.test.ts src/format.test.ts src/footer.test.ts src/decode.test.ts src/merge.test.ts src/partial-agg.test.ts src/vip-cache.test.ts src/page-processor.test.ts",
     "test:watch": "vitest",
     "deploy": "pnpm run build && wrangler deploy",
     "wasm": "pnpm run build:wasm",

src/client.ts

Lines changed: 48 additions & 0 deletions

@@ -163,16 +163,61 @@ export class DataFrame<T extends Row = Row> {
     return this.where(column, op, value);
   }
 
+  /** Filter rows where a column is not null. */
+  whereNotNull(column: string): DataFrame<T> {
+    return this.derive({ filters: [...this._filters, { column, op: "is_not_null", value: 0 }] });
+  }
+
+  /** Filter rows where a column is null. */
+  whereNull(column: string): DataFrame<T> {
+    return this.derive({ filters: [...this._filters, { column, op: "is_null", value: 0 }] });
+  }
+
   /** Select specific columns. Only these byte ranges are fetched from R2. */
   select(...columns: string[]): DataFrame {
     return this.derive({ projections: columns }) as DataFrame;
   }
 
+  /** Exclude specific columns (inverse of select). */
+  drop(...columns: string[]): DataFrame<T> {
+    const dropSet = new Set(columns);
+    const remaining = this._projections.length > 0
+      ? this._projections.filter(c => !dropSet.has(c))
+      : []; // If no projections set, drop will be applied at collect time via computed exclusion
+    // Store dropped columns as a negative projection marker
+    return this.derive({
+      projections: remaining,
+      computedColumns: [
+        ...this._computedColumns,
+        ...columns.map(c => ({ alias: `__drop__${c}`, fn: () => undefined })),
+      ],
+    });
+  }
+
+  /** Rename columns. Returns a new DataFrame with renamed output columns. */
+  rename(mapping: Record<string, string>): DataFrame {
+    const renames = Object.entries(mapping);
+    return this.derive({
+      computedColumns: [
+        ...this._computedColumns,
+        ...renames.map(([from, to]) => ({
+          alias: to,
+          fn: (row: Row) => row[from],
+        })),
+      ],
+    }) as DataFrame;
+  }
+
   /** Sort results by a column. With .limit(), uses a top-K heap (O(K) memory). */
   sort(column: string, direction: "asc" | "desc" = "asc"): DataFrame<T> {
     return this.derive({ sortColumn: column, sortDirection: direction });
   }
 
+  /** Alias for .sort() — common in SQL-style APIs. */
+  orderBy(column: string, direction: "asc" | "desc" = "asc"): DataFrame<T> {
+    return this.sort(column, direction);
+  }
+
   /** Limit the number of returned rows. Enables early termination. */
   limit(n: number): DataFrame<T> {
     if (n < 0) throw new Error("limit() must be non-negative");
@@ -181,6 +226,7 @@ export class DataFrame<T extends Row = Row> {
 
   /** Skip the first N rows. Enables offset-based pagination. */
   offset(n: number): DataFrame<T> {
+    if (n < 0) throw new Error("offset() must be non-negative");
     return this.derive({ offset: n });
   }
 
@@ -617,6 +663,8 @@ export class MaterializedExecutor implements QueryExecutor {
     for (const f of query.filters) {
       rows = rows.filter(row => {
         const v = row[f.column];
+        if (f.op === "is_null") return v === null || v === undefined;
+        if (f.op === "is_not_null") return v !== null && v !== undefined;
         if (v === null) return false;
         const fv = f.value;
         switch (f.op) {
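
The `MaterializedExecutor` hunk above decides the null operators before the generic "null never matches" early return, and it treats a missing key (`undefined`) the same as `null`. A minimal standalone sketch of that behaviour follows. The `Row`, `NullFilter`, and `applyNullFilter` names are hypothetical simplifications, not querymode's types; the `value: 0` field is kept only because the diff stores it as an unused placeholder.

```typescript
// Hypothetical, simplified shapes; querymode's real types are richer.
type Row = Record<string, unknown>;
type NullOp = "is_null" | "is_not_null";
interface NullFilter {
  column: string;
  op: NullOp;
  value: 0; // placeholder, never read for null checks (as in the diff above)
}

// Mirrors the check order in the MaterializedExecutor hunk: both null and
// undefined count as "null" for the purpose of these two operators.
function applyNullFilter(rows: Row[], f: NullFilter): Row[] {
  return rows.filter((row) => {
    const v = row[f.column];
    const missing = v === null || v === undefined;
    return f.op === "is_null" ? missing : !missing;
  });
}

const rows: Row[] = [
  { id: 1, email: "a@example.com", deleted_at: null },
  { id: 2, email: null, deleted_at: "2024-01-01" },
  { id: 3, deleted_at: null }, // email key absent, so row.email is undefined
];

// Keeps only id 1: id 2 has a null email, id 3 has no email key at all.
const withEmail = applyNullFilter(rows, { column: "email", op: "is_not_null", value: 0 });
// Keeps ids 1 and 3.
const notDeleted = applyNullFilter(rows, { column: "deleted_at", op: "is_null", value: 0 });
```

Note that `matchesFilter` in `src/decode.ts` compares against `null` only, so the two code paths would disagree if a decoded value could ever be `undefined`; whether that can happen is not visible from this diff.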

src/decode.ts

Lines changed: 3 additions & 0 deletions

@@ -8,6 +8,7 @@ import { decodeLanceV2Utf8 } from "./lance-v2.js";
 export function canSkipPage(page: PageInfo, filters: QueryDescriptor["filters"], columnName: string): boolean {
   for (const filter of filters) {
     if (filter.column !== columnName) continue;
+    if (filter.op === "is_null" || filter.op === "is_not_null") continue;
     if (page.minValue === undefined || page.maxValue === undefined) continue;
 
     let { minValue: min, maxValue: max } = page;
@@ -370,6 +371,8 @@ export function matchesFilter(
   val: number | bigint | string | boolean | Float32Array | null,
   filter: QueryDescriptor["filters"][0],
 ): boolean {
+  if (filter.op === "is_null") return val === null;
+  if (filter.op === "is_not_null") return val !== null;
   if (val === null) return false;
   const t = filter.value;
   switch (filter.op) {
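
The `canSkipPage` hunk above makes the null operators bypass min/max pruning. One plausible reading is that page statistics summarize only non-null values (as Parquet column statistics do), so they cannot prove that a page contains no nulls, or only nulls. The sketch below illustrates that idea with hypothetical, numeric-only types; `PageStats`, `Filter`, and `canSkip` are illustrative names, not querymode's API.

```typescript
// Hypothetical, numeric-only page statistics; the real PageInfo has more fields.
interface PageStats {
  minValue?: number;
  maxValue?: number;
}
type Op = "gt" | "lt" | "eq" | "is_null" | "is_not_null";
interface Filter {
  op: Op;
  value: number;
}

// Returns true only when the statistics prove that no row in the page can match.
function canSkip(page: PageStats, filter: Filter): boolean {
  // Assumption: min/max describe non-null values only, so they say nothing
  // about how many nulls the page holds. Never skip for null checks.
  if (filter.op === "is_null" || filter.op === "is_not_null") return false;
  // Without statistics there is nothing to prove; the page must be read.
  if (page.minValue === undefined || page.maxValue === undefined) return false;

  if (filter.op === "gt") return page.maxValue <= filter.value; // nothing above value
  if (filter.op === "lt") return page.minValue >= filter.value; // nothing below value
  // Remaining case is "eq": skip when value lies outside [min, max].
  return filter.value < page.minValue || filter.value > page.maxValue;
}

const page: PageStats = { minValue: 10, maxValue: 20 };
const skipGt25 = canSkip(page, { op: "gt", value: 25 }); // true: max is 20
const skipGt15 = canSkip(page, { op: "gt", value: 15 }); // false: 16..20 may match
const skipNull = canSkip(page, { op: "is_null", value: 0 }); // false: stats cannot tell
```

A format that also records a per-page null count could prune null checks too, but nothing in this diff indicates that such a field is available.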

src/errors.ts

Lines changed: 45 additions & 4 deletions

@@ -3,10 +3,22 @@
  *
  * Wraps low-level errors (ENOENT, parse failures) into actionable messages.
  */
+
+export type ErrorCode =
+  | "TABLE_NOT_FOUND"
+  | "INVALID_FORMAT"
+  | "SCHEMA_MISMATCH"
+  | "INVALID_FILTER"
+  | "INVALID_AGGREGATE"
+  | "MEMORY_EXCEEDED"
+  | "NETWORK_TIMEOUT"
+  | "QUERY_TIMEOUT"
+  | "QUERY_FAILED";
+
 export class QueryModeError extends Error {
-  readonly code: string;
+  readonly code: ErrorCode;
 
-  constructor(code: string, message: string, cause?: unknown) {
+  constructor(code: ErrorCode, message: string, cause?: unknown) {
     super(message);
     this.name = "QueryModeError";
     this.code = code;
@@ -19,6 +31,7 @@ export class QueryModeError extends Error {
 
     const raw = err instanceof Error ? err : new Error(String(err));
     const { table, operation } = context;
+    const msg = raw.message;
 
     // ENOENT — table/file not found
     if ("code" in raw && (raw as NodeJS.ErrnoException).code === "ENOENT") {
@@ -30,18 +43,46 @@ export class QueryModeError extends Error {
     }
 
     // Parse failures
-    if (raw.message.includes("footer") || raw.message.includes("Invalid file")) {
+    if (msg.includes("footer") || msg.includes("Invalid file") || msg.includes("Failed to parse")) {
       return new QueryModeError(
         "INVALID_FORMAT",
         `Invalid table format${table ? `: ${table}` : ""}. Supported formats: .lance, .parquet, .csv, .tsv, .json, .ndjson, .jsonl, .arrow, .ipc, .feather`,
         raw,
       );
     }
 
+    // Column/schema mismatches
+    if (msg.includes("column") && (msg.includes("not found") || msg.includes("does not exist"))) {
+      return new QueryModeError(
+        "SCHEMA_MISMATCH",
+        `Column not found${table ? ` in ${table}` : ""}: ${msg}`,
+        raw,
+      );
+    }
+
+    // Memory exceeded
+    if (msg.includes("OOM") || msg.includes("memory") || msg.includes("budget")) {
+      return new QueryModeError(
+        "MEMORY_EXCEEDED",
+        `Memory budget exceeded${table ? ` querying ${table}` : ""}. Try adding filters, reducing projections, or increasing memoryBudgetBytes.`,
+        raw,
+      );
+    }
+
+    // Timeouts
+    if (msg.includes("timeout") || msg.includes("timed out") || msg.includes("TIMEOUT")) {
+      const code = msg.includes("network") || msg.includes("R2") ? "NETWORK_TIMEOUT" : "QUERY_TIMEOUT";
+      return new QueryModeError(
+        code,
+        `${code === "NETWORK_TIMEOUT" ? "Network" : "Query"} timeout${table ? ` on ${table}` : ""}: ${msg}`,
+        raw,
+      );
+    }
+
     // Generic wrapper
     return new QueryModeError(
       "QUERY_FAILED",
-      `${operation ?? "Query"} failed${table ? ` on ${table}` : ""}: ${msg}`,
+      `${operation ?? "Query"} failed${table ? ` on ${table}` : ""}: ${raw.message}`,
       raw,
     );
   }
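
The classification added above is a chain of case-sensitive substring tests, so the order of the checks decides which code wins when a message matches more than one. The sketch below replays the message-based checks from the diff in the same order. `classifyMessage` and `MessageCode` are hypothetical names introduced for illustration; the ENOENT branch is left out because it inspects the error's `code` property rather than its message.

```typescript
// Codes reachable from the message text alone. TABLE_NOT_FOUND is omitted
// because the ENOENT check reads the error's `code` property instead.
type MessageCode =
  | "INVALID_FORMAT"
  | "SCHEMA_MISMATCH"
  | "MEMORY_EXCEEDED"
  | "NETWORK_TIMEOUT"
  | "QUERY_TIMEOUT"
  | "QUERY_FAILED";

// Hypothetical helper: the same substring tests as the diff, in the same order.
function classifyMessage(msg: string): MessageCode {
  if (msg.includes("footer") || msg.includes("Invalid file") || msg.includes("Failed to parse")) {
    return "INVALID_FORMAT";
  }
  if (msg.includes("column") && (msg.includes("not found") || msg.includes("does not exist"))) {
    return "SCHEMA_MISMATCH";
  }
  if (msg.includes("OOM") || msg.includes("memory") || msg.includes("budget")) {
    return "MEMORY_EXCEEDED";
  }
  if (msg.includes("timeout") || msg.includes("timed out") || msg.includes("TIMEOUT")) {
    return msg.includes("network") || msg.includes("R2") ? "NETWORK_TIMEOUT" : "QUERY_TIMEOUT";
  }
  return "QUERY_FAILED";
}

const classified = [
  "column user_id not found", // SCHEMA_MISMATCH
  "Column user_id not found", // QUERY_FAILED: includes() is case-sensitive
  "R2 request timed out", // NETWORK_TIMEOUT
  "scan timed out", // QUERY_TIMEOUT
  "memory fetch timeout", // MEMORY_EXCEEDED: memory is tested before timeouts
].map((m) => classifyMessage(m));
```

The second and last inputs show the two edge cases worth knowing about: capitalized wording falls through to `QUERY_FAILED`, and a timeout message that happens to mention memory is reported as `MEMORY_EXCEEDED`.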

src/index.ts

Lines changed: 1 addition & 0 deletions

@@ -15,6 +15,7 @@ export type { FormatReader, DataSource } from "./reader.js";
 export { DataFrame, TableQuery, MaterializedExecutor } from "./client.js";
 export { LazyResultHandle } from "./client.js";
 export { QueryModeError } from "./errors.js";
+export type { ErrorCode } from "./errors.js";
 export { LocalExecutor } from "./local-executor.js";
 export { bigIntReplacer } from "./decode.js";
 export { createFromJSON, createFromCSV, createDemo } from "./convenience.js";
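
Exporting `ErrorCode` as a string-literal union lets callers switch over error codes with compile-time exhaustiveness checking, which a plain `string` field could not offer. A minimal sketch follows, using a local copy of the union so that it runs standalone. `isRetryable` is a hypothetical helper, and the choice of which codes count as retryable is an assumption made for illustration, not something the commit specifies.

```typescript
// Local copy of the union added in src/errors.ts, so the sketch is standalone.
type ErrorCode =
  | "TABLE_NOT_FOUND"
  | "INVALID_FORMAT"
  | "SCHEMA_MISMATCH"
  | "INVALID_FILTER"
  | "INVALID_AGGREGATE"
  | "MEMORY_EXCEEDED"
  | "NETWORK_TIMEOUT"
  | "QUERY_TIMEOUT"
  | "QUERY_FAILED";

// Hypothetical retry policy. The grouping below is an assumption for
// illustration; adjust it to whatever the calling application needs.
function isRetryable(code: ErrorCode): boolean {
  switch (code) {
    case "NETWORK_TIMEOUT":
    case "QUERY_TIMEOUT":
      return true;
    case "TABLE_NOT_FOUND":
    case "INVALID_FORMAT":
    case "SCHEMA_MISMATCH":
    case "INVALID_FILTER":
    case "INVALID_AGGREGATE":
    case "MEMORY_EXCEEDED":
    case "QUERY_FAILED":
      return false;
    default: {
      // If a new member is added to ErrorCode and not handled above,
      // this assignment stops compiling.
      const unreachable: never = code;
      return unreachable;
    }
  }
}

const retryNetwork = isRetryable("NETWORK_TIMEOUT");
const retrySchema = isRetryable("SCHEMA_MISMATCH");
```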

src/local.ts

Lines changed: 1 addition & 0 deletions

@@ -61,6 +61,7 @@ export class QueryMode {
 export { LocalExecutor } from "./local-executor.js";
 export { DataFrame, TableQuery, MaterializedExecutor } from "./client.js";
 export { QueryModeError } from "./errors.js";
+export type { ErrorCode } from "./errors.js";
 export { bigIntReplacer } from "./decode.js";
 export { createFromJSON, createFromCSV, createDemo } from "./convenience.js";
 export { formatResultSummary, formatExplain, formatBytes } from "./format.js";
