Commit 668a756

docs: fix performance page memory budget (32→256MB), add fragment-level skip, fix lazy eval
Performance page:
- Memory budget default was documented as 32 MB, actual code is 256 MB
- Add dedicated fragment-level skip (canSkipFragment) section
- Fix benchmark table column: Miniflare → workerd
- Update fragmentsSkipped explain description

Lazy evaluation page:
- Add DataFrame.stream() to terminal methods table (was only on LazyResultHandle)
- Rewrite streaming section: stream() works directly on DataFrame, not just lazy handle
- Fix after() docs to mention lt filter for descending sorts
- Add guidance to prefer stream() over cursor()
1 parent 26fcbb4 commit 668a756

File tree:
- docs/src/content/docs/lazy-evaluation.mdx
- docs/src/content/docs/performance.mdx

2 files changed: +45, -21 lines

docs/src/content/docs/lazy-evaluation.mdx

Lines changed: 19 additions & 8 deletions
@@ -28,7 +28,8 @@ const result = await query.collect()
 | `.count()` | Return row count without materializing | Counting without data transfer |
 | `.exists()` | Return true if any row matches | Cheapest existence check |
 | `.lazy()` | Return a `LazyResultHandle` for paging | Large results, on-demand pages |
-| `.cursor()` | Return `AsyncIterable<Row[]>` for streaming | Process rows in batches without loading all |
+| `.stream()` | Yield `Row[]` batches via `AsyncGenerator` | Process rows without loading all into memory |
+| `.cursor()` | Return `AsyncIterable<Row[]>` for streaming | Same as stream, requires executor cursor support |
 | `.explain()` | Return query plan without executing | Debugging, inspect pruning |

 ## Lazy result handle
@@ -61,30 +62,40 @@ Each `.page()` call is a separate query execution with `offset` and `limit`. No

 ## Streaming iteration

-`.stream()` on a lazy handle yields batches until exhausted:
+`.stream()` works directly on the DataFrame — no `.lazy()` needed:

 ```typescript
-const handle = await qm.table("events").lazy()
-
-for await (const batch of handle.stream(500)) {
+for await (const batch of qm.table("events").stream(500)) {
   // batch is Row[] with up to 500 rows
   process(batch)
   // Break early to stop fetching
   if (done) break
 }
 ```

+If the executor supports cursors, `.stream()` fetches batches incrementally. Otherwise it falls back to `.collect()` and yields slices — still useful for processing without holding all rows in your code at once.
+
+`.stream()` is also available on `LazyResultHandle`:
+
+```typescript
+const handle = await qm.table("events").lazy()
+
+for await (const batch of handle.stream(500)) {
+  process(batch)
+}
+```
+
 ## Cursor

-`.cursor()` is similar but works directly on the DataFrame without an intermediate handle:
+`.cursor()` is the low-level streaming primitive. It requires an executor with cursor support (e.g., edge mode) and throws if not available:

 ```typescript
 for await (const batch of qm.table("events").cursor({ batchSize: 1000 })) {
   await processBatch(batch)
 }
 ```

-Requires an executor with cursor support (e.g., edge mode). Throws if not available.
+Prefer `.stream()` unless you need to guarantee incremental fetching.

 ## Keyset pagination

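The cursor-or-collect fallback that the streaming hunk above describes could be sketched roughly as follows. The `Executor` interface, `openCursor`, and `collect` names here are assumptions for illustration, not the library's actual API:

```typescript
// Hypothetical executor shape -- names are assumptions for illustration.
type Row = Record<string, unknown>

interface Executor {
  // Present only when the backend supports incremental cursors.
  openCursor?: (batchSize: number) => AsyncIterable<Row[]>
  collect: () => Promise<Row[]>
}

// Sketch of the documented stream() fallback: delegate to the cursor
// when the executor has one, otherwise collect once and yield slices.
async function* streamRows(exec: Executor, batchSize: number): AsyncGenerator<Row[]> {
  if (exec.openCursor) {
    yield* exec.openCursor(batchSize)
    return
  }
  const all = await exec.collect()
  for (let i = 0; i < all.length; i += batchSize) {
    yield all.slice(i, i + batchSize)
  }
}
```

Breaking out of the `for await` loop closes the generator, which is what lets the documented early `break` stop further fetching.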
@@ -106,4 +117,4 @@ const page2 = await qm.table("events")
   .collect()
 ```

-`.after(value)` translates to a `gt` filter on the sort column, which benefits from page-level skip. Every page is equally fast regardless of depth.
+`.after(value)` translates to a `gt` filter on the sort column (or `lt` for descending sorts), which benefits from page-level skip. Every page is equally fast regardless of depth.
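The `gt`/`lt` translation this hunk adds can be shown with a small standalone helper. The `Filter` shape and `afterToFilter` name are illustrative, not QueryMode's internals:

```typescript
type Direction = "asc" | "desc"

interface Filter {
  column: string
  op: "gt" | "lt"
  value: number | string
}

// Keyset pagination: ascending sorts page forward with `gt`,
// descending sorts page forward with `lt`, so min/max stats can
// skip already-consumed pages in either direction.
function afterToFilter(column: string, direction: Direction, value: number | string): Filter {
  return { column, op: direction === "asc" ? "gt" : "lt", value }
}
```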

docs/src/content/docs/performance.mdx

Lines changed: 26 additions & 13 deletions
@@ -8,12 +8,12 @@ description: How to get the most out of QueryMode — pruning, budgets, operator
 Every query flows through these stages. Each is an optimization opportunity:

 ```
-1. Partition pruning → skip entire files (O(1) catalog lookup)
-2. Min/max pruning → skip files by column stats
-3. Page-level skip → skip pages within a file by page stats
-4. WASM SIMD scan → decode + filter in one pass (no Row[] intermediate)
-5. Columnar merge → k-way merge on typed arrays (no Row[] until exit)
-6. Row materialization → only at final response boundary
+1. Partition pruning → skip entire files by partition key (O(1) catalog lookup)
+2. Fragment-level skip → skip files by column min/max across all pages (canSkipFragment)
+3. Page-level skip → skip pages within a file by per-page stats (canSkipPage)
+4. WASM SIMD scan → decode + filter in one pass (no Row[] intermediate)
+5. Columnar merge → k-way merge on typed arrays (no Row[] until exit)
+6. Row materialization → only at final response boundary
 ```

 The most expensive step is always I/O (R2 reads). Everything else optimizes around reducing I/O.
@@ -40,15 +40,14 @@ Operators that accumulate state accept a memory budget (bytes). When exceeded, t

 | Operator | Default budget | What it accumulates |
 |----------|---------------|-------------------|
-| `ExternalSortOperator` | 32 MB | All rows until sorted |
-| `HashJoinOperator` | 32 MB | Build side hash table |
+| `ExternalSortOperator` | 256 MB | All rows until sorted |
+| `HashJoinOperator` | 256 MB | Build side hash table |
 | `AggregateOperator` | unbounded | Group states (usually small) |
 | `DistinctOperator` | unbounded | Seen-values hash set |

 **Sizing guidance:**
-- **32 MB** works for most queries up to ~5M rows of numeric data
-- **128 MB** for string-heavy datasets or large joins
-- Cloudflare DO limit is 128 MB total — leave headroom for page buffers and WASM memory
+- **256 MB** (default) works for most queries — covers ~20M rows of numeric data or ~5M string rows
+- Reduce to **64–128 MB** on Cloudflare DOs to leave headroom for page buffers and WASM memory (DO limit is 128 MB)
 - Local mode (Node/Bun) has no practical limit — set budget to available RAM

 ```typescript
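How a byte budget might gate an accumulating operator can be sketched in miniature. The `BudgetedAccumulator` class and `estimateBytes` callback are illustrative assumptions, not the actual operator code:

```typescript
// Illustrative only: track an estimated byte count and signal when the
// configured budget (default 256 MB per the table above) is exceeded.
const DEFAULT_BUDGET = 256 * 1024 * 1024

class BudgetedAccumulator<T> {
  private items: T[] = []
  private bytes = 0

  constructor(
    private estimateBytes: (item: T) => number,
    private budget: number = DEFAULT_BUDGET,
  ) {}

  // Returns true while under budget; false means the caller should
  // spill (e.g., flush a sorted run) before accumulating more.
  push(item: T): boolean {
    this.items.push(item)
    this.bytes += this.estimateBytes(item)
    return this.bytes <= this.budget
  }

  drain(): T[] {
    const out = this.items
    this.items = []
    this.bytes = 0
    return out
  }
}
```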
@@ -85,6 +84,20 @@ The DataFrame API picks automatically: `.sort().limit(k)` uses TopK when k is sm
 | GROUP BY with few groups (< 10K) | Hash map of accumulators — fast |
 | GROUP BY with many groups (> 100K) | Memory grows with cardinality — consider pre-filtering |

+## Fragment-level skip
+
+Before reading any page data, `canSkipFragment` aggregates min/max/nullCount across all pages in a fragment and checks if the entire fragment can be eliminated. This reuses the same `canSkipPage` logic but on fragment-wide stats — one check to skip potentially thousands of pages.
+
+```
+Fragment columns: [{min: 100, max: 500}, {min: 600, max: 900}] → aggregated: min=100, max=900
+Filter: amount > 1000
+→ Skip entire fragment (no R2 reads at all)
+```
+
+Fragment-level skip is automatic and costs nothing — it runs before any R2 I/O. For datasets with many small fragments (e.g., append-heavy workloads), this is often more effective than page-level skip because it eliminates entire R2 reads rather than individual pages within a read.
+
+In explain output, `fragmentsSkipped` counts fragments eliminated by both partition pruning and fragment-level skip combined.
+
 ## Page-level skip

 Each Lance page stores min/max stats per column. The scan layer checks these before reading page data:
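The fragment-wide aggregation this hunk describes can be sketched as follows. The `PageStats` and `Predicate` shapes and the function bodies are assumptions that mirror the doc's description, not the real implementation:

```typescript
interface PageStats { min: number; max: number; nullCount: number }
interface Predicate { op: "gt" | "lt" | "eq"; value: number }

// Page-level check: can the filter possibly match any row in this range?
function canSkipPage(stats: PageStats, pred: Predicate): boolean {
  if (pred.op === "gt") return stats.max <= pred.value // no row exceeds value
  if (pred.op === "lt") return stats.min >= pred.value // no row is below value
  return pred.value < stats.min || pred.value > stats.max
}

// Fragment-level check: fold all page stats into one range, then reuse
// the page-level logic on the aggregate -- one comparison can rule out
// every page in the fragment before any I/O. Assumes at least one page.
function canSkipFragment(pages: PageStats[], pred: Predicate): boolean {
  const agg = pages.reduce((a, p) => ({
    min: Math.min(a.min, p.min),
    max: Math.max(a.max, p.max),
    nullCount: a.nullCount + p.nullCount,
  }))
  return canSkipPage(agg, pred)
}
```

Running the doc's own example through this sketch: pages with ranges [100, 500] and [600, 900] aggregate to [100, 900], so `amount > 1000` skips the whole fragment.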
@@ -175,7 +188,7 @@ Key fields in `ExplainResult`:
 |-------|------------------|
 | `totalRows` | Total rows in the table |
 | `estimatedRows` | Rows remaining after pruning |
-| `fragments` / `fragmentsSkipped` | How many fragments partition pruning eliminated |
+| `fragments` / `fragmentsSkipped` | Fragments eliminated by partition pruning + fragment-level skip |
 | `pagesTotal` / `pagesSkipped` | How many pages min/max pruning eliminated |
 | `estimatedBytes` / `estimatedR2Reads` | Actual I/O cost (bytes and coalesced R2 reads) |
 | `filters[].pushable` | Whether each filter is pushed to the scan layer |
@@ -190,7 +203,7 @@ Key fields in `ExplainResult`:

 CI runs head-to-head benchmarks against DuckDB on every push. Typical results at 1M-5M rows:

-| Operation | QueryMode (Miniflare) | DuckDB (native) |
+| Operation | QueryMode (workerd) | DuckDB (native) |
 |-----------|----------------------|-----------------|
 | Filter (numeric) | ~200ms | ~100ms |
 | GROUP BY + sum | ~300ms | ~150ms |
