Commit 668a756

docs: fix performance page memory budget (32→256MB), add fragment-level skip, fix lazy eval
Performance page:
- Memory budget default was documented as 32 MB, actual code is 256 MB
- Add dedicated fragment-level skip (canSkipFragment) section
- Fix benchmark table column: Miniflare → workerd
- Update fragmentsSkipped explain description

Lazy evaluation page:
- Add DataFrame.stream() to terminal methods table (was only on LazyResultHandle)
- Rewrite streaming section: stream() works directly on DataFrame, not just lazy handle
- Fix after() docs to mention lt filter for descending sorts
- Add guidance to prefer stream() over cursor()
1 parent 26fcbb4 commit 668a756

File tree:
- docs/src/content/docs/lazy-evaluation.mdx
- docs/src/content/docs/performance.mdx

2 files changed: +45, -21 lines

docs/src/content/docs/lazy-evaluation.mdx

Lines changed: 19 additions & 8 deletions
@@ -28,7 +28,8 @@ const result = await query.collect()
 | `.count()` | Return row count without materializing | Counting without data transfer |
 | `.exists()` | Return true if any row matches | Cheapest existence check |
 | `.lazy()` | Return a `LazyResultHandle` for paging | Large results, on-demand pages |
-| `.cursor()` | Return `AsyncIterable<Row[]>` for streaming | Process rows in batches without loading all |
+| `.stream()` | Yield `Row[]` batches via `AsyncGenerator` | Process rows without loading all into memory |
+| `.cursor()` | Return `AsyncIterable<Row[]>` for streaming | Same as stream, requires executor cursor support |
 | `.explain()` | Return query plan without executing | Debugging, inspect pruning |

 ## Lazy result handle
@@ -61,30 +62,40 @@ Each `.page()` call is a separate query execution with `offset` and `limit`. No

 ## Streaming iteration

-`.stream()` on a lazy handle yields batches until exhausted:
+`.stream()` works directly on the DataFrame — no `.lazy()` needed:

 ```typescript
-const handle = await qm.table("events").lazy()
-
-for await (const batch of handle.stream(500)) {
+for await (const batch of qm.table("events").stream(500)) {
   // batch is Row[] with up to 500 rows
   process(batch)
   // Break early to stop fetching
   if (done) break
 }
 ```

+If the executor supports cursors, `.stream()` fetches batches incrementally. Otherwise it falls back to `.collect()` and yields slices — still useful for processing without holding all rows in your code at once.
+
+`.stream()` is also available on `LazyResultHandle`:
+
+```typescript
+const handle = await qm.table("events").lazy()
+
+for await (const batch of handle.stream(500)) {
+  process(batch)
+}
+```
+
 ## Cursor

-`.cursor()` is similar but works directly on the DataFrame without an intermediate handle:
+`.cursor()` is the low-level streaming primitive. It requires an executor with cursor support (e.g., edge mode) and throws if not available:

 ```typescript
 for await (const batch of qm.table("events").cursor({ batchSize: 1000 })) {
   await processBatch(batch)
 }
 ```

-Requires an executor with cursor support (e.g., edge mode). Throws if not available.
+Prefer `.stream()` unless you need to guarantee incremental fetching.

 ## Keyset pagination

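The cursor-or-collect fallback that the streaming hunk above describes could be sketched roughly as follows. The `Executor` interface, `openCursor`, and `collect` names here are assumptions for illustration, not the library's actual API:

```typescript
// Hypothetical executor shape -- names are assumptions for illustration.
type Row = Record<string, unknown>

interface Executor {
  // Present only when the backend supports incremental cursors.
  openCursor?: (batchSize: number) => AsyncIterable<Row[]>
  collect: () => Promise<Row[]>
}

// Sketch of the documented stream() fallback: delegate to the cursor
// when the executor has one, otherwise collect once and yield slices.
async function* streamRows(exec: Executor, batchSize: number): AsyncGenerator<Row[]> {
  if (exec.openCursor) {
    yield* exec.openCursor(batchSize)
    return
  }
  const all = await exec.collect()
  for (let i = 0; i < all.length; i += batchSize) {
    yield all.slice(i, i + batchSize)
  }
}
```

Breaking out of the `for await` loop closes the generator, which is what lets the documented early `break` stop further fetching.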
@@ -106,4 +117,4 @@ const page2 = await qm.table("events")
   .collect()
 ```

-`.after(value)` translates to a `gt` filter on the sort column, which benefits from page-level skip. Every page is equally fast regardless of depth.
+`.after(value)` translates to a `gt` filter on the sort column (or `lt` for descending sorts), which benefits from page-level skip. Every page is equally fast regardless of depth.
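The `gt`/`lt` translation this hunk adds can be shown with a small standalone helper. The `Filter` shape and `afterToFilter` name are illustrative, not QueryMode's internals:

```typescript
type Direction = "asc" | "desc"

interface Filter {
  column: string
  op: "gt" | "lt"
  value: number | string
}

// Keyset pagination: ascending sorts page forward with `gt`,
// descending sorts page forward with `lt`, so min/max stats can
// skip already-consumed pages in either direction.
function afterToFilter(column: string, direction: Direction, value: number | string): Filter {
  return { column, op: direction === "asc" ? "gt" : "lt", value }
}
```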

docs/src/content/docs/performance.mdx

Lines changed: 26 additions & 13 deletions
@@ -8,12 +8,12 @@ description: How to get the most out of QueryMode — pruning, budgets, operator
 Every query flows through these stages. Each is an optimization opportunity:

 ```
-1. Partition pruning → skip entire files (O(1) catalog lookup)
-2. Min/max pruning → skip files by column stats
-3. Page-level skip → skip pages within a file by page stats
-4. WASM SIMD scan → decode + filter in one pass (no Row[] intermediate)
-5. Columnar merge → k-way merge on typed arrays (no Row[] until exit)
-6. Row materialization → only at final response boundary
+1. Partition pruning → skip entire files by partition key (O(1) catalog lookup)
+2. Fragment-level skip → skip files by column min/max across all pages (canSkipFragment)
+3. Page-level skip → skip pages within a file by per-page stats (canSkipPage)
+4. WASM SIMD scan → decode + filter in one pass (no Row[] intermediate)
+5. Columnar merge → k-way merge on typed arrays (no Row[] until exit)
+6. Row materialization → only at final response boundary
 ```

 The most expensive step is always I/O (R2 reads). Everything else optimizes around reducing I/O.
@@ -40,15 +40,14 @@ Operators that accumulate state accept a memory budget (bytes). When exceeded, t

 | Operator | Default budget | What it accumulates |
 |----------|---------------|-------------------|
-| `ExternalSortOperator` | 32 MB | All rows until sorted |
-| `HashJoinOperator` | 32 MB | Build side hash table |
+| `ExternalSortOperator` | 256 MB | All rows until sorted |
+| `HashJoinOperator` | 256 MB | Build side hash table |
 | `AggregateOperator` | unbounded | Group states (usually small) |
 | `DistinctOperator` | unbounded | Seen-values hash set |

 **Sizing guidance:**
-- **32 MB** works for most queries up to ~5M rows of numeric data
-- **128 MB** for string-heavy datasets or large joins
-- Cloudflare DO limit is 128 MB total — leave headroom for page buffers and WASM memory
+- **256 MB** (default) works for most queries — covers ~20M rows of numeric data or ~5M string rows
+- Reduce to **64–128 MB** on Cloudflare DOs to leave headroom for page buffers and WASM memory (DO limit is 128 MB)
 - Local mode (Node/Bun) has no practical limit — set budget to available RAM

 ```typescript
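How a byte budget might gate an accumulating operator can be sketched in miniature. The `BudgetedAccumulator` class and `estimateBytes` callback are illustrative assumptions, not the actual operator code:

```typescript
// Illustrative only: track an estimated byte count and signal when the
// configured budget (default 256 MB per the table above) is exceeded.
const DEFAULT_BUDGET = 256 * 1024 * 1024

class BudgetedAccumulator<T> {
  private items: T[] = []
  private bytes = 0

  constructor(
    private estimateBytes: (item: T) => number,
    private budget: number = DEFAULT_BUDGET,
  ) {}

  // Returns true while under budget; false means the caller should
  // spill (e.g., flush a sorted run) before accumulating more.
  push(item: T): boolean {
    this.items.push(item)
    this.bytes += this.estimateBytes(item)
    return this.bytes <= this.budget
  }

  drain(): T[] {
    const out = this.items
    this.items = []
    this.bytes = 0
    return out
  }
}
```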
@@ -85,6 +84,20 @@ The DataFrame API picks automatically: `.sort().limit(k)` uses TopK when k is sm
 | GROUP BY with few groups (< 10K) | Hash map of accumulators — fast |
 | GROUP BY with many groups (> 100K) | Memory grows with cardinality — consider pre-filtering |

+## Fragment-level skip
+
+Before reading any page data, `canSkipFragment` aggregates min/max/nullCount across all pages in a fragment and checks if the entire fragment can be eliminated. This reuses the same `canSkipPage` logic but on fragment-wide stats — one check to skip potentially thousands of pages.
+
+```
+Fragment columns: [{min: 100, max: 500}, {min: 600, max: 900}] → aggregated: min=100, max=900
+Filter: amount > 1000
+→ Skip entire fragment (no R2 reads at all)
+```
+
+Fragment-level skip is automatic and costs nothing — it runs before any R2 I/O. For datasets with many small fragments (e.g., append-heavy workloads), this is often more effective than page-level skip because it eliminates entire R2 reads rather than individual pages within a read.
+
+In explain output, `fragmentsSkipped` counts fragments eliminated by both partition pruning and fragment-level skip combined.
+
 ## Page-level skip

 Each Lance page stores min/max stats per column. The scan layer checks these before reading page data:
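The fragment-wide aggregation this hunk describes can be sketched as follows. The `PageStats` and `Predicate` shapes and the function bodies are assumptions that mirror the doc's description, not the real implementation:

```typescript
interface PageStats { min: number; max: number; nullCount: number }
interface Predicate { op: "gt" | "lt" | "eq"; value: number }

// Page-level check: can the filter possibly match any row in this range?
function canSkipPage(stats: PageStats, pred: Predicate): boolean {
  if (pred.op === "gt") return stats.max <= pred.value // no row exceeds value
  if (pred.op === "lt") return stats.min >= pred.value // no row is below value
  return pred.value < stats.min || pred.value > stats.max
}

// Fragment-level check: fold all page stats into one range, then reuse
// the page-level logic on the aggregate -- one comparison can rule out
// every page in the fragment before any I/O. Assumes at least one page.
function canSkipFragment(pages: PageStats[], pred: Predicate): boolean {
  const agg = pages.reduce((a, p) => ({
    min: Math.min(a.min, p.min),
    max: Math.max(a.max, p.max),
    nullCount: a.nullCount + p.nullCount,
  }))
  return canSkipPage(agg, pred)
}
```

Running the doc's own example through this sketch: pages with ranges [100, 500] and [600, 900] aggregate to [100, 900], so `amount > 1000` skips the whole fragment.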
@@ -175,7 +188,7 @@ Key fields in `ExplainResult`:
 |-------|------------------|
 | `totalRows` | Total rows in the table |
 | `estimatedRows` | Rows remaining after pruning |
-| `fragments` / `fragmentsSkipped` | How many fragments partition pruning eliminated |
+| `fragments` / `fragmentsSkipped` | Fragments eliminated by partition pruning + fragment-level skip |
 | `pagesTotal` / `pagesSkipped` | How many pages min/max pruning eliminated |
 | `estimatedBytes` / `estimatedR2Reads` | Actual I/O cost (bytes and coalesced R2 reads) |
 | `filters[].pushable` | Whether each filter is pushed to the scan layer |
@@ -190,7 +203,7 @@ Key fields in `ExplainResult`:

 CI runs head-to-head benchmarks against DuckDB on every push. Typical results at 1M-5M rows:

-| Operation | QueryMode (Miniflare) | DuckDB (native) |
+| Operation | QueryMode (workerd) | DuckDB (native) |
 |-----------|----------------------|-----------------|
 | Filter (numeric) | ~200ms | ~100ms |
 | GROUP BY + sum | ~300ms | ~150ms |
