Commit be9f424

docs: add lazy evaluation page — execution model, paging, streaming, keyset pagination
Documents when code runs (terminal methods), LazyResultHandle API, streaming iteration with .stream() and .cursor(), and keyset pagination with .after() for deep pagination without offset cost.
1 parent a9000e4 commit be9f424

2 files changed

+110
-0
lines changed

docs/astro.config.mjs

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ export default defineConfig({
   { label: "Formats", slug: "formats" },
   { label: "Architecture", slug: "architecture" },
   { label: "Columnar Format", slug: "columnar-format" },
+  { label: "Lazy Evaluation", slug: "lazy-evaluation" },
   { label: "Performance", slug: "performance" },
   { label: "Write Path", slug: "write-path" },
   { label: "Deployment", slug: "deployment" },
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
---
title: Lazy Evaluation
description: "Execution model: when code runs, eager vs lazy, streaming iteration."
---

## Execution model

DataFrame methods such as `.filter()`, `.sort()`, and `.limit()` do not execute anything. They build a `QueryDescriptor`, a plain object describing what to do. Execution happens only when you call a **terminal method**.

```typescript
// Nothing executes here; this only builds a descriptor
const query = qm.table("events")
  .filter("status", "eq", "active")
  .sort("created_at", "desc")
  .limit(100)

// Execution happens here
const result = await query.collect()
```

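To make the split between building and executing concrete, here is a minimal in-memory sketch of the same idea. It is not the library's implementation: `SketchQuery`, `SketchDescriptor`, and `runDescriptor` are hypothetical names, and the only operators modelled are `eq`, `gt`, and `lt`.

```typescript
// Illustrative sketch only: every name here is hypothetical, not the library's API.
type Value = string | number;
type Row = Record<string, Value>;
type Op = "eq" | "gt" | "lt";

interface SketchDescriptor {
  table: string;
  filters: { column: string; op: Op; value: Value }[];
  sort?: { column: string; direction: "asc" | "desc" };
  limit?: number;
}

// Pure function that interprets a descriptor against in-memory rows.
function runDescriptor(source: Row[], d: SketchDescriptor): Row[] {
  let rows = source.filter((row) =>
    d.filters.every((f) => {
      const cell = row[f.column];
      if (f.op === "eq") return cell === f.value;
      return f.op === "gt" ? cell > f.value : cell < f.value;
    }),
  );
  const sort = d.sort;
  if (sort) {
    const sign = sort.direction === "asc" ? 1 : -1;
    rows = [...rows].sort((a, b) => {
      const x = a[sort.column];
      const y = b[sort.column];
      return x < y ? -sign : x > y ? sign : 0;
    });
  }
  return d.limit === undefined ? rows : rows.slice(0, d.limit);
}

class SketchQuery {
  readonly source: Row[];
  readonly descriptor: SketchDescriptor;

  constructor(source: Row[], descriptor: SketchDescriptor) {
    this.source = source;
    this.descriptor = descriptor;
  }

  // Builder methods return a new query with an extended descriptor; no rows are read.
  filter(column: string, op: Op, value: Value): SketchQuery {
    const filters = [...this.descriptor.filters, { column, op, value }];
    return new SketchQuery(this.source, { ...this.descriptor, filters });
  }

  sort(column: string, direction: "asc" | "desc"): SketchQuery {
    return new SketchQuery(this.source, { ...this.descriptor, sort: { column, direction } });
  }

  limit(n: number): SketchQuery {
    return new SketchQuery(this.source, { ...this.descriptor, limit: n });
  }

  // Terminal method: the only place where the descriptor is executed.
  async collect(): Promise<Row[]> {
    return runDescriptor(this.source, this.descriptor);
  }
}
```

Because each builder call returns a new object, a partially built query can be shared and extended in different directions without one caller affecting another.
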
## Terminal methods

| Method | What it does | When to use |
|--------|--------------|-------------|
| `.collect()` | Execute and return all matching rows | Default; most queries |
| `.exec()` | Alias for `.collect()` | Same as `.collect()` |
| `.first()` | Return the first matching row, or `null` | Existence check or single lookup |
| `.count()` | Return the row count without materializing rows | Counting without data transfer |
| `.exists()` | Return `true` if any row matches | Cheapest existence check |
| `.lazy()` | Return a `LazyResultHandle` for paging | Large results, on-demand pages |
| `.cursor()` | Return an `AsyncIterable<Row[]>` for streaming | Process rows in batches without loading the full result |
| `.explain()` | Return the query plan without executing | Debugging, inspecting pruning |

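The difference between these terminals is mostly in how much they scan and return. The toy model below works over a plain array and says nothing about how the real executor works; the function names are hypothetical.

```typescript
// Toy in-memory model; the function names are hypothetical.
type Row = { id: number; status: string };
type Predicate = (row: Row) => boolean;

// exists: stops at the first match and returns only a boolean.
function existsSketch(rows: Row[], matches: Predicate): boolean {
  return rows.some(matches);
}

// first: stops at the first match and returns that row, or null.
function firstSketch(rows: Row[], matches: Predicate): Row | null {
  return rows.find(matches) ?? null;
}

// count: visits every row but keeps only a running total.
function countSketch(rows: Row[], matches: Predicate): number {
  let total = 0;
  for (const row of rows) {
    if (matches(row)) total += 1;
  }
  return total;
}

// collect: materializes every matching row.
function collectSketch(rows: Row[], matches: Predicate): Row[] {
  return rows.filter(matches);
}
```
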
## Lazy result handle

`.lazy()` returns a `LazyResultHandle` that executes pages on demand:

```typescript
const handle = await qm.table("events")
  .filter("status", "eq", "active")
  .sort("created_at", "desc")
  .lazy()

// Fetch page 0 (rows 0-99); the arguments are (offset, limit)
const page0 = await handle.page(0, 100)

// Fetch page 3 (rows 300-399); note the offset is 300, not the page index 3
const page3 = await handle.page(300, 100)

// Fetch a single row
const row42 = await handle.row(42)

// Full materialization if needed
const all = await handle.collect()
```

Each `.page()` call is a separate query execution with `offset` and `limit`. No state is held between pages: the handle re-executes the query each time. This means:
- Pages can be fetched in any order
- No memory accumulates between pages
- Pages of a sorted query line up with one another only while the underlying data does not change

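The stateless behaviour described above can be sketched as a thin wrapper around a function that runs the query with a given offset and limit. `SketchLazyHandle`, `RunQuery`, and `offsetForPage` are hypothetical names, and returning `null` from `row()` for a missing row is an assumption, not documented behaviour.

```typescript
type Row = { id: number };

// Hypothetical executor signature: runs the whole query with an offset and limit.
type RunQuery = (offset: number, limit: number) => Promise<Row[]>;

class SketchLazyHandle {
  private readonly run: RunQuery;

  constructor(run: RunQuery) {
    this.run = run;
  }

  // Each call is an independent execution; nothing is cached on the handle.
  page(offset: number, limit: number): Promise<Row[]> {
    return this.run(offset, limit);
  }

  // A single row is a page of size one (null when out of range is an assumption).
  async row(index: number): Promise<Row | null> {
    const rows = await this.run(index, 1);
    return rows[0] ?? null;
  }
}

// Converts a zero-based page index into the offset that page() expects.
function offsetForPage(pageIndex: number, pageSize: number): number {
  return pageIndex * pageSize;
}
```

With a helper like `offsetForPage`, page 3 at 100 rows per page becomes `handle.page(offsetForPage(3, 100), 100)`, which is the same call as `handle.page(300, 100)`.
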
## Streaming iteration

`.stream()` on a lazy handle yields batches until the result is exhausted:

```typescript
const handle = await qm.table("events").lazy()

for await (const batch of handle.stream(500)) {
  // batch is a Row[] with up to 500 rows
  process(batch)
  // Break early to stop fetching
  if (done) break
}
```

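One way such a stream could be built is as an async generator that requests the next offset-based page only when the consumer asks for it. This is a sketch under that assumption; `streamSketch` and `FetchPage` are hypothetical names, and the real `.stream()` may work differently.

```typescript
type Row = { id: number };

// Hypothetical page fetcher with the same (offset, limit) shape as handle.page().
type FetchPage = (offset: number, limit: number) => Promise<Row[]>;

async function* streamSketch(fetchPage: FetchPage, batchSize: number): AsyncGenerator<Row[]> {
  if (batchSize <= 0) throw new RangeError("batchSize must be positive");
  let offset = 0;
  while (true) {
    // The next page is requested only when the consumer pulls the next batch.
    const batch = await fetchPage(offset, batchSize);
    if (batch.length === 0) return;
    yield batch;
    // A short batch means the source is exhausted, so no further fetch is needed.
    if (batch.length < batchSize) return;
    offset += batch.length;
  }
}
```

Because the generator is suspended at `yield`, a `break` in the consumer's `for await` loop ends it before the next page is requested.
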
## Cursor

`.cursor()` is similar, but it works directly on the DataFrame without an intermediate handle:

```typescript
for await (const batch of qm.table("events").cursor({ batchSize: 1000 })) {
  await processBatch(batch)
}
```

`.cursor()` requires an executor with cursor support (e.g., edge mode) and throws if none is available.

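Code that has to run on executors with and without cursor support could fall back to the lazy handle's stream. The sketch below assumes only the call shapes shown on this page; `BatchSource` and `batches` are hypothetical names, and whether the error surfaces when `.cursor()` is called or on the first iteration is not specified, so the sketch covers both cases. It also treats any error before the first batch as "no cursor support", which a real implementation would want to narrow.

```typescript
type Row = { id: number };

// Hypothetical shape of a query; only the two methods used here are modelled.
interface BatchSource {
  cursor(options: { batchSize: number }): AsyncIterable<Row[]>;
  lazy(): Promise<{ stream(batchSize: number): AsyncIterable<Row[]> }>;
}

async function* batches(source: BatchSource, batchSize: number): AsyncGenerator<Row[]> {
  let yielded = false;
  try {
    for await (const batch of source.cursor({ batchSize })) {
      yielded = true;
      yield batch;
    }
    return;
  } catch (error) {
    // Fall back only if the cursor failed before producing anything;
    // restarting after a partial read would deliver duplicate rows.
    if (yielded) throw error;
  }
  const handle = await source.lazy();
  yield* handle.stream(batchSize);
}
```
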
## Keyset pagination

For large sorted datasets, offset-based pagination gets slower as the offset grows, because the engine must skip N rows before returning the page. Keyset pagination instead uses the last value seen to start the next page:

```typescript
// First page
const page1 = await qm.table("events")
  .sort("id", "asc")
  .limit(50)
  .collect()

// Next page: starts after the last id (assumes page1 has at least one row)
const lastId = page1.rows[page1.rows.length - 1].id
const page2 = await qm.table("events")
  .sort("id", "asc")
  .after(lastId)
  .limit(50)
  .collect()
```

`.after(value)` translates to a `gt` filter on the sort column, which benefits from page-level skip. The cost of fetching a page therefore does not grow with its depth.
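
A full keyset loop keeps feeding the last seen key back in until a page comes back empty. The sketch below models this over an in-memory array; `fetchAfter` and `paginateAll` are hypothetical helpers standing in for the query calls above, with `fetchAfter` playing the role of `.sort("id", "asc").after(lastId).limit(n)`.

```typescript
type Row = { id: number };

// Hypothetical page fetcher: up to `limit` rows with id > after, ordered by id ascending.
function fetchAfter(rows: Row[], after: number | undefined, limit: number): Row[] {
  return rows
    .filter((r) => after === undefined || r.id > after)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
}

// Walks the whole dataset one page at a time, using the last id of each page as the next key.
function paginateAll(rows: Row[], pageSize: number): Row[][] {
  const pages: Row[][] = [];
  let after: number | undefined;
  while (true) {
    const page = fetchAfter(rows, after, pageSize);
    if (page.length === 0) break; // nothing left after the last key
    pages.push(page);
    after = page[page.length - 1].id;
  }
  return pages;
}
```

Since the next page is selected with a strict `gt` comparison, the sort column needs to be unique. With duplicate values, rows that share the boundary value with the last row of a page would be skipped.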
