Commit e4e1af7

docs: reframe operators as query primitives, not SQL mappings
Remove SQL-centric framing from operator table — operations are code-first primitives, not SQL translations. Add HAVING as composition example, mark vector NEAR and sample as planned, add SQL→AST→operators as future frontend layer.
1 parent 21adcba commit e4e1af7

1 file changed: +49 -18 lines changed

README.md

Lines changed: 49 additions & 18 deletions
````diff
@@ -50,24 +50,49 @@ Your app code IS the query execution. The WASM engine is a library function your
 
 ## Query engine as code
 
-Every SQL clause is a composable code primitive. They all implement the same pull-based `Operator` interface — `next() → RowBatch | null` — so you chain them however you want, not how a SQL planner decides.
+Every query operation is a composable code primitive. They all implement the same pull-based `Operator` interface — `next() → RowBatch | null` — so you chain them however you want.
 
 ```
-SQL clause         Operator class            What it does
+Operation          Operator class            What it does
 ─────────          ──────────────            ────────────
-WHERE              FilterOperator            Predicate pushdown on rows
-SELECT             ProjectOperator           Column projection
-ORDER BY           ExternalSortOperator      Disk-spilling merge sort
-                   InMemorySortOperator      In-memory sort (small datasets)
-GROUP BY + agg     AggregateOperator         Hash aggregate (sum/avg/min/max/count/stddev/median/percentile)
-LIMIT / OFFSET     LimitOperator             Row limiting with offset
-                   TopKOperator              Heap-based top-K (no full sort)
-JOIN               HashJoinOperator          Grace hash join with R2 spill
-PARTITION BY       WindowOperator            row_number, rank, dense_rank, lag, lead, rolling aggregates
-DISTINCT           DistinctOperator          Hash-based deduplication
-UNION/INTERSECT    SetOperator               Set operations (union, union_all, intersect, except)
-computed column    ComputedColumnOperator    Arbitrary (row: Row) => value transforms
-IN (subquery)      SubqueryInOperator        Semi-join filter against a value set
+
+Filtering
+  predicate        FilterOperator            eq, neq, gt, gte, lt, lte, in
+  membership       SubqueryInOperator        Semi-join filter against a value set
+
+Projection
+  select           ProjectOperator           Column selection
+  transform        ComputedColumnOperator    Arbitrary (row: Row) => value per row
+
+Aggregation
+  group + reduce   AggregateOperator         sum, avg, min, max, count, count_distinct,
+                                             stddev, variance, median, percentile
+  having           FilterOperator            Filter after AggregateOperator — same primitive, you control order
+
+Sorting
+  full sort        ExternalSortOperator      Disk-spilling merge sort with R2 spill
+  in-memory sort   InMemorySortOperator      In-memory sort (small datasets)
+  top-K            TopKOperator              Heap-based top-K without full sort
+
+Joining
+  hash join        HashJoinOperator          inner, left, right, full, cross — Grace hash join with R2 spill
+
+Windowing
+  partition        WindowOperator            row_number, rank, dense_rank, lag, lead,
+                                             rolling sum/avg/min/max/count
+
+Deduplication
+  distinct         DistinctOperator          Hash-based deduplication on column set
+
+Set operations
+  combine          SetOperator               union, union_all, intersect, except
+
+Limiting
+  limit/offset     LimitOperator             Row limiting with offset
+  sample           (planned)                 Random sampling
+
+Similarity
+  vector near      (planned)                 NEAR topK as composable operator — currently in scan layer
 ```
 
 ### Compose operators directly
````
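The README states that every operator implements one pull-based `Operator` interface, `next() → RowBatch | null`. A minimal sketch of what that contract implies: each stage pulls batches from its child until it receives `null`, so any stage can wrap any other. This is not QueryMode's implementation; the `Row`/`RowBatch` shapes (a batch modeled as a plain array of row objects), the `ArraySource` and `PredicateFilter` classes, and the `drain` helper are all hypothetical.

```typescript
// Hypothetical, simplified types: QueryMode's real Row/RowBatch types may differ.
type Value = number | string | null
type Row = Record<string, Value>
type RowBatch = Row[]

// The pull-based contract: next() yields a batch, or null once the input is exhausted.
interface Operator {
  next(): Promise<RowBatch | null>
  close(): Promise<void>
}

// Hypothetical source that hands out an in-memory array in fixed-size batches.
class ArraySource implements Operator {
  private pos = 0
  constructor(private readonly rows: Row[], private readonly batchSize = 2) {}

  async next(): Promise<RowBatch | null> {
    if (this.pos >= this.rows.length) return null
    const size = Math.max(1, this.batchSize) // guard against a zero batch size
    const batch = this.rows.slice(this.pos, this.pos + size)
    this.pos += batch.length
    return batch
  }

  async close(): Promise<void> {}
}

// Hypothetical filter stage: pulls from its child and keeps rows matching the predicate.
class PredicateFilter implements Operator {
  constructor(private readonly child: Operator, private readonly pred: (row: Row) => boolean) {}

  async next(): Promise<RowBatch | null> {
    // Keep pulling until some rows survive or the child is exhausted.
    for (;;) {
      const batch = await this.child.next()
      if (batch === null) return null
      const kept = batch.filter(this.pred)
      if (kept.length > 0) return kept
    }
  }

  async close(): Promise<void> {
    await this.child.close()
  }
}

// Pull every batch from the top of a chain into one array, then close the chain.
async function drain(op: Operator): Promise<Row[]> {
  const out: Row[] = []
  try {
    for (let batch = await op.next(); batch !== null; batch = await op.next()) {
      out.push(...batch)
    }
  } finally {
    await op.close()
  }
  return out
}

const users = new ArraySource([
  { name: "a", age: 22 },
  { name: "b", age: 31 },
  { name: "c", age: 40 },
])
drain(new PredicateFilter(users, (row) => Number(row.age) > 25)).then((rows) => {
  console.log(rows.map((row) => row.name)) // names of users older than 25: b and c
})
```

Because `PredicateFilter` depends only on the interface, it can sit on top of a source, an aggregate, or another filter, which is the property the README relies on when it says the operators can be chained in any order.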
````diff
@@ -85,14 +110,16 @@ const source: Operator = {
   async close() {},
 }
 
-// Chain operators like function calls — no query planner, no SQL string
+// Chain operators — no query planner, no SQL string
 const filtered = new FilterOperator(source, [{ column: "age", op: "gt", value: 25 }])
 const aggregated = new AggregateOperator(filtered, {
   table: "users", filters: [], projections: [],
   groupBy: ["region"],
   aggregates: [{ fn: "sum", column: "amount", alias: "total" }],
 })
-const top10 = new TopKOperator(aggregated, "total", true, 10)
+// "HAVING" is just a filter after aggregation — same operator, you control order
+const having = new FilterOperator(aggregated, [{ column: "total", op: "gt", value: 1000 }])
+const top10 = new TopKOperator(having, "total", true, 10)
 
 // Pull results — zero-copy, no serialization between stages
 const rows = await drainPipeline(top10)
````
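The lines added in this hunk express HAVING as a second `FilterOperator` placed after `AggregateOperator`, with the result fed to a heap-based `TopKOperator`. The same ordering can be sketched on plain arrays. `hashSum`, `filterRows`, and `topK` are hypothetical helpers, not QueryMode's API, and `topK` only covers the largest-first case (the diff does not say what the `true` argument to `TopKOperator` means).

```typescript
// Hypothetical row shape for this sketch.
type Row = Record<string, number | string>

// GROUP BY + SUM using a hash map keyed on the group column (a hash aggregate).
function hashSum(rows: Row[], groupBy: string, column: string, alias: string): Row[] {
  const totals = new Map<string, number>()
  for (const row of rows) {
    const key = String(row[groupBy])
    totals.set(key, (totals.get(key) ?? 0) + Number(row[column]))
  }
  return Array.from(totals, ([key, total]) => ({ [groupBy]: key, [alias]: total }))
}

// "HAVING" is the same row filter, just applied to aggregated rows.
function filterRows(rows: Row[], pred: (row: Row) => boolean): Row[] {
  return rows.filter(pred)
}

// Top-k by `column`, largest first, via a size-k min-heap: O(n log k), no full sort.
function topK(rows: Row[], column: string, k: number): Row[] {
  if (k <= 0) return []
  const heap: Row[] = [] // heap[0] holds the smallest of the current top k
  const val = (i: number) => Number(heap[i][column])
  const swap = (i: number, j: number) => {
    const tmp = heap[i]
    heap[i] = heap[j]
    heap[j] = tmp
  }
  const siftUp = (i: number) => {
    while (i > 0) {
      const parent = (i - 1) >> 1
      if (val(parent) <= val(i)) break
      swap(i, parent)
      i = parent
    }
  }
  const siftDown = (i: number) => {
    for (;;) {
      const left = 2 * i + 1
      const right = left + 1
      let smallest = i
      if (left < heap.length && val(left) < val(smallest)) smallest = left
      if (right < heap.length && val(right) < val(smallest)) smallest = right
      if (smallest === i) return
      swap(i, smallest)
      i = smallest
    }
  }
  for (const row of rows) {
    if (heap.length < k) {
      heap.push(row)
      siftUp(heap.length - 1)
    } else if (Number(row[column]) > val(0)) {
      heap[0] = row // evict the current minimum
      siftDown(0)
    }
  }
  // Only the k survivors are sorted, largest first.
  return heap.sort((a, b) => Number(b[column]) - Number(a[column]))
}

// Same order as the README: aggregate, then HAVING-style filter, then top-K.
const sales: Row[] = [
  { region: "eu", amount: 700 },
  { region: "us", amount: 900 },
  { region: "eu", amount: 600 },
  { region: "apac", amount: 300 },
  { region: "us", amount: 500 },
  { region: "latam", amount: 1200 },
]
const aggregated = hashSum(sales, "region", "amount", "total")
const having = filterRows(aggregated, (row) => Number(row.total) > 1000)
const top2 = topK(having, "total", 2)
// top2 → [{ region: "us", total: 1400 }, { region: "eu", total: 1300 }]
```

Keeping a min-heap of size k bounds the work per input row to O(log k) and sorts only the k survivors, which is why a top-K stage can skip a full sort of its input.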
````diff
@@ -130,9 +157,13 @@ const rows = await drainPipeline(sorted)
 await spill.cleanup()
 ```
 
+### Planned: SQL → AST → operators
+
+For users who prefer SQL syntax, a future layer can parse SQL into an AST and compile it to operator composition — same zero-copy pipeline underneath, SQL is just another frontend.
+
 ### Why this matters
 
-Traditional engines give you SQL or a DataFrame API. You can't put a window function before a join, run custom logic between pipeline stages, or swap the sort implementation. The planner decides.
+Traditional engines give you a fixed query language. You can't put a window function before a join, run custom logic between pipeline stages, or swap the sort implementation. The planner decides.
 
 With QueryMode, operators are building blocks. Your code assembles the pipeline, controls the memory budget, decides when to spill. The query engine isn't a service you call — it's a library your code composes.
 
````
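The planned SQL frontend is described only at a high level: parse SQL into an AST, then compile the AST into an operator chain. A minimal sketch of the compile half, assuming a hypothetical `SelectAst` shape for a tiny subset (one WHERE comparison, a column list, LIMIT). The parser is omitted, and plain array-to-array `Stage` functions stand in for QueryMode's pull-based operators to keep the example short.

```typescript
// Hypothetical row and stage types: plain array transforms stand in for pull-based operators.
type Row = Record<string, number | string>
type Stage = (rows: Row[]) => Row[]

// Hypothetical AST for a tiny SQL subset: SELECT cols [WHERE col op value] [LIMIT n].
interface SelectAst {
  columns: string[]
  where?: { column: string; op: "eq" | "gt"; value: number | string }
  limit?: number
}

// Compile the AST into an ordered stage list: filter, then project, then limit.
function compile(ast: SelectAst): Stage[] {
  const stages: Stage[] = []
  const where = ast.where
  if (where) {
    stages.push((rows) =>
      rows.filter((row) =>
        where.op === "eq"
          ? row[where.column] === where.value
          : Number(row[where.column]) > Number(where.value),
      ),
    )
  }
  stages.push((rows) =>
    rows.map((row) => {
      const projected: Row = {}
      for (const column of ast.columns) projected[column] = row[column]
      return projected
    }),
  )
  const limit = ast.limit
  if (limit !== undefined) stages.push((rows) => rows.slice(0, limit))
  return stages
}

// Run the compiled stages in order, feeding each stage the previous stage's output.
function run(stages: Stage[], input: Row[]): Row[] {
  return stages.reduce((rows, stage) => stage(rows), input)
}

// Roughly what a parser might produce for: SELECT name FROM users WHERE age > 25 LIMIT 1
const ast: SelectAst = { columns: ["name"], where: { column: "age", op: "gt", value: 25 }, limit: 1 }
const users: Row[] = [
  { name: "a", age: 22 },
  { name: "b", age: 31 },
  { name: "c", age: 40 },
]
const result = run(compile(ast), users) // [{ name: "b" }]
```

The compiler must emit the filter before the projection here, because the predicate column (`age`) is not among the selected columns. That is the same ordering decision a hand-written operator chain makes explicitly, so a SQL frontend of this kind would sit on top of the primitives rather than replace them.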