
Commit 2c4ea03

docs: add Why QueryMode page — agent thesis and dynamic pipeline vision
Agents are uncoordinated, need live data at the edge, and can't rely on pre-built ETL. QueryMode replaces fixed pipelines with dynamic ones where the agent defines both query and business logic in the same code. Links from README and docs sidebar.
1 parent 6f2e553 commit 2c4ea03


3 files changed (+74, -0 lines)

README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -38,6 +38,8 @@ const result = await qm
 
 A pluggable columnar query library — not a query engine you push data to, but a query capability your code uses directly. No data materialization, no engine boundary, no SQL transpilation.
 
+**[Why QueryMode?](https://teamchong.github.io/querymode/why-querymode/)** — Agents need dynamic pipelines, not pre-built ETL. QueryMode lets the agent define both query and business logic in the same code, at query time, with no serialization boundary between stages.
+
 ## Why "mode" not "engine"
 
 Every query engine — Spark, DataFusion, DuckDB, Polars — has a boundary between your code and the engine:
```

docs/astro.config.mjs

Lines changed: 1 addition & 0 deletions
```diff
@@ -16,6 +16,7 @@ export default defineConfig({
 ],
 sidebar: [
 { label: "Overview", slug: "index" },
+{ label: "Why QueryMode", slug: "why-querymode" },
 { label: "Getting Started", slug: "getting-started" },
 { label: "DataFrame API", slug: "dataframe-api" },
 { label: "Operators", slug: "operators" },
```

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
---
title: Why QueryMode
description: Agents are the new users. They need dynamic pipelines, not pre-built ETL.
---

## The world is changing

Three things are happening at once:

1. **Agents are becoming the majority of internet traffic.** Unlike humans, agents share the same training data, so they independently reach the same conclusions. They also can't coordinate with each other: they serve different owners and run in different parts of the world. When thousands of agents independently decide to query the same data in the same millisecond, the server has to be at the edge to survive the load.

2. **Agents need live data.** Training data can't keep up with the speed at which the world produces information, so agents will make API calls, lots of them. Because agents aren't a hive mind, thousands of them request the same data independently. That data needs to live at the edge, close to where the agents run.

3. **Pre-built ETL can't serve agents.** Traditional data pipelines assume a human decides in advance which questions matter, builds a pipeline that runs on a schedule, and stores the results. Agents don't ask pre-defined questions. They chain queries in ways no pipeline designer anticipated: funnel analysis, then retention for just the users who completed the funnel, then attribution for just those retained users. The pipeline doesn't exist until the agent creates it.

## Fixed ETL vs dynamic pipelines

Most data is not well-structured enough to query directly. It needs transformation. The question is: **who defines the transformation, and when?**

| | Fixed ETL | QueryMode |
|---|---|---|
| **Who** | A human, in advance | The agent, at query time |
| **When** | On a schedule | On demand |
| **What** | Pre-defined transformations | Whatever the agent needs right now |
| **Boundary** | Query → serialize → DB → serialize → result → parse → next query | Query and business logic run in the same code, same process |

QueryMode replaces fixed ETL pipelines with dynamic ones. The agent writes both the query and the business logic in the same code, with no serialization overhead between stages.
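
A dynamic pipeline doesn't need special machinery: each stage can be an ordinary function over rows already in memory, assembled at the moment the question is asked. The sketch below is illustrative only. It assumes a hypothetical `EventRow` type and stand-in stages (`since`, `onlyEvent`, `firstPerUser`); in real use the input rows would come from a QueryMode `collect()` call, whose exact row shape is not specified here.

```typescript
// Hypothetical row shape; these field names are illustrative, not a QueryMode schema.
interface EventRow {
  userId: string
  event: string
  createdAt: number // epoch milliseconds
}

// A stage is a plain function from rows to rows, so nothing is serialized between stages.
type Stage = (rows: EventRow[]) => EventRow[]

// Assemble whatever stages were chosen at run time into a single pipeline.
function pipeline(...stages: Stage[]): Stage {
  return (rows) => stages.reduce((acc, stage) => stage(acc), rows)
}

// A few hypothetical stages an agent might pick for one particular question.
const since = (t: number): Stage => (rows) => rows.filter((r) => r.createdAt >= t)
const onlyEvent = (name: string): Stage => (rows) => rows.filter((r) => r.event === name)
const firstPerUser: Stage = (rows) => {
  const seen = new Set<string>()
  return rows.filter((r) => {
    if (seen.has(r.userId)) return false
    seen.add(r.userId)
    return true
  })
}

// In practice these rows would come from collect(); here they are inline sample data.
const rows: EventRow[] = [
  { userId: "a", event: "signup", createdAt: 500 },
  { userId: "a", event: "signup", createdAt: 1500 },
  { userId: "b", event: "signup", createdAt: 2000 },
  { userId: "b", event: "pageview", createdAt: 2100 },
  { userId: "c", event: "signup", createdAt: 900 },
]

// The "pipeline" only exists once this line runs.
const signups = pipeline(since(1000), onlyEvent("signup"), firstPerUser)(rows)
console.log(signups.map((r) => r.userId).join(",")) // prints "a,b"
```

Adding, removing, or reordering stages is a change to the argument list, not a new scheduled job.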
28+
29+
## The serialization boundary problem
30+
31+
Every traditional query engine has a boundary between your code and the engine:
32+
33+
```
34+
Your code → build SQL string → send to database → wait → JSON response → parse → your code
35+
```

If you need to ask a follow-up question based on the answer, you go through the whole round trip again. Umami's attribution report does this **8 times** for a single dashboard page, and each query rebuilds the same base data.

QueryMode has no such boundary:

```typescript
// 1 collect(), then branch freely in code.
// `qm` is assumed to be your QueryMode handle and `startDate` your range start.
// findFunnelCompletions, computeRetention, and computeAttribution are your own
// business-logic functions, not part of QueryMode.
const result = await qm
  .filter("created_at", "gte", startDate)
  .collect()

// Funnel analysis
const funnelSessions = findFunnelCompletions(result.rows)

// Retention for JUST funnel completers, no second query
const retention = computeRetention(result.rows, funnelSessions)

// Attribution for JUST retained users, still no second query
const attribution = computeAttribution(result.rows, retention.retainedUsers)
```

Three analyses run on one result set: no SQL string construction, no JSON parsing, no round trips. The intermediate results are live objects in memory, so you can inspect them, branch on them, and feed them into the next stage.
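
The three helpers in the example above are ordinary application code. The sketch below shows one plausible shape for them, assuming a hypothetical `Row` type (`sessionId`, `userId`, `event`, `referrer`, `createdAt`), an illustrative three-step funnel, a 7-day retention window, and first-touch attribution. None of these names or definitions come from QueryMode, Umami, or Counterscale; inline sample rows stand in for `result.rows`.

```typescript
// Hypothetical row shape; field names are illustrative, not Umami's or QueryMode's schema.
interface Row {
  sessionId: string
  userId: string
  event: string
  referrer: string
  createdAt: number // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000
const FUNNEL_STEPS = ["view", "signup", "purchase"] // illustrative funnel definition

// Sessions whose events hit every funnel step in order (other events may occur in between).
function findFunnelCompletions(rows: Row[], steps: string[] = FUNNEL_STEPS): Set<string> {
  const bySession = new Map<string, Row[]>()
  for (const r of rows) {
    const list = bySession.get(r.sessionId) ?? []
    list.push(r)
    bySession.set(r.sessionId, list)
  }
  const completed = new Set<string>()
  bySession.forEach((events, sessionId) => {
    events.sort((a, b) => a.createdAt - b.createdAt)
    let next = 0
    for (const e of events) {
      if (next < steps.length && e.event === steps[next]) next++
    }
    if (next === steps.length) completed.add(sessionId)
  })
  return completed
}

// Users from completing sessions who come back at least `afterDays` after their
// first event in such a session.
function computeRetention(rows: Row[], sessions: Set<string>, afterDays = 7) {
  const firstSeen = new Map<string, number>()
  for (const r of rows) {
    if (!sessions.has(r.sessionId)) continue
    const prev = firstSeen.get(r.userId)
    if (prev === undefined || r.createdAt < prev) firstSeen.set(r.userId, r.createdAt)
  }
  const retainedUsers = new Set<string>()
  for (const r of rows) {
    const start = firstSeen.get(r.userId)
    if (start !== undefined && r.createdAt - start >= afterDays * DAY_MS) retainedUsers.add(r.userId)
  }
  const rate = firstSeen.size === 0 ? 0 : retainedUsers.size / firstSeen.size
  return { retainedUsers, rate }
}

// First-touch attribution: credit each user's earliest referrer.
function computeAttribution(rows: Row[], users: Set<string>): Map<string, number> {
  const firstTouch = new Map<string, Row>()
  for (const r of rows) {
    if (!users.has(r.userId)) continue
    const prev = firstTouch.get(r.userId)
    if (prev === undefined || r.createdAt < prev.createdAt) firstTouch.set(r.userId, r)
  }
  const counts = new Map<string, number>()
  firstTouch.forEach((r) => counts.set(r.referrer, (counts.get(r.referrer) ?? 0) + 1))
  return counts
}

// Inline sample data standing in for result.rows.
const ev = (sessionId: string, userId: string, event: string, referrer: string, createdAt: number): Row =>
  ({ sessionId, userId, event, referrer, createdAt })
const rows: Row[] = [
  ev("s1", "u1", "view", "news", 0),
  ev("s1", "u1", "signup", "news", 1000),
  ev("s1", "u1", "purchase", "news", 2000),
  ev("s2", "u2", "view", "search", 0),
  ev("s2", "u2", "purchase", "search", 500), // skipped signup: not a completion
  ev("s3", "u3", "view", "social", 0),
  ev("s3", "u3", "signup", "social", 100),
  ev("s3", "u3", "purchase", "social", 200),
  ev("s4", "u1", "view", "direct", 8 * DAY_MS), // u1 returns 8 days later
]

// Same chain as the original example: one row set, three dependent analyses.
const funnelSessions = findFunnelCompletions(rows)
const retention = computeRetention(rows, funnelSessions)
const attribution = computeAttribution(rows, retention.retainedUsers)
```

With this sample, sessions `s1` and `s3` complete the funnel, only `u1` is retained (a 50% rate), and `news` receives the single first-touch credit.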

## Proven against real-world analytics

We validated this against two popular open-source analytics platforms:

**[Counterscale](https://github.com/benvinegar/counterscale)** (Cloudflare Analytics Engine): all 7 query patterns are ported. Analytics Engine sends SQL over HTTP, with JSON serialization on every query. QueryMode handles the same workload with zero serialization.

**[Umami](https://github.com/umami-software/umami)** (23k+ stars, PostgreSQL/ClickHouse): all 10 query patterns are ported, including funnel analysis, cohort retention, user journeys, and attribution. Umami's attribution runs 8 separate database queries, each rebuilding the same CTE. QueryMode runs one `collect()` and branches 8 ways in code, taking 16ms at 10K events.

Both conformance test suites also include patterns that are impossible with the original architectures: cross-report correlation, conditional aggregation with branching, anomaly detection with thresholds, and A/B test statistical analysis, all running on a single result set with no serialization between stages.
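
As a rough illustration of the last two patterns, the sketch below runs an anomaly check and an A/B test over one in-memory result set. It assumes a hypothetical `Visit` row with `day`, `variant`, and `converted` fields, flags days whose visit count is more than a threshold number of standard deviations from the mean, and applies a standard two-proportion z-test. These are stand-in techniques chosen for the example and may differ from what the conformance tests actually implement.

```typescript
// Hypothetical row shape for one collected result set (illustrative field names).
interface Visit {
  day: string
  variant: "A" | "B" // assumed A/B assignment column
  converted: boolean
}

// Days whose visit count is more than `threshold` standard deviations from the mean.
function findAnomalousDays(rows: Visit[], threshold = 2): string[] {
  const counts = new Map<string, number>()
  for (const r of rows) counts.set(r.day, (counts.get(r.day) ?? 0) + 1)
  const values = Array.from(counts.values())
  if (values.length === 0) return []
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length
  const sd = Math.sqrt(variance)
  if (sd === 0) return []
  return Array.from(counts.entries())
    .filter(([, v]) => Math.abs(v - mean) / sd > threshold)
    .map(([day]) => day)
}

// Two-proportion z-test comparing the conversion rates of variants A and B.
function abTest(rows: Visit[]) {
  let nA = 0, cA = 0, nB = 0, cB = 0
  for (const r of rows) {
    if (r.variant === "A") {
      nA++
      if (r.converted) cA++
    } else {
      nB++
      if (r.converted) cB++
    }
  }
  const pA = cA / nA
  const pB = cB / nB
  const pooled = (cA + cB) / (nA + nB)
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB))
  const z = se === 0 ? 0 : (pB - pA) / se
  // |z| > 1.96 corresponds to p < 0.05 (two-sided) under the normal approximation.
  return { pA, pB, z, significant: Math.abs(z) > 1.96 }
}

// Synthetic sample: 9 days with 20 visits and one spike day with 100 visits.
const rows: Visit[] = []
for (let d = 1; d <= 10; d++) {
  const visits = d === 10 ? 100 : 20
  for (let i = 0; i < visits; i++) {
    const variant: "A" | "B" = i % 2 === 0 ? "A" : "B"
    // Deterministic conversions: B converts twice as often as A.
    const converted = variant === "A" ? i % 20 === 0 : i % 10 === 1
    rows.push({ day: `d${d}`, variant, converted })
  }
}

// Both analyses read the same in-memory rows; there is no second query.
const anomalies = findAnomalousDays(rows)
const ab = abTest(rows)
```

On this sample, `d10` is the only day flagged, and variant B's 20% conversion rate against A's 10% gives a z-score of about 2.34, above the 1.96 cutoff.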

## The agent IS the pipeline

QueryMode doesn't eliminate transformation. It moves it from a pre-built schedule to query time. The agent decides what to query, how to transform it, and what to do with the result — all in the same code, all at the edge, all without waiting for a pipeline that someone built last quarter.
