
Commit a728037

docs: rewrite "Why QueryMode" — agents query like humans, just faster
The old framing positioned agents as mysterious users asking unpredictable questions that pre-built ETL can't handle. The real insight is simpler: agents ask the same questions humans ask, at machine pace. The infrastructure challenge is concurrency at the origin and serialization overhead per request, not query unpredictability.

Key changes:

- Lead with "agents serve humans" instead of "agents are alien traffic"
- Reframe the problem as thundering herd + serialization tax, not ETL rigidity
- New comparison table: Traditional vs QueryMode (concurrency, overhead, follow-ups)
- "Edge-native: survive the thundering herd" section
- "The agent IS the user" closing instead of "the agent IS the pipeline"
- Composability section: why this matters even if CF ships native OLAP

Other docs reviewed — index, architecture, composability, operators, sql, dataframe-api, getting-started all use neutral technical framing. No changes needed.
1 parent aef5397 commit a728037

1 file changed

Lines changed: 28 additions & 32 deletions

@@ -1,38 +1,36 @@
 ---
 title: Why QueryMode
-description: Agents are the new users. They need dynamic pipelines, not pre-built ETL.
+description: Agents query like humans, but at machine pace. Data infrastructure needs to keep up.
 ---
 
-1. **Agents are becoming the majority of internet traffic.** They serve different owners across different parts of the world, but share the same training data and independently reach the same conclusions. When thousands of agents hit the same endpoints at the same millisecond, the result is thundering herds that look like a DDoS — except every request is legitimate. That's not a DDoS attack. That's just Tuesday. Data must live at the edge to survive this.
+1. **Agents serve humans.** An agent asking "what's my retention this week?" is asking the same question a PM would ask in a dashboard. The query is identical. The difference is pace — one human asks once, a thousand agents ask at the same millisecond on behalf of a thousand humans.
 
-2. **Agents need live data.** Decisions based on outdated training data lead to bad outcomes. Training data can't keep up with the speed the world produces information. Agents make API calls for live data — lots of them. That data needs to live at the edge, close to where agents run.
+2. **That pace breaks traditional infrastructure.** A thousand identical queries hitting the same origin database at once look like a DDoS, except every request is legitimate. Connection pools saturate. Cold caches stampede. The database that handles one dashboard user fine collapses under a thousand agents doing the same thing concurrently.
 
-3. **Pre-built ETL can't serve agents.** Traditional data pipelines assume a human pre-defines what questions matter, builds a pipeline on a schedule, and stores the results. Agents don't ask pre-defined questions. They chain queries in ways no pipeline designer anticipated — funnel analysis, then retention for just those users, then attribution for just those retained users. The pipeline doesn't exist until the agent creates it.
+3. **Every request pays a serialization tax.** Each agent builds a SQL string, sends it over the network, waits for JSON back, parses it, then builds the next query. An agent chaining three analyses pays that tax six times — three round-trips, three serialize/deserialize cycles. The queries are simple. The overhead is not.
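The tax in point 3 is easy to count. A minimal sketch with a stand-in origin (`fakeOrigin` and the query strings are invented for illustration; no real database or QueryMode API is used) tallies the serialize/deserialize cycles for three chained analyses:

```typescript
type Row = Record<string, number | string>;

let serializations = 0;

// Stand-in for a remote origin database: accepts a SQL string, returns JSON text.
function fakeOrigin(sql: string): string {
  serializations++; // the response is serialized to JSON server-side
  return JSON.stringify([{ users: 120, sqlLength: sql.length }]);
}

// Every chained analysis pays: build SQL string, send, wait, parse JSON.
function roundTrip(sql: string): Row[] {
  serializations++; // the request is serialized client-side
  return JSON.parse(fakeOrigin(sql)) as Row[];
}

const funnel = roundTrip("SELECT /* funnel */ 1");
const retention = roundTrip(`SELECT /* retention for ${funnel[0].users} users */ 1`);
const attribution = roundTrip(`SELECT /* attribution for ${retention[0].users} users */ 1`);

// Three analyses, six serialize/deserialize cycles: the tax described above.
console.log(serializations, attribution.length); // 6 1
```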
 
-## Fixed ETL vs dynamic pipelines
+## The real problem isn't what agents ask — it's how many ask at once
 
-Most data is not well-structured enough to query directly. It needs transformation. The question is: **who defines the transformation, and when?**
+Agents don't ask mysterious questions. They ask the same questions humans ask: funnel analysis, retention, attribution, top-N queries. The infrastructure challenge isn't unpredictability — it's concurrency at the origin and serialization overhead per request.
 
-| | Fixed ETL | QueryMode |
+| | Traditional | QueryMode |
 |---|---|---|
-| **Who** | A human, in advance | The agent, at query time |
-| **When** | On a schedule | On demand |
-| **What** | Pre-defined transformations | Whatever the agent needs right now |
-| **Boundary** | Query → serialize → DB → serialize → result → parse → next query | Query and business logic run in the same code, same process |
+| **Where data lives** | Origin database, single region | R2 at the edge, free egress |
+| **Concurrency model** | Connection pool, shared origin | Isolated Durable Objects per region |
+| **Query overhead** | SQL string → network → JSON → parse per query | Same-process function call, zero serialization |
+| **Follow-up queries** | Full round-trip each time | Branch over the same result set in memory |

23-
QueryMode replaces fixed ETL pipelines with dynamic ones. The agent writes both the query and the business logic in the same code, with no serialization overhead between stages.
23+
## No serialization boundary
2424

25-
## The serialization boundary problem
26-
27-
Every traditional query engine has a boundary between your code and the engine:
25+
Every traditional query engine has a wall between your code and the engine:
2826

2927
```
3028
Your code → build SQL string → send to database → wait → JSON response → parse → your code
3129
```
3230

33-
If you need to ask a follow-up question based on the answer, you do it all again. Umami's attribution report does this **8 times** for a single dashboard page — each query rebuilds the same base data.
31+
If you need a follow-up question, you do it all again. Umami's attribution report does this **8 times** for a single dashboard page — each query rebuilds the same base data.
3432

35-
QueryMode has no boundary:
33+
QueryMode has no wall:
3634

3735
```typescript
3836
// 1 collect(), then branch freely in code
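// The diff hunk cuts off inside this code block, so here is a self-contained
// illustration of the pattern it names: one collect(), then branch freely in
// code. The rows below stand in for a collect() result; the QueryMode API
// itself is not shown, and all names here are invented for the example.
type Ev = { userId: string; name: string; day: number };
const events: Ev[] = [
  { userId: "a", name: "signup", day: 1 },
  { userId: "a", name: "purchase", day: 2 },
  { userId: "b", name: "signup", day: 1 },
];

// Branch 1: funnel. Which signed-up users went on to purchase?
const signedUp = new Set(events.filter(e => e.name === "signup").map(e => e.userId));
const purchased = events.filter(e => e.name === "purchase" && signedUp.has(e.userId));

// Branch 2: retention for just those users. No second round-trip: the result
// set is already in memory, so the follow-up is a plain array operation.
const purchasers = new Set(purchased.map(e => e.userId));
const retained = events.filter(e => purchasers.has(e.userId) && e.day > 1);

console.log(purchased.length, retained.length); // 1 1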
@@ -54,15 +52,13 @@ Three analyses on one result set. No SQL string construction, no JSON parsing, n
 
 > **What about memory?** `collect()` doesn't load a 50GB file into a V8 isolate. Filter pushdown already skipped irrelevant pages via min/max stats, aggregation already reduced rows to group summaries, projection already dropped unused columns. What lands in memory is the *result*, not the dataset. Operators are memory-bounded (default 32MB) and [spill to R2](/querymode/operators/#memory-bounded-with-r2-spill) when they exceed budget.
 
-## How it actually works under the hood
-
-### Where the data lives
+## Edge-native: survive the thundering herd
 
-Data sits in **R2** as columnar files (Parquet, Lance, Iceberg). Nothing gets replicated to 300 edge nodes. Regional Query DOs cache table footers (~4KB each) and read data pages from R2 via coalesced HTTP range requests (~10ms).
+Data sits in **R2** as columnar files (Parquet, Lance, Iceberg). Regional Query DOs cache table footers (~4KB each) and read data pages from R2 via coalesced HTTP range requests (~10ms). Free egress means a thousand concurrent reads don't cost a thousand times more.
 
-"Data at the edge" means metadata cached locally, pages fetched on demand with free egress. Not replicated databases.
+A thousand agents asking the same question from the same region hit the same Query DO, which serves cached footers and coordinates parallel fragment scans. No origin database. No connection pool. No stampede.
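The no-stampede claim can be sketched as cache coalescing. A simplified model with invented names (`fetchFooter`, `footerCache`); the real Query DO logic is not shown here:

```typescript
const footerCache = new Map<string, Promise<Uint8Array>>();
let r2Fetches = 0;

// Hypothetical footer fetch: the first caller triggers the R2 read; every
// later caller for the same table reuses the cached (possibly in-flight) promise.
function fetchFooter(table: string): Promise<Uint8Array> {
  let cached = footerCache.get(table);
  if (!cached) {
    r2Fetches++;                                      // one R2 read for the whole herd
    cached = Promise.resolve(new Uint8Array(4096));   // ~4KB table footer
    footerCache.set(table, cached);
  }
  return cached;
}

// A thousand agents ask about the same table at the same millisecond.
const herd = Array.from({ length: 1000 }, () => fetchFooter("events"));
console.log(r2Fetches, herd.length); // 1 1000
```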
 
-### The operators ARE the optimizer
+## The operators ARE the optimizer
 
 Every query runs through a pull-based [operator pipeline](/querymode/operators/):
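A pull-based pipeline can be modeled in a few lines. This is an illustrative shape only, not the documented QueryMode operator interface:

```typescript
type Row = Record<string, unknown>;
interface Operator { next(): Row | null; }

// Leaf operator: yields rows one at a time from an in-memory source.
class Scan implements Operator {
  private i = 0;
  constructor(private rows: Row[]) {}
  next() { return this.i < this.rows.length ? this.rows[this.i++] : null; }
}

// Each operator pulls from its child only when the consumer pulls from it.
class Filter implements Operator {
  constructor(private child: Operator, private pred: (r: Row) => boolean) {}
  next() {
    for (let r = this.child.next(); r !== null; r = this.child.next())
      if (this.pred(r)) return r;
    return null;
  }
}

const pipeline = new Filter(
  new Scan([{ n: 1 }, { n: 2 }, { n: 3 }]),
  r => (r.n as number) > 1
);
const out: Row[] = [];
for (let r = pipeline.next(); r !== null; r = pipeline.next()) out.push(r);
console.log(out.length); // 2
```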
 
@@ -82,11 +78,15 @@ These do real query optimization work:
 
 The difference: you can see every stage, swap implementations, inject custom logic between operators, and control the memory budget. The plan isn't a black box.
 
-### Governance
+## Composable — even if Cloudflare ships native OLAP
+
+Cloudflare may eventually ship a native analytics engine. That would be a fixed engine with a fixed query language. QueryMode is composable: operators are building blocks your code assembles. You can put an [ML scoring function between pipeline stages](/querymode/operators/#compose-operators-directly), swap the sort algorithm, or inject a rate limiter between scan and filter. A black-box engine can't do that — no matter how fast it is.
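Under the same kind of illustrative operator shape (not the documented interface), injecting a custom stage between two operators looks like this; `ScoreStage` and its stand-in model are invented for the example:

```typescript
type Row = { score?: number; value: number };
interface Operator { next(): Row | null; }

class ArrayScan implements Operator {
  private i = 0;
  constructor(private rows: Row[]) {}
  next() { return this.i < this.rows.length ? this.rows[this.i++] : null; }
}

// A custom stage injected mid-pipeline: attaches a model score to each row.
class ScoreStage implements Operator {
  constructor(private child: Operator, private model: (r: Row) => number) {}
  next() {
    const r = this.child.next();
    if (r === null) return null;
    return { ...r, score: this.model(r) };
  }
}

const scored = new ScoreStage(
  new ArrayScan([{ value: 10 }, { value: 40 }]),
  r => r.value / 100   // stand-in "model": any per-row function slots in
);
const first = scored.next();
console.log(first?.score); // 0.1
```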
 
-"The pipeline doesn't exist until the agent creates it" sounds terrifying if you're a CISO. But the pipeline is just operator composition — it doesn't grant access to anything. `MasterDO` owns table metadata. The agent can only query tables that are registered, and only columns that are exposed. Row-level and column-level access control happens before the pipeline runs, not inside it.
+## Governance
 
-The transformation is dynamic. The authorization is not.
+"Agents query at machine pace" sounds terrifying if you're a CISO. But the pipeline is just operator composition — it doesn't grant access to anything. `MasterDO` owns table metadata. The agent can only query tables that are registered, and only columns that are exposed. Row-level and column-level access control happens before the pipeline runs, not inside it.
+
+The pace is dynamic. The authorization is not.
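The authorize-before-compose idea can be sketched as a registry check that runs before any operator does; `registry` and `authorize` are hypothetical names for illustration, not the `MasterDO` API:

```typescript
// Hypothetical metadata registry: registered tables and their exposed columns.
const registry: Record<string, string[]> = {
  events: ["userId", "name", "day"],
};

// Gate runs before the pipeline: unregistered tables and unexposed columns
// are rejected before any operator is composed or executed.
function authorize(table: string, cols: string[]): void {
  const exposed = registry[table];
  if (!exposed) throw new Error(`table not registered: ${table}`);
  for (const c of cols)
    if (!exposed.includes(c)) throw new Error(`column not exposed: ${c}`);
}

// Allowed: registered table, exposed columns.
authorize("events", ["userId", "day"]);

// Denied before any query work happens.
let denied = false;
try { authorize("events", ["email"]); } catch { denied = true; }
console.log(denied); // true
```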
 
 ## What we've tested so far
 
@@ -96,10 +96,6 @@ We ported query patterns from two open-source analytics platforms:
 
 **[Umami](https://github.com/umami-software/umami)** (23k+ stars, PostgreSQL/ClickHouse) — 10 query patterns, including funnel analysis, cohort retention, user journeys, and attribution. Umami's attribution sends 8 separate database queries that each rebuild the same base CTE. On QueryMode, the same analysis runs as 1 `collect()` with 8 code branches over the same result set.
 
-Both test suites also include multi-step analyses that would be awkward with the original architecture — things like running funnel analysis and then feeding the resulting session IDs directly into a retention computation, without a second round-trip. These aren't impossible in SQL, but they'd require rewriting queries and additional database calls. With QueryMode, intermediate results are just objects in memory.
-
-## The agent IS the pipeline
-
-QueryMode doesn't eliminate transformation. It moves it from a pre-built schedule to query time. The agent decides what to query, how to transform it, and what to do with the result — all in the same code, same process. If the data is well-structured, the agent queries it directly. If it's not, the agent builds the transformation on the spot. Either way, no one had to anticipate the question in advance.
+## The agent IS the user
 
-It doesn't eliminate the query optimizer either. The operators do filter pushdown, vectorized decode, memory-bounded spill — but you assemble them, you control the budget, and you can put an [ML scoring function between pipeline stages](/querymode/operators/#compose-operators-directly) if you want to.
+QueryMode doesn't change what gets queried. It changes where and how fast. Data lives at the edge instead of a central origin. Queries execute in-process instead of over the network. Follow-ups branch over results in memory instead of round-tripping. The agent asks the same questions a human would — it just asks them at machine pace, and the infrastructure keeps up.
