Skip to content

Commit e358c26

Browse files
committed
docs: add dogfooding user interview with competitive analysis
Hands-on audit identifying 18 issues (4 P0 blockers), 14 missing features, and 10-competitor landscape analysis. Includes positioning recommendations and 3-tier improvement roadmap.
1 parent fca435c commit e358c26

File tree

1 file changed

+173
-0
lines changed

1 file changed

+173
-0
lines changed

research/user-interview-dogfood.md

Lines changed: 173 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,173 @@
1+
# User Interview: Dogfooding QueryMode
2+
3+
**Date:** 2026-03-06
4+
**Method:** Hands-on usage audit + competitive analysis
5+
**Persona:** Developer evaluating QueryMode for edge analytics
6+
7+
---
8+
9+
## Executive Summary
10+
11+
QueryMode has a **unique and defensible position** as the only columnar query engine that runs natively inside Cloudflare Durable Objects with zero-copy R2 reads. The DataFrame API is well-designed and the operator pipeline is impressive. However, **onboarding friction, documentation bugs, and missing table-stakes features** would cause most evaluators to bounce before seeing the value.
12+
13+
**Verdict:** Strong engine, weak front door.
14+
15+
---
16+
17+
## Part 1: Issues Encountered (as a first-time user)
18+
19+
### P0 - Blockers (would stop evaluation)
20+
21+
| # | Issue | Detail |
22+
|---|-------|--------|
23+
| 1 | **`pnpm add querymode` fails** | README line 8 says to install, but package isn't published to npm. Line 242 buries this. User tries to install, fails, leaves. |
24+
| 2 | **`whereNotNull()` crashes** | Documented in `dataframe-api.mdx:14` and used in `examples/local-quickstart.ts:18`, but the method does not exist in `client.ts`. Example literally crashes. |
25+
| 3 | **`.join()` docs show wrong signature** | Docs: `orders.join(users, "user_id", "id", "inner")`. Actual: `orders.join(users, { left: "user_id", right: "id" }, "inner")`. User code won't compile. |
26+
| 4 | **Build requires Zig (undocumented)** | `pnpm build` runs `build:wasm` which needs Zig toolchain. No mention in README or docs. User clones repo, can't build. |
27+
28+
### P1 - Significant Friction
29+
30+
| # | Issue | Detail |
31+
|---|-------|--------|
32+
| 5 | **`.compute()` vs `.computed()` mismatch** | Docs say `compute()`, code says `computed()` (`client.ts:276`). |
33+
| 6 | **`.unionAll()` doesn't exist** | Docs say `df1.unionAll(df2)`, actual API is `df1.union(df2, true)`. |
34+
| 7 | **`head()` return type wrong in docs** | Docs say returns `QueryResult`, actually returns `Promise<T[]>`. |
35+
| 8 | **`fromCSV()` async but `fromJSON()` sync** | `QueryMode.fromCSV()` returns `Promise<DataFrame>`, `fromJSON()` returns `DataFrame`. User writes `const df = QueryMode.fromCSV(...)` then calls `.filter()` on a Promise. |
36+
| 9 | **No fast test command** | `pnpm test` takes ~8 minutes. No `test:quick` or `test:unit`. Kills iteration speed. |
37+
| 10 | **Two vitest configs, no explanation** | `vitest.config.ts` and `vitest.workers.config.ts` with no docs on which to use. |
38+
39+
### P2 - Paper Cuts
40+
41+
| # | Issue | Detail |
42+
|---|-------|--------|
43+
| 11 | **Only 3 error codes** | `TABLE_NOT_FOUND`, `INVALID_FORMAT`, `QUERY_FAILED`. No codes for: schema mismatch, timeout, memory exceeded, invalid filter op. |
44+
| 12 | **`query()` method is a no-op** | `QueryMode.query(fn)` literally does `return fn()`. Misleading — implies transaction semantics. |
45+
| 13 | **Type safety lost on method chains** | `select()`, `aggregate()`, `join()`, `computed()` all return unparameterized `DataFrame`, losing generic `<T>`. |
46+
| 14 | **Filter on non-existent column is silent** | `df.filter("typo_col", "eq", 5)` returns empty results with no warning. |
47+
| 15 | **`count_*` alias is ugly** | Default alias for `count("*")` is `"count_*"` — easy to mistype. |
48+
| 16 | **No `.orderBy()` alias** | Most data tools use `orderBy`. QueryMode only has `sort()`. |
49+
| 17 | **Single-column sort only** | No multi-column sort support. Common need. |
50+
| 18 | **Wrangler warnings about Fragment DO** | `wrangler.toml` binds `FragmentDO` but worker entry doesn't always export it, causing confusing warnings. |
51+
52+
---
53+
54+
## Part 2: Missing Features (what users expect)
55+
56+
| Feature | Status | Competitors Have It? |
57+
|---------|--------|---------------------|
58+
| `whereNotNull()` / `whereNull()` | Not implemented | Polars, DuckDB, Pandas |
59+
| `rename()` columns | Missing | Polars, Pandas, DuckDB |
60+
| `drop()` columns (inverse of select) | Missing | Polars, Pandas |
61+
| `sample(n)` random sampling | README says "planned" | Polars, DuckDB |
62+
| `tail(n)` | Missing | Polars, Pandas |
63+
| `fillna()` / null handling | Missing | Polars, Pandas |
64+
| `pivot()` / `unpivot()` | Missing | DuckDB, Pandas |
65+
| `toJSON()` / `toCSV()` export | Missing | Polars, DuckDB, Pandas |
66+
| Multi-column sort | Missing | All competitors |
67+
| Full-text search (BM25) | Missing | ParadeDB, ClickHouse |
68+
| Materialized views | Missing | ClickHouse, StarRocks |
69+
| SQL mode | Missing | DuckDB, ClickHouse, Athena, BigQuery |
70+
| Browser/WASM mode | Missing | DuckDB-WASM |
71+
| Delta Lake / Hudi support | Missing | Athena, StarRocks |
72+
73+
---
74+
75+
## Part 3: Competitive Landscape
76+
77+
### Direct Competitors
78+
79+
| | QueryMode | DuckDB-WASM | LanceDB | MotherDuck | Turbopuffer |
80+
|---|---|---|---|---|---|
81+
| **Runs at edge** | Yes (CF DOs) | Yes (browser) | No | No | No |
82+
| **Serverless** | Yes | Client-only | Yes | Yes | Yes |
83+
| **Cold start** | ~50ms (DO wake) | ~200ms (WASM init) | N/A | ~500ms | ~500ms |
84+
| **Formats** | Lance+Parquet+Iceberg | Parquet+CSV | Lance | Parquet+CSV | Proprietary |
85+
| **Vector search** | HNSW built-in | No | Yes (core) | No | Yes (core) |
86+
| **SQL** | No (DataFrame API) | Full SQL | No | Full SQL | No |
87+
| **Memory-bounded** | Yes (R2 spill) | No (browser RAM) | N/A | Yes | N/A |
88+
| **Multi-region** | Yes (DO per-datacenter) | N/A | No | No | Yes |
89+
| **Pricing** | CF Workers pricing | Free | Open source | $0.00375/GiB | $1/mo per 1M vectors |
90+
91+
### QueryMode's Unique Position (what nobody else does)
92+
93+
1. **Edge-native columnar pipeline** — Pull-based operators running inside Durable Objects. No other system puts a full query engine at the edge with bounded memory.
94+
2. **Zero-copy R2 reads with WASM SIMD** — Range reads directly from R2 into WASM memory. No intermediate materialization.
95+
3. **Multi-format + vector in one binary** — Parquet, Lance v1/v2, and Iceberg with HNSW vector search, all in one edge-deployed WASM module.
96+
4. **Fragment DO fan-out** — Parallel scan across pooled Durable Objects per-datacenter. Scales horizontally without provisioning.
97+
5. **No engine boundary** — Business logic runs between pipeline stages. No serialization between app code and query engine.
98+
99+
### Where QueryMode Overlaps (competitors do it better today)
100+
101+
| Area | Better Alternative | Why |
102+
|------|-------------------|-----|
103+
| Ad-hoc SQL analytics | DuckDB / MotherDuck | Full SQL, mature optimizer, huge ecosystem |
104+
| Petabyte scans | Athena / BigQuery | Proven at scale, managed infrastructure |
105+
| Real-time OLAP dashboard | ClickHouse / StarRocks | Sub-100ms on pre-aggregated data |
106+
| Python data science | Polars / DuckDB | Rich DataFrame API with type inference |
107+
| Full-text search | ParadeDB / ClickHouse | BM25 built-in, inverted indexes |
108+
109+
### Where QueryMode Wins (nobody else does this)
110+
111+
| Use Case | Why QueryMode |
112+
|----------|---------------|
113+
| Edge analytics dashboard | Sub-50ms p50 from any datacenter. No cold start penalty. |
114+
| Multi-tenant SaaS metrics | One Query DO per tenant/region. Isolated, hibernating, pay-per-use. |
115+
| Vector search at the edge | HNSW in WASM + columnar filters. No separate vector DB needed. |
116+
| Lakehouse over R2 | Native Lance/Parquet/Iceberg. No ETL into a separate DB. |
117+
| Real-time writes + reads | Append via Master DO, instant invalidation to Query DOs. |
118+
119+
---
120+
121+
## Part 4: Recommendations
122+
123+
### Tier 1: Fix the Front Door (week 1)
124+
125+
These are killing the first-5-minutes experience:
126+
127+
1. **Publish to npm** — Even a `0.1.0-alpha` with a disclaimer. `pnpm add querymode` must work.
128+
2. **Fix all doc/code mismatches**`whereNotNull`, `compute`/`computed`, `unionAll`, `join` signature, `head` return type. Every wrong example is a lost user.
129+
3. **Add `whereNotNull()` and `whereNull()`** — It's in the docs, users expect it, trivial to implement as sugar over filter.
130+
4. **Add a working quickstart** — Clone, install, 5-line script, see results. Test this on a fresh machine.
131+
5. **Document Zig requirement** — Or better: ship pre-built WASM so `pnpm install` is enough.
132+
133+
### Tier 2: Close the DX Gap (weeks 2-4)
134+
135+
These make the library feel production-ready:
136+
137+
6. **Make `fromCSV()` sync** (or make `fromJSON()` async) — Pick one, be consistent.
138+
7. **Add multi-column sort**`.sort([{ column: "region", direction: "asc" }, { column: "amount", direction: "desc" }])`
139+
8. **Add `rename()`, `drop()`, `toJSON()`, `toCSV()`** — Table-stakes DataFrame methods.
140+
9. **Expand error codes**`SCHEMA_MISMATCH`, `INVALID_FILTER_OP`, `MEMORY_EXCEEDED`, `NETWORK_TIMEOUT`.
141+
10. **Add `pnpm test:quick`** — Run unit tests only, skip conformance suite. Target <10s.
142+
11. **Fix type safety** — Preserve `DataFrame<T>` generic through `select()`, `aggregate()`, etc.
143+
12. **Validate column names at query build time** — Compare against schema when available (local mode has it).
144+
145+
### Tier 3: Strategic Differentiation (month 2+)
146+
147+
These widen the moat:
148+
149+
13. **SQL mode** — Even a subset (SELECT/WHERE/GROUP BY/ORDER BY). Lowers adoption barrier massively. DuckDB compatibility for migration stories.
150+
14. **Browser mode** — Ship a `querymode/browser` export that runs the WASM engine client-side against fetch() or Cache API. Compete directly with DuckDB-WASM.
151+
15. **Materialized views** — Pre-aggregate hot queries in DO storage. Instant reads for dashboard use cases.
152+
16. **Full-text search** — BM25 inverted index in WASM. Combine with vector search for hybrid retrieval.
153+
17. **Dashboard SDK** — React hooks (`useQuery`, `useStream`) that connect to QueryMode over RPC. The "Supabase for analytics" story.
154+
155+
---
156+
157+
## Part 5: Positioning Recommendation
158+
159+
### Current: "Serverless columnar query engine on Cloudflare"
160+
**Problem:** Too technical, doesn't convey the "why".
161+
162+
### Proposed: "The analytics database that runs at the edge"
163+
**Tagline:** "Query your data lake from every Cloudflare datacenter. Sub-50ms. Zero servers."
164+
165+
### Target personas (in order):
166+
1. **Cloudflare-native teams** already using R2 + Workers. Lowest friction adoption.
167+
2. **Multi-tenant SaaS** needing per-tenant analytics without provisioning databases.
168+
3. **AI/ML teams** needing vector search + columnar queries in one system.
169+
170+
### Anti-personas (don't target yet):
171+
- Data scientists (need SQL + notebooks)
172+
- Petabyte-scale analytics (need Athena/BigQuery scale)
173+
- Teams committed to ClickHouse/StarRocks (need migration tooling first)

0 commit comments

Comments
 (0)