|
| 1 | +# User Interview: Dogfooding QueryMode |
| 2 | + |
| 3 | +**Date:** 2026-03-06 |
| 4 | +**Method:** Hands-on usage audit + competitive analysis |
| 5 | +**Persona:** Developer evaluating QueryMode for edge analytics |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Executive Summary |
| 10 | + |
| 11 | +QueryMode has a **unique and defensible position** as the only columnar query engine that runs natively inside Cloudflare Durable Objects with zero-copy R2 reads. The DataFrame API is well-designed and the operator pipeline is impressive. However, **onboarding friction, documentation bugs, and missing table-stakes features** would cause most evaluators to bounce before seeing the value. |
| 12 | + |
| 13 | +**Verdict:** Strong engine, weak front door. |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Part 1: Issues Encountered (as a first-time user) |
| 18 | + |
| 19 | +### P0 - Blockers (would stop evaluation) |
| 20 | + |
| 21 | +| # | Issue | Detail | |
| 22 | +|---|-------|--------| |
| 23 | +| 1 | **`pnpm add querymode` fails** | README line 8 says to install, but package isn't published to npm. Line 242 buries this. User tries to install, fails, leaves. | |
| 24 | +| 2 | **`whereNotNull()` crashes** | Documented in `dataframe-api.mdx:14` and used in `examples/local-quickstart.ts:18`, but the method does not exist in `client.ts`. Example literally crashes. | |
| 25 | +| 3 | **`.join()` docs show wrong signature** | Docs: `orders.join(users, "user_id", "id", "inner")`. Actual: `orders.join(users, { left: "user_id", right: "id" }, "inner")`. User code won't compile. | |
| 26 | +| 4 | **Build requires Zig (undocumented)** | `pnpm build` runs `build:wasm` which needs Zig toolchain. No mention in README or docs. User clones repo, can't build. | |
| 27 | + |
| 28 | +### P1 - Significant Friction |
| 29 | + |
| 30 | +| # | Issue | Detail | |
| 31 | +|---|-------|--------| |
| 32 | +| 5 | **`.compute()` vs `.computed()` mismatch** | Docs say `compute()`, code says `computed()` (`client.ts:276`). | |
| 33 | +| 6 | **`.unionAll()` doesn't exist** | Docs say `df1.unionAll(df2)`, actual API is `df1.union(df2, true)`. | |
| 34 | +| 7 | **`head()` return type wrong in docs** | Docs say returns `QueryResult`, actually returns `Promise<T[]>`. | |
| 35 | +| 8 | **`fromCSV()` async but `fromJSON()` sync** | `QueryMode.fromCSV()` returns `Promise<DataFrame>`, `fromJSON()` returns `DataFrame`. User writes `const df = QueryMode.fromCSV(...)` then calls `.filter()` on a Promise. | |
| 36 | +| 9 | **No fast test command** | `pnpm test` takes ~8 minutes. No `test:quick` or `test:unit`. Kills iteration speed. | |
| 37 | +| 10 | **Two vitest configs, no explanation** | `vitest.config.ts` and `vitest.workers.config.ts` with no docs on which to use. | |
| 38 | + |
| 39 | +### P2 - Paper Cuts |
| 40 | + |
| 41 | +| # | Issue | Detail | |
| 42 | +|---|-------|--------| |
| 43 | +| 11 | **Only 3 error codes** | `TABLE_NOT_FOUND`, `INVALID_FORMAT`, `QUERY_FAILED`. No codes for: schema mismatch, timeout, memory exceeded, invalid filter op. | |
| 44 | +| 12 | **`query()` method is a no-op** | `QueryMode.query(fn)` literally does `return fn()`. Misleading — implies transaction semantics. | |
| 45 | +| 13 | **Type safety lost on method chains** | `select()`, `aggregate()`, `join()`, `computed()` all return unparameterized `DataFrame`, losing generic `<T>`. | |
| 46 | +| 14 | **Filter on non-existent column is silent** | `df.filter("typo_col", "eq", 5)` returns empty results with no warning. | |
| 47 | +| 15 | **`count_*` alias is ugly** | Default alias for `count("*")` is `"count_*"` — easy to mistype. | |
| 48 | +| 16 | **No `.orderBy()` alias** | Most data tools use `orderBy`. QueryMode only has `sort()`. | |
| 49 | +| 17 | **Single-column sort only** | No multi-column sort support. Common need. | |
| 50 | +| 18 | **Wrangler warnings about Fragment DO** | `wrangler.toml` binds `FragmentDO` but worker entry doesn't always export it, causing confusing warnings. | |
| 51 | + |
| 52 | +--- |
| 53 | + |
| 54 | +## Part 2: Missing Features (what users expect) |
| 55 | + |
| 56 | +| Feature | Status | Competitors Have It? | |
| 57 | +|---------|--------|---------------------| |
| 58 | +| `whereNotNull()` / `whereNull()` | Not implemented | Polars, DuckDB, Pandas | |
| 59 | +| `rename()` columns | Missing | Polars, Pandas, DuckDB | |
| 60 | +| `drop()` columns (inverse of select) | Missing | Polars, Pandas | |
| 61 | +| `sample(n)` random sampling | README says "planned" | Polars, DuckDB | |
| 62 | +| `tail(n)` | Missing | Polars, Pandas | |
| 63 | +| `fillna()` / null handling | Missing | Polars, Pandas | |
| 64 | +| `pivot()` / `unpivot()` | Missing | DuckDB, Pandas | |
| 65 | +| `toJSON()` / `toCSV()` export | Missing | Polars, DuckDB, Pandas | |
| 66 | +| Multi-column sort | Missing | All competitors | |
| 67 | +| Full-text search (BM25) | Missing | ParadeDB, ClickHouse | |
| 68 | +| Materialized views | Missing | ClickHouse, StarRocks | |
| 69 | +| SQL mode | Missing | DuckDB, ClickHouse, Athena, BigQuery | |
| 70 | +| Browser/WASM mode | Missing | DuckDB-WASM | |
| 71 | +| Delta Lake / Hudi support | Missing | Athena, StarRocks | |
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +## Part 3: Competitive Landscape |
| 76 | + |
| 77 | +### Direct Competitors |
| 78 | + |
| 79 | +| | QueryMode | DuckDB-WASM | LanceDB | MotherDuck | Turbopuffer | |
| 80 | +|---|---|---|---|---|---| |
| 81 | +| **Runs at edge** | Yes (CF DOs) | Yes (browser) | No | No | No | |
| 82 | +| **Serverless** | Yes | Client-only | Yes | Yes | Yes | |
| 83 | +| **Cold start** | ~50ms (DO wake) | ~200ms (WASM init) | N/A | ~500ms | ~500ms | |
| 84 | +| **Formats** | Lance+Parquet+Iceberg | Parquet+CSV | Lance | Parquet+CSV | Proprietary | |
| 85 | +| **Vector search** | HNSW built-in | No | Yes (core) | No | Yes (core) | |
| 86 | +| **SQL** | No (DataFrame API) | Full SQL | No | Full SQL | No | |
| 87 | +| **Memory-bounded** | Yes (R2 spill) | No (browser RAM) | N/A | Yes | N/A | |
| 88 | +| **Multi-region** | Yes (DO per-datacenter) | N/A | No | No | Yes | |
| 89 | +| **Pricing** | CF Workers pricing | Free | Open source | $0.00375/GiB | $1/mo per 1M vectors | |
| 90 | + |
| 91 | +### QueryMode's Unique Position (what nobody else does) |
| 92 | + |
| 93 | +1. **Edge-native columnar pipeline** — Pull-based operators running inside Durable Objects. No other system puts a full query engine at the edge with bounded memory. |
| 94 | +2. **Zero-copy R2 reads with WASM SIMD** — Range reads directly from R2 into WASM memory. No intermediate materialization. |
| 95 | +3. **Multi-format + vector in one binary** — Parquet, Lance v1/v2, and Iceberg with HNSW vector search, all in one edge-deployed WASM module. |
| 96 | +4. **Fragment DO fan-out** — Parallel scan across pooled Durable Objects per-datacenter. Scales horizontally without provisioning. |
| 97 | +5. **No engine boundary** — Business logic runs between pipeline stages. No serialization between app code and query engine. |
| 98 | + |
| 99 | +### Where QueryMode Overlaps (competitors do it better today) |
| 100 | + |
| 101 | +| Area | Better Alternative | Why | |
| 102 | +|------|-------------------|-----| |
| 103 | +| Ad-hoc SQL analytics | DuckDB / MotherDuck | Full SQL, mature optimizer, huge ecosystem | |
| 104 | +| Petabyte scans | Athena / BigQuery | Proven at scale, managed infrastructure | |
| 105 | +| Real-time OLAP dashboard | ClickHouse / StarRocks | Sub-100ms on pre-aggregated data | |
| 106 | +| Python data science | Polars / DuckDB | Rich DataFrame API with type inference | |
| 107 | +| Full-text search | ParadeDB / ClickHouse | BM25 built-in, inverted indexes | |
| 108 | + |
| 109 | +### Where QueryMode Wins (nobody else does this) |
| 110 | + |
| 111 | +| Use Case | Why QueryMode | |
| 112 | +|----------|---------------| |
| 113 | +| Edge analytics dashboard | Sub-50ms p50 from any datacenter. No cold start penalty. | |
| 114 | +| Multi-tenant SaaS metrics | One Query DO per tenant/region. Isolated, hibernating, pay-per-use. | |
| 115 | +| Vector search at the edge | HNSW in WASM + columnar filters. No separate vector DB needed. | |
| 116 | +| Lakehouse over R2 | Native Lance/Parquet/Iceberg. No ETL into a separate DB. | |
| 117 | +| Real-time writes + reads | Append via Master DO, instant invalidation to Query DOs. | |
| 118 | + |
| 119 | +--- |
| 120 | + |
| 121 | +## Part 4: Recommendations |
| 122 | + |
| 123 | +### Tier 1: Fix the Front Door (week 1) |
| 124 | + |
| 125 | +These are killing the first-5-minutes experience: |
| 126 | + |
| 127 | +1. **Publish to npm** — Even a `0.1.0-alpha` with a disclaimer. `pnpm add querymode` must work. |
| 128 | +2. **Fix all doc/code mismatches** — `whereNotNull`, `compute`/`computed`, `unionAll`, `join` signature, `head` return type. Every wrong example is a lost user. |
| 129 | +3. **Add `whereNotNull()` and `whereNull()`** — It's in the docs, users expect it, trivial to implement as sugar over filter. |
| 130 | +4. **Add a working quickstart** — Clone, install, 5-line script, see results. Test this on a fresh machine. |
| 131 | +5. **Document Zig requirement** — Or better: ship pre-built WASM so `pnpm install` is enough. |
| 132 | + |
| 133 | +### Tier 2: Close the DX Gap (weeks 2-4) |
| 134 | + |
| 135 | +These make the library feel production-ready: |
| 136 | + |
| 137 | +6. **Make `fromCSV()` sync** (or make `fromJSON()` async) — Pick one, be consistent. |
| 138 | +7. **Add multi-column sort** — `.sort([{ column: "region", direction: "asc" }, { column: "amount", direction: "desc" }])` |
| 139 | +8. **Add `rename()`, `drop()`, `toJSON()`, `toCSV()`** — Table-stakes DataFrame methods. |
| 140 | +9. **Expand error codes** — `SCHEMA_MISMATCH`, `INVALID_FILTER_OP`, `MEMORY_EXCEEDED`, `NETWORK_TIMEOUT`. |
| 141 | +10. **Add `pnpm test:quick`** — Run unit tests only, skip conformance suite. Target <10s. |
| 142 | +11. **Fix type safety** — Preserve `DataFrame<T>` generic through `select()`, `aggregate()`, etc. |
| 143 | +12. **Validate column names at query build time** — Compare against schema when available (local mode has it). |
| 144 | + |
| 145 | +### Tier 3: Strategic Differentiation (month 2+) |
| 146 | + |
| 147 | +These widen the moat: |
| 148 | + |
| 149 | +13. **SQL mode** — Even a subset (SELECT/WHERE/GROUP BY/ORDER BY). Lowers adoption barrier massively. DuckDB compatibility for migration stories. |
| 150 | +14. **Browser mode** — Ship a `querymode/browser` export that runs the WASM engine client-side against fetch() or Cache API. Compete directly with DuckDB-WASM. |
| 151 | +15. **Materialized views** — Pre-aggregate hot queries in DO storage. Instant reads for dashboard use cases. |
| 152 | +16. **Full-text search** — BM25 inverted index in WASM. Combine with vector search for hybrid retrieval. |
| 153 | +17. **Dashboard SDK** — React hooks (`useQuery`, `useStream`) that connect to QueryMode over RPC. The "Supabase for analytics" story. |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## Part 5: Positioning Recommendation |
| 158 | + |
| 159 | +### Current: "Serverless columnar query engine on Cloudflare" |
| 160 | +**Problem:** Too technical, doesn't convey the "why". |
| 161 | + |
| 162 | +### Proposed: "The analytics database that runs at the edge" |
| 163 | +**Tagline:** "Query your data lake from every Cloudflare datacenter. Sub-50ms. Zero servers." |
| 164 | + |
| 165 | +### Target personas (in order): |
| 166 | +1. **Cloudflare-native teams** already using R2 + Workers. Lowest friction adoption. |
| 167 | +2. **Multi-tenant SaaS** needing per-tenant analytics without provisioning databases. |
| 168 | +3. **AI/ML teams** needing vector search + columnar queries in one system. |
| 169 | + |
| 170 | +### Anti-personas (don't target yet): |
| 171 | +- Data scientists (need SQL + notebooks) |
| 172 | +- Petabyte-scale analytics (need Athena/BigQuery scale) |
| 173 | +- Teams committed to ClickHouse/StarRocks (need migration tooling first) |
0 commit comments