feat: intent + structured query for deep_search by igrigorik · Pull Request #180 · tobi/qmd

igrigorik · 2026-02-16T07:21:56Z

Intent — steer every pipeline stage

Optional intent parameter across all search tools. When a query is ambiguous
(e.g., "performance" → web-perf vs health), intent provides background context
that steers results toward the caller's interpretation.

{ query: "performance", intent: "web performance and latency" }

Intent flows through five stages in deep_search:

Query expansion: LLM prompt becomes "Context: {intent}\nExpand: {query}"
Strong-signal bypass: disabled — obvious BM25 match isn't what the caller wants
Chunk selection: intent terms (weight 0.5) augment query terms (1.0)
Reranking: intent prepended to query for Qwen3-Reranker
Snippet extraction: intent terms (weight 0.3) nudge toward relevant lines

Structured query — caller-provided expansions bypass LLM

deep_search has a local 1.7B LLM that expands queries into keywords, concepts,
and a hypothetical passage. An upstream LLM caller (Claude, GPT) already knows
what the user wants — routing through 1.7B is a capability downgrade.

Structured query lets the caller provide expansions directly, skipping LLM:

  string (auto — LLM expands):        structured (manual — no LLM):

  "performance"                        { text, keywords, concepts, passage }
        │                                 │       │         │         │
        ▼                                 ▼       ▼         ▼         ▼
  ┌────────────┐                        BM25     FTS     vector    vector
  │ Qwen3 1.7B │ → {lex,vec,hyde}        └───────┴─────────┴─────────┘
  └────────────┘                                      │
        │                                             ▼
        ▼                                       RRF → rerank
  FTS + vector → RRF → rerank
                                               ~240ms (89% faster)
  ~2s (LLM bottleneck)

Fields route directly: keywords → FTS, concepts → vector, passage → vector.
Intent remains orthogonal — steers scoring in both modes.

Eval (2115-doc corpus, 8 ambiguous queries)

Each query has multiple valid interpretations (e.g. "performance" → web-perf vs
health, "scaling" → infra vs business).

Metric	Baseline (LLM)	Intent (steered)	Structured (no LLM)
Signal Density @5	0.500	0.675 (+0.175)	0.725 (+0.225)
MRR	0.738	0.938 (+0.200)	1.000 (+0.262)
Avg latency	2114ms	1781ms	240ms (89% faster)

MRR of 1.000 = first result always relevant. Quality wins from the upstream
caller being better at disambiguation than the local 1.7B model.

Implementation highlights

StructuredQuery type: { text, keywords?, concepts?, passage? } with normalizeQuery() stripping empty fields and hasCallerExpansions() detecting caller-driven mode
Unified routing: FTS and vector paths shared between caller/LLM modes — ~95 → ~65 lines
RRF weight fix: replaced positional heuristic (i < 2) with primaryIndices: Set<number> tracking which ranked lists represent the original query
MCP schema: z.union([string, object]) → anyOf in JSON Schema. Empirically tested: Opus/Sonnet handle correctly, Haiku fails safe via additionalProperties: false
extractSnippet refactor: 6 positional params → options object, eliminates undefined placeholders at ~31 callsites
Intent tokenization: stop word set + punctuation stripping, short domain terms (API, SQL) survive
Collection filtering: MCP instructions now direct callers to filter by collection when intent maps

@5

Optional `intent` parameter across all search tools (MCP, CLI, programmatic). When a query is ambiguous (e.g., "performance" could mean web-perf, team, or health), intent provides background context that steers results toward the caller's intended interpretation. Omit for precise queries. CLI: qmd query "performance" --intent "web performance, latency, page load times" qmd vsearch "scaling" --intent "infrastructure scaling and distributed systems" qmd search "trust" --intent "team trust and psychological safety" MCP: { "tool": "deep_search", "query": "performance", "intent": "web performance, latency, page load times" } ### How it works Intent flows through five pipeline stages in deep_search (progressively fewer in lighter tools): 1. Query expansion: LLM prompt becomes "Context: {intent}\nExpand: {query}" 2. Strong-signal bypass: disabled when intent is present — if the caller provides intent, the obvious BM25 interpretation is the wrong one 3. Chunk selection: intent terms (weight 0.5) augment query terms (1.0) 4. Reranking: intent prepended to query for Qwen3-Reranker 5. Snippet extraction: intent terms (weight 0.3) nudge toward relevant lines Intent terms are tokenized via extractIntentTerms(): lowercased, surrounding punctuation stripped, >1 char, stop words removed. Short domain terms (API, SQL, LLM, CDN) survive. ### Eval results (2,115 doc corpus, 6 ambiguous queries × 3 conditions) - Signal Density @5: +0.067 (intent A), +0.100 (intent B) over baseline - MRR: +0.208 (intent B), driven by "performance" and "trust" jumps - Jaccard(A, B) = 0.169 — competing intents produce 83% different top-5 - Best case: "trust" + team intent → MRR 0.33 → 1.00 ## Refactored - extractSnippet: positional params → options object (ExtractSnippetOptions). Eliminates `undefined, undefined, intent` placeholder pattern at ~31 callsites across src/ and test/ - Named constants: INTENT_WEIGHT_CHUNK (0.5), INTENT_WEIGHT_SNIPPET (0.3) replace inlined magic numbers - extractIntentTerms(): shared tokenizer with stop word set (seeded from finetune/reward.py, extended with 2-3 char function words). Used in both snippet extraction and chunk selection - Unified MCP intent description across all 3 tools ## Fixes - Intent tokenization: strip surrounding punctuation (trailing commas, parens) while preserving internal hyphens (self-hosted, real-time) - Intent tokenization: lower length threshold from >3 to >1 so short domain terms (API, SQL, LLM) survive — stop word set expanded to compensate

@5

deep_search has a local LLM (Qwen3-1.7B) that expands a short query into keywords (lex), concepts (vec), and a hypothetical passage (hyde). This works, but an upstream LLM caller (Claude, GPT) already knows what the user wants — routing through a 1.7B model is a capability downgrade. Structured query lets the caller provide expansions directly, skipping the local LLM entirely: string query (auto — LLM expands): "performance" │ ▼ ┌────────────┐ │ Qwen3 1.7B │ → { lex, vec, hyde } └────────────┘ │ ▼ FTS + vector → RRF → rerank ~2s structured query (manual — caller expands, no LLM): { text, keywords, concepts, passage } │ │ │ │ ▼ ▼ ▼ ▼ BM25 FTS vector vector └───────┴─────────┴─────────┘ │ ▼ RRF → rerank ~240ms (89% faster) API (deep_search only — search/vector_search keep query: string): // Auto — Qwen drives expansion { query: "performance" } // Manual — caller drives expansion, skips LLM entirely { query: { text, keywords, concepts, passage }, intent: "web perf" } Fields route directly: keywords → FTS, concepts → vector, passage → vector. intent remains orthogonal — steers scoring in both modes. Eval (2115-doc corpus, 8 ambiguous queries across 4 topics): Each query has multiple valid interpretations (e.g. "performance" → web perf vs health, "scaling" → infra vs business). Tests whether the pipeline surfaces the right interpretation in top results. Signal Density @5 (fraction of top-5 results matching target): Baseline (LLM): 0.500 Intent (steered LLM): 0.675 (+0.175) Structured (no LLM): 0.725 (+0.225) MRR (mean reciprocal rank of first relevant result): Baseline (LLM): 0.738 Intent (steered LLM): 0.938 (+0.200) Structured (no LLM): 1.000 (+0.262) — first result always relevant Avg latency: Baseline (LLM): 2114ms Structured (no LLM): 240ms (89% faster) Quality wins come from the upstream caller being better at disambiguation than the local 1.7B model. Latency wins from skipping LLM expansion — pipeline drops to embed + vector only. Implementation: - StructuredQuery type: { text, keywords?, concepts?, passage? }. normalizeQuery() strips empty fields (keywords:[], passage:"") so hasCallerExpansions() correctly detects "no expansions provided." - Unified routing: FTS and vector paths are shared between caller and LLM expansion modes — only the source of search terms differs. ~95 lines of branching → ~65 lines with shared ftsTerms/vecTexts. - RRF weight fix: replaced positional heuristic (i < 2 ? 2.0 : 1.0) with primaryIndices: Set<number> that explicitly tracks which ranked lists represent the original query. Old code assumed list ordering. - toRankedList() helper: extracted SearchResult→RankedResult mapping that was duplicated 6x across FTS and vector search paths. - onExpand hook: only fires when LLM expansion actually ran and produced results. Silent for caller expansions and strong signal. - MCP schema: z.union([string, object]) generates anyOf in JSON Schema. Tested empirically: Opus/Sonnet handle it correctly. Haiku mis-nests intent (rejected by additionalProperties:false, fails safe). - Zod passage validation: .min(1).optional() prevents empty string from silently falling through to LLM expansion path.

bun test runs all files in one process. Our cleanup() was calling delete process.env.QMD_CONFIG_DIR, which clobbered the MCP test suite's config dir set in its own beforeAll. Five MCP tests failed (get, multi_get, status, resource) because collection context lookups need the config. Fix: save/restore the env var in beforeAll/afterAll instead of deleting it. Also merged duplicate afterAll blocks.

createTestStore() overwrites process.env.QMD_CONFIG_DIR, which in bun's single-process test runner clobbers the config for mcp.test.ts. The previous save/restore-in-afterAll approach was insufficient — the env var is overwritten during execution, not just at cleanup. Skip routing integration tests in CI with describe.skipIf (matching repo convention). The 19 unit tests (normalizeQuery, hasCallerExpansions, type contracts) still run — they don't touch the env var.

igrigorik added 2 commits February 16, 2026 16:35

igrigorik force-pushed the feat/intent branch from 7752a05 to 110da2a Compare February 17, 2026 00:53

fix: instruct MCP caller to filter by collection

de7a0e6

igrigorik changed the title ~~feat: intent param for ambiguous queries~~ feat: intent + structured query for deep_search Feb 17, 2026

igrigorik added 3 commits February 16, 2026 19:30

fix: mock rerank in routing tests

c8b7871

igrigorik marked this pull request as ready for review February 17, 2026 04:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: intent + structured query for deep_search#180

feat: intent + structured query for deep_search#180
igrigorik wants to merge 6 commits intotobi:mainfrom
igrigorik:feat/intent

igrigorik commented Feb 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

igrigorik commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Intent — steer every pipeline stage

Structured query — caller-provided expansions bypass LLM

Eval (2115-doc corpus, 8 ambiguous queries)

Implementation highlights

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

igrigorik commented Feb 16, 2026 •

edited

Loading