Feature: --no-rerank flag for qmd query on CPU-only machines #231

@wuhup

Problem

On CPU-only machines (4 vCPU, 8GB RAM, shared Intel Xeon, no GPU),
qmd query is unusable because the reranking step takes 120s+ for 20 chunks.

The 1.0.8 query document format solves the expansion problem: a vec: query
skips the 1.7B LLM, and embedding with the 300M model takes only 2-4s. But
structuredSearch() always reranks, and the 0.6B reranker running 20+
inference passes on shared vCPUs takes >120s, so qmd query times out every time.

vsearch works (~6s) because it skips reranking, but it still forces
query expansion through the 1.7B model, and doesn't support the new
query document syntax.

Benchmarks (4 vCPU, 8GB, gpu:false patch per #194)

Command                       Time    Notes
qmd search "query"            1.7s    BM25 only, fast
qmd vsearch "query"           6s      Expand + embed, no rerank
qmd query "vec: query"        >120s   Embed 2s + rerank 20 chunks (timeout)
qmd query "lex: a\nvec: b"    >60s    Embed 4s + rerank 38 chunks (timeout)

Proposal

Add a --no-rerank flag to qmd query that returns the RRF-fused results directly,
skipping the chunked reranking step (lines 2602-2606 in store.js).

This would give CPU-only users the best of both worlds:

  • Query document syntax (lex/vec/hyde combinations)
  • RRF fusion across multiple result sets
  • No 120s+ reranking penalty

Expected time with --no-rerank: ~4-6s (lex instant + embed 2-4s + RRF instant).
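
To make the proposal concrete, here is a minimal sketch of what returning RRF-fused results without reranking could look like. This is not qmd's actual code: rrfFuse, fuseAndMaybeRerank, the noRerank option, and the k = 60 constant are all assumptions for illustration (k = 60 is just a commonly used RRF default), and rerank stands in for whatever the existing reranking step does.

```javascript
// Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank),
// where rank starts at 1. k = 60 is an assumed default, not taken from qmd.
function rrfFuse(resultLists, k = 60) {
  const scores = new Map();
  for (const list of resultLists) {
    list.forEach((docId, index) => {
      const rank = index + 1;
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId, score]) => ({ docId, score }));
}

// Hypothetical pipeline tail: `rerank` is a placeholder for the expensive
// reranker and is only called when noRerank is false.
async function fuseAndMaybeRerank(
  resultLists,
  { noRerank = false, rerank, limit = 10 } = {}
) {
  const fused = rrfFuse(resultLists);
  if (noRerank || typeof rerank !== "function") {
    return fused.slice(0, limit);
  }
  return (await rerank(fused)).slice(0, limit);
}
```

With noRerank set, the only work after retrieval is the fusion loop, which is linear in the number of candidates and should be negligible next to the embedding step.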

Alternatively, making vsearch accept query document syntax (so vec: queries
skip expansion) would also solve the problem.
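
For the vsearch alternative, the deciding step is detecting whether the input is already a query document. A rough sketch follows, assuming one "type: text" entry per line with the types lex, vec, and hyde mentioned above. parseQueryDocument and needsExpansion are hypothetical names, and the real 1.0.8 parser may well differ.

```javascript
// Parse a query document such as "lex: a\nvec: b" into typed sub-queries.
// Assumption: one "<type>: <text>" entry per line. A line without a
// recognised prefix is kept as a plain query.
function parseQueryDocument(input) {
  const known = new Set(["lex", "vec", "hyde"]);
  const parts = [];
  for (const rawLine of input.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const match = /^([a-z]+):\s*(.*)$/.exec(line);
    if (match && known.has(match[1])) {
      parts.push({ type: match[1], text: match[2] });
    } else {
      parts.push({ type: "plain", text: line });
    }
  }
  return parts;
}

// Only plain (untyped) input would still need the expansion model.
function needsExpansion(parts) {
  return parts.some((p) => p.type === "plain");
}
```

Under this sketch, vsearch "vec: query" would go straight to embedding, while vsearch "query" would keep today's behaviour of expanding first.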

Environment

Related
