## Problem
On CPU-only machines (4 vCPU, 8GB RAM, shared Intel Xeon, no GPU),
`qmd query` is unusable because the reranking step takes 120s+ for 20 chunks.

The 1.0.8 query document format solves the expansion problem: `vec:` skips the 1.7B
expansion LLM, and embedding with the 300M model takes only 2-4s. But `structuredSearch()`
always reranks, and the 0.6B reranker running 20+ inference passes on
shared vCPUs takes >120s, so `qmd query` times out every time.

`vsearch` works (~6s) because it skips reranking, but it still forces
query expansion through the 1.7B model and doesn't support the new
query document syntax.
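For scale, here is the back-of-envelope arithmetic behind the >120s figure. The per-pass time is inferred from the totals in the benchmarks below, not measured directly:

```js
// Rough cost model for the rerank step on this machine.
// Assumption: one 0.6B cross-encoder pass per chunk, ~6s per pass
// (inferred from >120s total / 20 chunks), vs ~2s for the 300M embedding.
const chunks = 20;
const secondsPerRerankPass = 6; // estimate, not a per-pass measurement
const embedSeconds = 2;

console.log(`embed ~${embedSeconds}s, rerank ~${chunks * secondsPerRerankPass}s`);
// => embed ~2s, rerank ~120s: reranking dominates by roughly 60x on CPU-only hardware
```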
## Benchmarks (4 vCPU, 8GB, gpu:false patch per #194)

| Command | Time | Notes |
|---|---|---|
| `qmd search "query"` | 1.7s | BM25 only, fast |
| `qmd vsearch "query"` | 6s | Expand + embed, no rerank |
| `qmd query "vec: query"` | >120s | Embed 2s + rerank 20 chunks, times out |
| `qmd query "lex: a\nvec: b"` | >60s | Embed 4s + rerank 38 chunks, times out |
## Proposal
Add a `--no-rerank` flag to `qmd query` that returns the RRF-fused results directly,
skipping the chunked reranking step (lines 2602-2606 in `store.js`).
This would give CPU-only users the best of both worlds:
- Query document syntax (lex/vec/hyde combinations)
- RRF fusion across multiple result sets
- No 120s+ reranking penalty
Expected time with `--no-rerank`: ~4-6s (lex instant + embed 2-4s + RRF instant).
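A minimal sketch of what the flag would return. This is illustrative only: `rrfFuse`, the result shape, and the option name are my assumptions, not the actual `store.js` internals.

```js
// Sketch only: standard Reciprocal Rank Fusion, score(doc) = sum over lists of 1 / (k + rank).
// Hypothetical names; not the real qmd implementation.
function rrfFuse(resultLists, k = 60) {
  const byId = new Map();
  for (const list of resultLists) {
    list.forEach((hit, i) => {
      const entry = byId.get(hit.id) ?? { ...hit, score: 0 };
      entry.score += 1 / (k + i + 1); // 1-based rank, conventional k = 60
      byId.set(hit.id, entry);
    });
  }
  return [...byId.values()].sort((a, b) => b.score - a.score);
}

// With --no-rerank, structuredSearch() could return the fused list as-is
// instead of handing it to the 0.6B reranker:
//   if (options.noRerank) return rrfFuse([lexHits, vecHits]).slice(0, limit);
```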
Alternatively, making `vsearch` accept the query document syntax (so `vec:` queries
skip expansion) would also solve the problem.
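Either way, the CPU-only workflow would look something like this (the flag and the extended `vsearch` syntax are the proposal, not existing behavior; the query strings are just examples):

```sh
# Proposed: query document syntax with RRF fusion but no rerank (~4-6s expected)
qmd query --no-rerank "lex: payment retry\nvec: how failed payments are retried"

# Alternative: vsearch accepting the 1.0.8 query document syntax directly
qmd vsearch "vec: how failed payments are retried"
```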
## Environment
- qmd 1.0.8 (commit 6ac7c68), `dist/llm.js` patched to `gpu: false` per #194
- Linux x86_64, 4 shared vCPU, 8GB RAM, no GPU
- node-llama-cpp CPU-only (Vulkan prebuilt incompatible, no CUDA)
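For reference, the #194 workaround as applied here is roughly the following; the exact call site in `dist/llm.js` may differ:

```js
// Approximate shape of the gpu:false patch (per #194); not the exact qmd code.
import { getLlama } from "node-llama-cpp";

// Disable GPU support so node-llama-cpp skips Vulkan/CUDA detection on CPU-only hosts.
const llama = await getLlama({ gpu: false });
```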
## Related
- #194 Vulkan probe overhead (workaround applied)
- #170 maxThreads for resource-constrained systems
- #229 / #114 Cloud model providers (alternative approach)