feat(llm): qmd query speedup - add environment variable overrides for faster model selection by dgilperez · Pull Request #112 · tobi/qmd

dgilperez · 2026-02-04T21:41:34Z

Problem

When using qmd query for latency-sensitive applications (like AI agent memory search with 5-second timeouts), the default reranker becomes a bottleneck. For instance, on my Mac Mini M4 16GB, the full qmd query process takes ~15s.

Solution

This PR makes it possible to select faster models by adding environment variables for runtime model configuration, with no code changes:

QMD_EMBED_MODEL - override embedding model
QMD_GENERATE_MODEL - override query expansion model
QMD_RERANK_MODEL - override reranker model
QMD_MODEL_CACHE_DIR - override model cache directory

Priority: config object > environment variable > default
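
The priority order above could be implemented along these lines. This is a minimal sketch and not the PR's actual code: `resolveModel`, `ModelConfig`, the config key names, and the placeholder default strings are hypothetical names introduced for illustration. Only the `QMD_*` variable names come from this PR. Treating an empty variable as unset is also an assumption.

```typescript
// Hypothetical sketch of "config object > environment variable > default".
// Only the QMD_* names come from the PR; every other name is illustrative.

type ModelKind = "embed" | "generate" | "rerank";

interface ModelConfig {
  embedModel?: string;
  generateModel?: string;
  rerankModel?: string;
}

// The environment is passed in explicitly so the function is easy to test.
type Env = Record<string, string | undefined>;

// Placeholder labels, not real model URIs; the real defaults live in qmd's source.
const DEFAULTS: Record<ModelKind, string> = {
  embed: "default-embed-model",
  generate: "default-generate-model",
  rerank: "qwen3-reranker-0.6b",
};

const ENV_VARS: Record<ModelKind, string> = {
  embed: "QMD_EMBED_MODEL",
  generate: "QMD_GENERATE_MODEL",
  rerank: "QMD_RERANK_MODEL",
};

const CONFIG_KEYS: Record<ModelKind, keyof ModelConfig> = {
  embed: "embedModel",
  generate: "generateModel",
  rerank: "rerankModel",
};

function resolveModel(kind: ModelKind, config: ModelConfig, env: Env): string {
  // 1. An explicit config value always wins.
  const fromConfig = config[CONFIG_KEYS[kind]];
  if (fromConfig) return fromConfig;

  // 2. Otherwise use the environment variable, if set and non-empty.
  const fromEnv = env[ENV_VARS[kind]];
  if (fromEnv) return fromEnv;

  // 3. Otherwise fall back to the built-in default.
  return DEFAULTS[kind];
}
```

In a real process the third argument would be `process.env`, so that an exported `QMD_RERANK_MODEL` takes effect only when the caller has not set the model in its config object.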

I got great results with Jina Reranker v1-tiny, a 33M-parameter distilled model optimized for speed, without changing defaults or rebuilding.

Example:
export QMD_RERANK_MODEL='hf:gpustack/jina-reranker-v1-tiny-en-GGUF/jina-reranker-v1-tiny-en-FP16.gguf'
qmd query "your search"

Benchmarks on Mac Mini M4 16GB:

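| Scenario            | qwen3 (default) | jina-tiny | Speedup |
|---------------------|-----------------|-----------|---------|
| Cold start (8 docs) | ~15,000ms       | ~80ms     | 185x    |
| Warm cache (8 docs) | 400-440ms       | 40-65ms   | 6-10x   |
| Full pipeline cold  | ~20s            | ~7s       | 3x      |
| Full pipeline warm  | ~15s            | ~5.7s     | 2.6x    |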

Quality: ~78% top-3 ranking agreement (same relevant docs, order differs slightly).
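
The PR does not say how the agreement figure was measured. One plausible reading is the mean overlap between the top-3 results of the two rerankers across a set of queries, ignoring order. The sketch below assumes that definition; `topKAgreement` and `meanTopKAgreement` are hypothetical helpers, and each ranking is assumed to be a list of at least `k` unique document IDs.

```typescript
// Hypothetical sketch: order-insensitive top-k overlap between two rankings.
// Assumes each ranking holds at least k unique document IDs.

function topKAgreement(rankingA: string[], rankingB: string[], k = 3): number {
  const topA = new Set(rankingA.slice(0, k));
  const topB = rankingB.slice(0, k);
  // Count documents that appear in both top-k lists, regardless of position.
  const shared = topB.filter((id) => topA.has(id)).length;
  return shared / k;
}

// Average the per-query overlap over all queries in the evaluation set.
function meanTopKAgreement(pairs: Array<[string[], string[]]>, k = 3): number {
  if (pairs.length === 0) return 0;
  const total = pairs.reduce((sum, [a, b]) => sum + topKAgreement(a, b, k), 0);
  return total / pairs.length;
}
```

Under this definition, two rerankers that return the same three documents in a different order score 1.0 for that query, which matches the note that the order differs slightly while the relevant documents are the same.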

Includes:

Unit tests for env var configuration
README documentation with model override examples

No breaking changes. Default remains qwen3-reranker-0.6b.

busbyjon · 2026-02-12T23:09:48Z

👍 for this PR. I'm struggling to run this model on a CPU-constrained system - and this would solve it!

…arch
- Model paths: QMD_EMBED_MODEL, QMD_RERANK_MODEL, QMD_GENERATE_MODEL (PR tobi#112 pattern)
- Embed format: QMD_EMBED_FORMAT=qwen3 for Qwen3 Instruct format
- Embed context: QMD_EMBED_CONTEXT_SIZE, QMD_EMBED_BATCH_SIZE, QMD_EMBED_PER_CONTEXT_MB
- Rerank context: QMD_RERANK_CONTEXT_SIZE
- Session timeout: QMD_SESSION_MAX_DURATION_MS
- Model identifiers: QMD_EMBED_MODEL_ID, QMD_RERANK_MODEL_ID
- Strong signal: QMD_STRONG_SIGNAL_MIN_GAP
- CJK bigram splitting for FTS5 search (unicode61 tokenizer workaround)

All env vars are optional; without them, behavior is identical to upstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add environment variables for runtime model configuration without code changes:
- QMD_EMBED_MODEL - override embedding model
- QMD_GENERATE_MODEL - override query expansion model
- QMD_RERANK_MODEL - override reranker model
- QMD_MODEL_CACHE_DIR - override model cache directory

Priority: config object > environment variable > default

Use case: Switch to faster reranker (jina-reranker-v1-tiny) for latency-critical applications like API timeouts, without changing defaults or rebuilding.

Example:
export QMD_RERANK_MODEL='hf:gpustack/jina-reranker-v1-tiny-en-GGUF/jina-reranker-v1-tiny-en-FP16.gguf'
qmd query "your search"

Benchmarks on Mac Mini M4 16GB:

| Scenario            | qwen3 (default) | jina-tiny | Speedup |
|---------------------|-----------------|-----------|---------|
| Cold start (8 docs) | ~15,000ms       | ~80ms     | 185x    |
| Warm cache (8 docs) | 400-440ms       | 40-65ms   | 6-10x   |
| Full pipeline cold  | ~20s            | ~7s       | 3x      |
| Full pipeline warm  | ~15s            | ~5.7s     | 2.6x    |

Quality: ~78% top-3 ranking agreement (same relevant docs, order differs slightly).

Includes:
- Unit tests for env var configuration
- README documentation with model override examples

No breaking changes. Default remains qwen3-reranker-0.6b.

dgilperez added a commit to Balneario-de-Cofrentes/qmd that referenced this pull request Feb 4, 2026

Merge PR tobi#112: Environment variable overrides for model selection

9d22342

dgilperez force-pushed the feat/jina-reranker-tiny branch from a5656c0 to 993faf6 Compare February 20, 2026 00:58

dgilperez wants to merge 1 commit into tobi:main from Balneario-de-Cofrentes:feat/jina-reranker-tiny
