feat(llm): support QMD_GEN_MODEL env var for query expansion model override #226

Open
OmerFarukOruc wants to merge 1 commit into tobi:main from OmerFarukOruc:feat/env-var-model-override

Conversation

OmerFarukOruc commented Feb 19, 2026

Summary

Add environment variable support (QMD_GEN_MODEL) to override the query expansion model at runtime without code changes.

One-line change in the LlamaCpp constructor (src/llm.ts):

- this.generateModelUri = config.generateModel || DEFAULT_GENERATE_MODEL;
+ this.generateModelUri = process.env.QMD_GEN_MODEL || config.generateModel || DEFAULT_GENERATE_MODEL;

Motivation

v1.0.7 added LiquidAI LFM2-1.2B as an alternative base model for query expansion fine-tuning, and exports LFM2_GENERATE_MODEL / LFM2_INSTRUCT_MODEL constants from src/llm.ts. The finetune/ directory provides a complete SFT pipeline with configs/sft_lfm2.yaml and jobs/sft_lfm2.py.

However, there's currently no way for users to actually use a different model — generateModelUri only reads from LlamaCppConfig or falls back to the hardcoded DEFAULT_GENERATE_MODEL (Qwen3-1.7B). Users who fine-tune their own model have no way to point qmd at the result without modifying source code.

Precedence

QMD_GEN_MODEL env var → config.generateModel → DEFAULT_GENERATE_MODEL

The env var is checked first so users can override without touching config, but programmatic config.generateModel still works as a fallback. When QMD_GEN_MODEL is not set, behavior is identical to before this change.
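
To make the chain concrete, here is a minimal sketch of the resolution logic in TypeScript. Only the changed expression comes from src/llm.ts; the standalone helper, the config interface shape, and the default URI below are illustrative assumptions (the real code inlines the expression in the LlamaCpp constructor):

// Hypothetical helper for illustration; in src/llm.ts the expression
// is inlined in the LlamaCpp constructor.
interface LlamaCppConfig {
  generateModel?: string;
}

// Placeholder URI; the real DEFAULT_GENERATE_MODEL constant lives in src/llm.ts.
const DEFAULT_GENERATE_MODEL = "hf:example/default-model.gguf";

function resolveGenerateModel(config: LlamaCppConfig): string {
  // 1. QMD_GEN_MODEL wins, so users can override without touching config.
  // 2. Programmatic config.generateModel is the next candidate.
  // 3. The hardcoded default preserves pre-change behavior when neither is set.
  return process.env.QMD_GEN_MODEL || config.generateModel || DEFAULT_GENERATE_MODEL;
}

One subtlety of ||: an empty string is falsy, so exporting QMD_GEN_MODEL="" falls through to the config value rather than selecting an empty model URI, which seems like the desirable behavior here.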

Usage

# Use a custom fine-tuned LFM2 model
export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"
qmd query "search term"

# Use the LFM2 instruct variant (no fine-tuning needed)
export QMD_GEN_MODEL="hf:LiquidAI/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q8_0.gguf"
qmd query "search term"

# Use a local GGUF file
export QMD_GEN_MODEL="/path/to/my-model.gguf"
qmd query "search term"

# Add to shell profile for persistence
echo 'export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"' >> ~/.zshrc
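
The same precedence applies when constructing the client programmatically. A sketch in TypeScript (the import path and constructor options are assumptions based on the LlamaCppConfig mentioned above, not confirmed API):

// Sketch only: import path and constructor shape are assumed.
import { LlamaCpp } from "./llm";

// With QMD_GEN_MODEL unset, this config value is used, exactly as before this change.
const llm = new LlamaCpp({
  generateModel: "hf:LiquidAI/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q8_0.gguf",
});

// With QMD_GEN_MODEL exported in the shell, the env var silently
// wins over the config value passed here.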

Fine-tuned Models (free to use)

I fine-tuned LFM2-1.2B using this repo's finetune/ pipeline and published the results (Apache 2.0):

Trained on tobil/qmd-query-expansion-train (5,157 examples), 5 epochs SFT with LoRA rank 16.

Testing

Verified locally by patching ~/.bun/install/global/node_modules/qmd/src/llm.ts:

  1. Without env var → uses default Qwen3-1.7B model, no behavior change ✅
  2. With QMD_GEN_MODEL set to a HuggingFace GGUF URI → downloads and uses the specified model ✅
  3. Model appears in ~/.cache/qmd/models/ with correct filename (hf_OrcsRise_qmd-query-expansion-lfm2-q8_0.gguf) ✅
  4. Query expansion output is clean and structured (lex/vec/hyde format) with the fine-tuned LFM2 ✅

Before (base LFM2, no fine-tuning — 33 repetitive queries):

├─ test · (lexical+vector)
├─ Understanding test is essential for effective implementation... · (hyde)
├─ Understanding test is essential for effective implementation... · (hyde)
├─ Understanding test is essential for effective implementation... · (hyde)
... (30+ duplicate entries)

After (fine-tuned LFM2 via QMD_GEN_MODEL — 5 focused queries):

├─ docker container timeout · (lexical+vector)
├─ container wait time · (lexical)
├─ container wait time · (vector)
└─ The topic of docker container timeout covers service call delay... · (hyde)
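
If maintainers want regression coverage for the precedence, here is a quick bun:test sketch. It assumes the hypothetical resolveGenerateModel helper and DEFAULT_GENERATE_MODEL placeholder from the Precedence section above; qmd installs under Bun, so bun:test seems like the natural harness:

import { test, expect } from "bun:test";

// Assumes resolveGenerateModel and DEFAULT_GENERATE_MODEL from the sketch above.
test("QMD_GEN_MODEL overrides config and default", () => {
  process.env.QMD_GEN_MODEL = "/tmp/custom.gguf";
  expect(resolveGenerateModel({ generateModel: "hf:some/config-model.gguf" })).toBe("/tmp/custom.gguf");
});

test("unset env var falls back to config, then to the default", () => {
  delete process.env.QMD_GEN_MODEL;
  expect(resolveGenerateModel({ generateModel: "hf:some/config-model.gguf" })).toBe("hf:some/config-model.gguf");
  expect(resolveGenerateModel({})).toBe(DEFAULT_GENERATE_MODEL);
});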

Context

  • v1.0.7 release notes — LFM2-1.2B added as alternative base model
  • LFM2_GENERATE_MODEL and LFM2_INSTRUCT_MODEL constants already exported from src/llm.ts
  • finetune/configs/sft_lfm2.yaml and finetune/jobs/sft_lfm2.py provide the training pipeline
  • This PR closes the loop by letting users deploy their fine-tuned models without code changes


OmerFarukOruc commented Feb 19, 2026

For reference — I fine-tuned LFM2-1.2B using the finetune/configs/sft_lfm2.yaml pipeline from v1.0.7 and published the results. Both are Apache 2.0 licensed and free to use:

Trained on tobil/qmd-query-expansion-train (5,157 examples), 5 epochs SFT with LoRA rank 16 on a free Google Colab T4. The fine-tuned model produces clean lex:/vec:/hyde: structured output — tested locally with this env var patch and it works well.
