feat(llm): support QMD_GEN_MODEL env var for query expansion model override #226

Open
OmerFarukOruc wants to merge 1 commit into tobi:main from OmerFarukOruc:feat/env-var-model-override

Conversation

OmerFarukOruc commented Feb 19, 2026

Summary

Add environment variable support (QMD_GEN_MODEL) to override the query expansion model at runtime without code changes.

One-line change in the LlamaCpp constructor (src/llm.ts):

- this.generateModelUri = config.generateModel || DEFAULT_GENERATE_MODEL;
+ this.generateModelUri = process.env.QMD_GEN_MODEL || config.generateModel || DEFAULT_GENERATE_MODEL;

Motivation

v1.0.7 added LiquidAI LFM2-1.2B as an alternative base model for query expansion fine-tuning, and exports LFM2_GENERATE_MODEL / LFM2_INSTRUCT_MODEL constants from src/llm.ts. The finetune/ directory provides a complete SFT pipeline with configs/sft_lfm2.yaml and jobs/sft_lfm2.py.

However, there's currently no way for users to actually use a different model — generateModelUri only reads from LlamaCppConfig or falls back to the hardcoded DEFAULT_GENERATE_MODEL (Qwen3-1.7B). Users who fine-tune their own model have no way to point qmd at the result without modifying source code.

Precedence

QMD_GEN_MODEL env var → config.generateModel → DEFAULT_GENERATE_MODEL

The env var is checked first so users can override without touching config, but programmatic config.generateModel still works as a fallback. When QMD_GEN_MODEL is not set, behavior is identical to before this change.
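
To make the chain concrete, here is a minimal sketch of the resolution logic in TypeScript. Only the changed expression comes from src/llm.ts; the standalone helper, the config interface shape, and the default URI below are illustrative assumptions (the real code inlines the expression in the LlamaCpp constructor):

// Hypothetical helper for illustration; in src/llm.ts the expression
// is inlined in the LlamaCpp constructor.
interface LlamaCppConfig {
  generateModel?: string;
}

// Placeholder URI; the real DEFAULT_GENERATE_MODEL constant lives in src/llm.ts.
const DEFAULT_GENERATE_MODEL = "hf:example/default-model.gguf";

function resolveGenerateModel(config: LlamaCppConfig): string {
  // 1. QMD_GEN_MODEL wins, so users can override without touching config.
  // 2. Programmatic config.generateModel is the next candidate.
  // 3. The hardcoded default preserves pre-change behavior when neither is set.
  return process.env.QMD_GEN_MODEL || config.generateModel || DEFAULT_GENERATE_MODEL;
}

One subtlety of ||: an empty string is falsy, so exporting QMD_GEN_MODEL="" falls through to the config value rather than selecting an empty model URI, which seems like the desirable behavior here.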

Usage

# Use a custom fine-tuned LFM2 model
export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"
qmd query "search term"

# Use the LFM2 instruct variant (no fine-tuning needed)
export QMD_GEN_MODEL="hf:LiquidAI/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q8_0.gguf"
qmd query "search term"

# Use a local GGUF file
export QMD_GEN_MODEL="/path/to/my-model.gguf"
qmd query "search term"

# Add to shell profile for persistence
echo 'export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"' >> ~/.zshrc
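
The same precedence applies when constructing the client programmatically. A sketch in TypeScript (the import path and constructor options are assumptions based on the LlamaCppConfig mentioned above, not confirmed API):

// Sketch only: import path and constructor shape are assumed.
import { LlamaCpp } from "./llm";

// With QMD_GEN_MODEL unset, this config value is used, exactly as before this change.
const llm = new LlamaCpp({
  generateModel: "hf:LiquidAI/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q8_0.gguf",
});

// With QMD_GEN_MODEL exported in the shell, the env var silently
// wins over the config value passed here.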

Fine-tuned Models (free to use)

I fine-tuned LFM2-1.2B using this repo's finetune/ pipeline and published the results (Apache 2.0):

Trained on tobil/qmd-query-expansion-train (5,157 examples), 5 epochs SFT with LoRA rank 16.

Testing

Verified locally by patching ~/.bun/install/global/node_modules/qmd/src/llm.ts:

  1. Without env var → uses default Qwen3-1.7B model, no behavior change ✅
  2. With QMD_GEN_MODEL set to a HuggingFace GGUF URI → downloads and uses the specified model ✅
  3. Model appears in ~/.cache/qmd/models/ with correct filename (hf_OrcsRise_qmd-query-expansion-lfm2-q8_0.gguf) ✅
  4. Query expansion output is clean and structured (lex/vec/hyde format) with the fine-tuned LFM2 ✅

Before (base LFM2, no fine-tuning — 33 repetitive queries):

├─ test · (lexical+vector)
├─ Understanding test is essential for effective implementation... · (hyde)
├─ Understanding test is essential for effective implementation... · (hyde)
├─ Understanding test is essential for effective implementation... · (hyde)
... (30+ duplicate entries)

After (fine-tuned LFM2 via QMD_GEN_MODEL — 5 focused queries):

├─ docker container timeout · (lexical+vector)
├─ container wait time · (lexical)
├─ container wait time · (vector)
└─ The topic of docker container timeout covers service call delay... · (hyde)
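
If maintainers want regression coverage for the precedence, here is a quick bun:test sketch. It assumes the hypothetical resolveGenerateModel helper and DEFAULT_GENERATE_MODEL placeholder from the Precedence section above; qmd installs under Bun, so bun:test seems like the natural harness:

import { test, expect } from "bun:test";

// Assumes resolveGenerateModel and DEFAULT_GENERATE_MODEL from the sketch above.
test("QMD_GEN_MODEL overrides config and default", () => {
  process.env.QMD_GEN_MODEL = "/tmp/custom.gguf";
  expect(resolveGenerateModel({ generateModel: "hf:some/config-model.gguf" })).toBe("/tmp/custom.gguf");
});

test("unset env var falls back to config, then to the default", () => {
  delete process.env.QMD_GEN_MODEL;
  expect(resolveGenerateModel({ generateModel: "hf:some/config-model.gguf" })).toBe("hf:some/config-model.gguf");
  expect(resolveGenerateModel({})).toBe(DEFAULT_GENERATE_MODEL);
});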

Context

  • v1.0.7 release notes — LFM2-1.2B added as alternative base model
  • LFM2_GENERATE_MODEL and LFM2_INSTRUCT_MODEL constants already exported from src/llm.ts
  • finetune/configs/sft_lfm2.yaml and finetune/jobs/sft_lfm2.py provide the training pipeline
  • This PR closes the loop by letting users deploy their fine-tuned models without code changes


OmerFarukOruc commented Feb 19, 2026

For reference — I fine-tuned LFM2-1.2B using the finetune/configs/sft_lfm2.yaml pipeline from v1.0.7 and published the results. Both are Apache 2.0 licensed and free to use:

Trained on tobil/qmd-query-expansion-train (5,157 examples), 5 epochs SFT with LoRA rank 16 on a free Google Colab T4. The fine-tuned model produces clean lex:/vec:/hyde: structured output — tested locally with this env var patch and it works well.
