feat: Apple Silicon (MLX) support for query expansion fine-tuning #77

Open
dgilperez wants to merge 1 commit into tobi:main from Balneario-de-Cofrentes:feat/finetune-mlx-apple-silicon

Conversation

dgilperez (Contributor) commented on Jan 31, 2026

Summary

Port of the query expansion fine-tuning pipeline to Apple Silicon using MLX.

  • SFT training with LoRA on Qwen3-1.7B
  • GRPO training (reinforcement learning refinement)
  • 100% local - no cloud GPU needed
  • Works on M1/M2/M3/M4 Macs
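
For a concrete feel of the SFT step, here is a minimal sketch of an equivalent mlx-lm LoRA run. The dataset path is hypothetical and flag names may differ across mlx-lm versions; train.py in this PR drives the same thing from configs/sft.yaml.

```python
# Sketch only: launch a LoRA SFT run through the mlx-lm CLI.
# The dataset path is hypothetical; train.py reads configs/sft.yaml instead.
import subprocess

subprocess.run(
    [
        "python", "-m", "mlx_lm.lora",
        "--model", "Qwen/Qwen3-1.7B",      # base model matching the upstream config
        "--train",
        "--data", "data/query_expansion",   # dir with train.jsonl / valid.jsonl (hypothetical)
        "--iters", "3000",
        "--learning-rate", "2e-4",
        "--adapter-path", "adapters/",
    ],
    check=True,
)
```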

Observation

When testing, we noticed the current published model outputs placeholder text for hyde (see #75):

This is an example of a hypothetical document passage that would answer the e...

Our retrained model (same dataset, same base model) produces actual contextual content:

hyde: To deploy to Kubernetes, first install the required dependencies...
lex: docker kubernetes deployment setup
vec: how to deploy an application to a Kubernetes cluster
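
To make the comparison reproducible, this is roughly how a checkpoint can be spot-checked (demo.py does this more thoroughly). The model path and prompt are illustrative, Qwen3 may need a chat template applied first, and mlx-lm's generate API varies slightly between versions.

```python
# Sketch of a quick output check; paths and prompt wording are illustrative only.
from mlx_lm import load, generate

model, tokenizer = load("exports/fused")  # hypothetical path to the fused model
prompt = "Expand this query for retrieval: how do I deploy my app to kubernetes?"
out = generate(model, tokenizer, prompt=prompt, max_tokens=200)

# A healthy checkpoint emits all three fields with real content, not placeholders.
for field in ("hyde:", "lex:", "vec:"):
    assert field in out, f"missing {field} in expansion output"
print(out)
```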

Questions before proceeding

We wanted to raise these questions rather than assume:

  1. Model weights: Should we share just the training method, or also publish trained GGUF weights?
  2. Repository structure: Should finetune-mlx/ live alongside finetune/ in this repo, or would you prefer a separate repository?
  3. Hyde placeholder issue: Do you know why the current published model outputs placeholder hyde text? Was this intentional, or an early checkpoint?
  4. Base model: We matched your config (Qwen3-1.7B, 3000 iters, 2e-4 LR). Is there flexibility here or should we stick to this?

What's included

finetune-mlx/
├── train.py          # SFT training
├── grpo.py           # GRPO (RL) training  
├── eval.py           # Evaluation
├── reward.py         # Scoring function
├── convert.py        # GGUF conversion helper
├── demo.py           # Quick test script
├── configs/sft.yaml  # Training config
└── tests/            # Unit tests

Training artifacts (adapters/, models/, exports/) are gitignored.
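
reward.py appears above only by name; as a rough idea of the kind of format-based scoring GRPO and eval.py lean on, here is a hedged sketch. The field names come from the output format shown earlier, but the actual checks and weights in reward.py may differ.

```python
# Hedged sketch of a format-based reward; NOT the exact logic in reward.py.
import re

PLACEHOLDER = re.compile(r"this is an example of a hypothetical document", re.I)

def reward(completion: str) -> float:
    """Score an expansion: 1/3 per hyde/lex/vec field, penalty for placeholder text."""
    score = sum(1.0 / 3.0 for field in ("hyde:", "lex:", "vec:") if field in completion)
    if PLACEHOLDER.search(completion):
        score -= 0.5  # penalize the placeholder output described in #75
    return max(0.0, min(1.0, score))
```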

dgilperez force-pushed the feat/finetune-mlx-apple-silicon branch from a5ab100 to 4c76472 on January 31, 2026 at 15:28

tobi (Owner) commented on Feb 1, 2026

that is pretty damn cool

dgilperez (Contributor, Author) commented:

Thank you 😊 !!

Just re-verified the training after syncing reward.py with upstream. 3000 iters, ~15 min on M3 Max, final loss 0.13.

For the open questions: I roughly matched your config (Qwen3-1.7B, 3000 iters, 2e-4 LR, LoRA rank 16, and the same batch size) and kept finetune-mlx/ alongside finetune/. Happy to publish the GGUF weights if useful.

I'm also happy to keep this as a separate repo or maintain it here, if needed.

dgilperez force-pushed the feat/finetune-mlx-apple-silicon branch from 4c76472 to dfbe35b on February 4, 2026 at 11:57

Commit message:
Port of the query expansion fine-tuning pipeline to Apple Silicon using MLX.

- SFT training with LoRA on Qwen3-1.7B
- GRPO training (reinforcement learning refinement)
- Full GGUF export pipeline (MLX -> GGUF -> Ollama)
- Evaluation harness with reward scoring
- 100% local - no cloud GPU needed, works on M1/M2/M3/M4

Includes: finetune-mlx/ directory with training scripts, configs,
evaluation tools, and scripts/mlx_expand.py standalone sidecar.

Runtime integration (src/llm.ts sidecar) omitted - upstream removed
MLX sidecar code in v1.0.8 refactor. Training pipeline is independent.

Co-authored-by: sujito00 <sujito00@users.noreply.github.com>
Co-authored-by: David Gil <dgilperez@gmail.com>
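
For readers following the export path mentioned above, an MLX -> GGUF -> Ollama flow typically looks like the sketch below; convert.py automates an equivalent sequence. The script names and flags here are assumptions based on the public mlx-lm and llama.cpp CLIs, so check your installed versions.

```python
# Illustrative export flow; convert.py wraps an equivalent sequence.
# Script names and flags are assumptions and may vary by tool version.
import subprocess

# 1. Fuse the trained LoRA adapter into the base model weights.
subprocess.run([
    "python", "-m", "mlx_lm.fuse",
    "--model", "Qwen/Qwen3-1.7B",
    "--adapter-path", "adapters/",
    "--save-path", "exports/fused",
], check=True)

# 2. Convert the fused HF-format model to GGUF with llama.cpp's converter.
subprocess.run([
    "python", "llama.cpp/convert_hf_to_gguf.py", "exports/fused",
    "--outfile", "exports/query-expansion.gguf",
], check=True)

# 3. Register the GGUF file with Ollama (the Modelfile points at the .gguf).
subprocess.run([
    "ollama", "create", "query-expansion", "-f", "exports/Modelfile",
], check=True)
```
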
dgilperez force-pushed the feat/finetune-mlx-apple-silicon branch from bf6e46b to d314c5e on February 20, 2026 at 02:09