feat: Apple Silicon (MLX) support for query expansion fine-tuning #77

Open
dgilperez wants to merge 1 commit into tobi:main from Balneario-de-Cofrentes:feat/finetune-mlx-apple-silicon

Conversation

dgilperez (Contributor) commented on Jan 31, 2026

Summary

Port of the query expansion fine-tuning pipeline to Apple Silicon using MLX.

  • SFT training with LoRA on Qwen3-1.7B
  • GRPO training (reinforcement learning refinement)
  • 100% local - no cloud GPU needed
  • Works on M1/M2/M3/M4 Macs
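
For a concrete feel of the SFT step, here is a minimal sketch of an equivalent mlx-lm LoRA run. The dataset path is hypothetical and flag names may differ across mlx-lm versions; train.py in this PR drives the same thing from configs/sft.yaml.

```python
# Sketch only: launch a LoRA SFT run through the mlx-lm CLI.
# The dataset path is hypothetical; train.py reads configs/sft.yaml instead.
import subprocess

subprocess.run(
    [
        "python", "-m", "mlx_lm.lora",
        "--model", "Qwen/Qwen3-1.7B",      # base model matching the upstream config
        "--train",
        "--data", "data/query_expansion",   # dir with train.jsonl / valid.jsonl (hypothetical)
        "--iters", "3000",
        "--learning-rate", "2e-4",
        "--adapter-path", "adapters/",
    ],
    check=True,
)
```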

Observation

When testing, we noticed the current published model outputs placeholder text for hyde (see #75):

This is an example of a hypothetical document passage that would answer the e...

Our retrained model (same dataset, same base model) produces actual contextual content:

hyde: To deploy to Kubernetes, first install the required dependencies...
lex: docker kubernetes deployment setup
vec: how to deploy an application to a Kubernetes cluster
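
To make the comparison reproducible, this is roughly how a checkpoint can be spot-checked (demo.py does this more thoroughly). The model path and prompt are illustrative, Qwen3 may need a chat template applied first, and mlx-lm's generate API varies slightly between versions.

```python
# Sketch of a quick output check; paths and prompt wording are illustrative only.
from mlx_lm import load, generate

model, tokenizer = load("exports/fused")  # hypothetical path to the fused model
prompt = "Expand this query for retrieval: how do I deploy my app to kubernetes?"
out = generate(model, tokenizer, prompt=prompt, max_tokens=200)

# A healthy checkpoint emits all three fields with real content, not placeholders.
for field in ("hyde:", "lex:", "vec:"):
    assert field in out, f"missing {field} in expansion output"
print(out)
```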

Questions before proceeding

We wanted to raise these questions rather than assume:

  1. Model weights: Should we share just the training method, or also publish trained GGUF weights?
  2. Repository structure: Should finetune-mlx/ live alongside finetune/ in this repo, or would you prefer a separate repository?
  3. Hyde placeholder issue: Do you know why the current published model outputs placeholder hyde text? Was this intentional, or an early checkpoint?
  4. Base model: We matched your config (Qwen3-1.7B, 3000 iters, 2e-4 LR). Is there flexibility here or should we stick to this?

What's included

finetune-mlx/
├── train.py          # SFT training
├── grpo.py           # GRPO (RL) training  
├── eval.py           # Evaluation
├── reward.py         # Scoring function
├── convert.py        # GGUF conversion helper
├── demo.py           # Quick test script
├── configs/sft.yaml  # Training config
└── tests/            # Unit tests

Training artifacts (adapters/, models/, exports/) are gitignored.
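
reward.py appears above only by name; as a rough idea of the kind of format-based scoring GRPO and eval.py lean on, here is a hedged sketch. The field names come from the output format shown earlier, but the actual checks and weights in reward.py may differ.

```python
# Hedged sketch of a format-based reward; NOT the exact logic in reward.py.
import re

PLACEHOLDER = re.compile(r"this is an example of a hypothetical document", re.I)

def reward(completion: str) -> float:
    """Score an expansion: 1/3 per hyde/lex/vec field, penalty for placeholder text."""
    score = sum(1.0 / 3.0 for field in ("hyde:", "lex:", "vec:") if field in completion)
    if PLACEHOLDER.search(completion):
        score -= 0.5  # penalize the placeholder output described in #75
    return max(0.0, min(1.0, score))
```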

dgilperez force-pushed the feat/finetune-mlx-apple-silicon branch from a5ab100 to 4c76472 on January 31, 2026 at 15:28

tobi (Owner) commented on Feb 1, 2026

that is pretty damn cool

dgilperez (Contributor, Author) commented:

Thank you 😊 !!

Just re-verified the training after syncing reward.py with upstream. 3000 iters, ~15 min on M3 Max, final loss 0.13.

For the open questions: I roughly matched your config (Qwen3-1.7B, 3000 iters, 2e-4 LR, LoRA rank 16, and the same batch size) and kept finetune-mlx/ alongside finetune/. Happy to publish the GGUF weights if useful.

I'm also happy to keep this as a separate repo or maintain it here, if needed.

dgilperez force-pushed the feat/finetune-mlx-apple-silicon branch from 4c76472 to dfbe35b on February 4, 2026 at 11:57

Commit message:
Port of the query expansion fine-tuning pipeline to Apple Silicon using MLX.

- SFT training with LoRA on Qwen3-1.7B
- GRPO training (reinforcement learning refinement)
- Full GGUF export pipeline (MLX -> GGUF -> Ollama)
- Evaluation harness with reward scoring
- 100% local - no cloud GPU needed, works on M1/M2/M3/M4

Includes: finetune-mlx/ directory with training scripts, configs,
evaluation tools, and scripts/mlx_expand.py standalone sidecar.

Runtime integration (src/llm.ts sidecar) omitted - upstream removed
MLX sidecar code in v1.0.8 refactor. Training pipeline is independent.

Co-authored-by: sujito00 <sujito00@users.noreply.github.com>
Co-authored-by: David Gil <dgilperez@gmail.com>
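
For readers following the export path mentioned above, an MLX -> GGUF -> Ollama flow typically looks like the sketch below; convert.py automates an equivalent sequence. The script names and flags here are assumptions based on the public mlx-lm and llama.cpp CLIs, so check your installed versions.

```python
# Illustrative export flow; convert.py wraps an equivalent sequence.
# Script names and flags are assumptions and may vary by tool version.
import subprocess

# 1. Fuse the trained LoRA adapter into the base model weights.
subprocess.run([
    "python", "-m", "mlx_lm.fuse",
    "--model", "Qwen/Qwen3-1.7B",
    "--adapter-path", "adapters/",
    "--save-path", "exports/fused",
], check=True)

# 2. Convert the fused HF-format model to GGUF with llama.cpp's converter.
subprocess.run([
    "python", "llama.cpp/convert_hf_to_gguf.py", "exports/fused",
    "--outfile", "exports/query-expansion.gguf",
], check=True)

# 3. Register the GGUF file with Ollama (the Modelfile points at the .gguf).
subprocess.run([
    "ollama", "create", "query-expansion", "-f", "exports/Modelfile",
], check=True)
```
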
dgilperez force-pushed the feat/finetune-mlx-apple-silicon branch from bf6e46b to d314c5e on February 20, 2026 at 02:09