4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,7 @@ texts/
finetune/outputs/
finetune/data/train/
.claude/

# finetune-mlx artifacts (local runs)
finetune-mlx/*.nohup.log
finetune-mlx/eval_results.json
21 changes: 21 additions & 0 deletions finetune-mlx/.gitignore
@@ -0,0 +1,21 @@
# Training artifacts (large files)
adapters/
models/
exports/merged/
exports/*.gguf

# Keep Modelfiles (small, useful)
!exports/*.Modelfile

# Python
.venv/
__pycache__/
*.pyc

# Logs
*.log
*.nohup.log

# Data (downloaded separately)
data/
eval_results.json
115 changes: 115 additions & 0 deletions finetune-mlx/README.md
@@ -0,0 +1,115 @@
# QMD Query Expansion - Apple Silicon (MLX)

Apple Silicon port of QMD's query expansion fine-tuning, built on [MLX](https://github.com/ml-explore/mlx) as a local alternative to the CUDA-based [`finetune/`](../finetune/) directory.

Train small language models locally on M1/M2/M3/M4 Macs to expand search queries for hybrid retrieval.

## Features

- **SFT Training**: Supervised fine-tuning with LoRA
- **GRPO Training**: Group Relative Policy Optimization (reinforcement learning)
- **100% Local**: No cloud GPU needed, runs on Apple Silicon
- **MLX Optimized**: Native Metal acceleration

## Results

Comparison with original NVIDIA A10G implementation:

| Metric | NVIDIA (SFT+GRPO) | Apple Silicon (SFT) | Apple Silicon (GRPO) |
|--------|-------------------|---------------------|----------------------|
| Avg Score | 92% | 99.6% | 100.4% |
| Perfect Queries | 30/30 | 28/30 | 28/30 |
| Hardware | A10G 24GB | Mac Mini M4 | Mac Mini M4 |
| Cost | ~$2/run | $0 | $0 |

## Quick Start

```bash
# Setup
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Download and convert base model
python -c "from mlx_lm import load; load('Qwen/Qwen3-1.7B')"

# Train SFT (supervised fine-tuning)
python train.py sft --iters 3500

# Train GRPO (reinforcement learning refinement)
python grpo.py --steps 200

# Evaluate
python grpo.py --eval-only --adapter adapters/qwen3-grpo
```

## What It Does

Given a query like `"auth config"`, the model produces structured expansions:

```
lex: authentication configuration
lex: auth settings setup
vec: how to configure authentication settings
hyde: Authentication can be configured by setting AUTH_SECRET...
```

These feed into QMD's hybrid retrieval:
- `lex:` → BM25 full-text search
- `vec:` → Vector similarity search
- `hyde:` → Hypothetical document embedding
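Routing these prefixed lines into their retrieval buckets can be sketched in a few lines of Python (`parse_expansion` is a hypothetical helper for illustration, not part of this repo):

```python
def parse_expansion(text: str) -> dict[str, list[str]]:
    """Group model output lines by their retrieval prefix."""
    buckets: dict[str, list[str]] = {"lex": [], "vec": [], "hyde": []}
    for line in text.splitlines():
        prefix, _, body = line.partition(":")
        prefix, body = prefix.strip(), body.strip()
        if prefix in buckets and body:
            buckets[prefix].append(body)
    return buckets

expanded = parse_expansion(
    "lex: authentication configuration\n"
    "lex: auth settings setup\n"
    "vec: how to configure authentication settings\n"
    "hyde: Authentication can be configured by setting AUTH_SECRET..."
)
# expanded["lex"] holds two BM25 query strings, expanded["vec"] one
# embedding query, expanded["hyde"] one hypothetical document
```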

## File Structure

```
├── train.py # SFT training script
├── grpo.py # GRPO (RL) training script
├── eval.py # Evaluation utilities
├── reward.py # Scoring/reward function
├── convert.py # GGUF conversion for Ollama
├── configs/
│ └── sft.yaml # SFT hyperparameters
├── evals/
│ └── queries.txt # Test queries (31 total)
└── tests/ # Unit tests
```
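`reward.py` supplies the score that GRPO optimizes. As a hypothetical sketch of one component, a format reward could check that every output line carries a valid prefix (the repo's real reward likely also scores retrieval quality, not just formatting):

```python
VALID_PREFIXES = ("lex:", "vec:", "hyde:")

def format_reward(completion: str) -> float:
    """Fraction of non-empty lines starting with a valid retrieval prefix.

    Illustrative stand-in for reward.py's scoring, not the actual function.
    """
    lines = [l for l in completion.splitlines() if l.strip()]
    if not lines:
        return 0.0
    ok = sum(l.strip().startswith(VALID_PREFIXES) for l in lines)
    return ok / len(lines)
```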

## Requirements

- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
- ~8GB RAM for training
- ~4GB disk for models

## Training Details

### SFT (Supervised Fine-Tuning)
- Base model: Qwen3-1.7B
- LoRA rank: 8, layers: 8
- Learning rate: 1e-4
- Steps: 3500
- Time: ~60 min on M4

### GRPO (Group Relative Policy Optimization)
- Starts from SFT checkpoint
- 4 completions per query
- KL regularization (β=0.04)
- Steps: 200
- Time: ~30 min on M4
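The "group relative" part of GRPO can be illustrated with a small sketch: each query's 4 completions are scored, and each completion's advantage is its reward normalized within that group (illustrative only; the actual update in `grpo.py` also applies the KL term against the SFT policy):

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group of completions for the same query.

    Better-than-average completions get a positive advantage regardless of
    the absolute reward scale; the group mean acts as the baseline.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# 4 completions per query, as in the setup above; advantages sum to ~0
advs = group_relative_advantages([0.9, 0.7, 0.7, 0.5])
```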

## Credits

- Original QMD: [tobi/qmd](https://github.com/tobi/qmd)
- MLX framework: [ml-explore/mlx](https://github.com/ml-explore/mlx)
- Base model: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)

## Contributors

- [@sujito00](https://github.com/sujito00)
- [@dgilperez](https://github.com/dgilperez)

## License

MIT
21 changes: 21 additions & 0 deletions finetune-mlx/configs/sft.yaml
@@ -0,0 +1,21 @@
# SFT Training Config for QMD Query Expansion (Apple Silicon)

model:
  base: "Qwen/Qwen3-1.7B"
  output: "qmd-query-expansion-1.7B-sft"

dataset:
  name: "tobil/qmd-query-expansion-train-v2"
  text_field: "text"
  eval_split: 0.1

training:
  batch_size: 4
  iters: 3000
  learning_rate: 2e-4
  max_length: 512
  grad_accumulation_steps: 4

lora:
  num_layers: 16
  rank: 16