Skip to content

Comments

Add QLoRA training script for GSM8K fine-tuningAdd train script#11

Open
noor05-creator wants to merge 3 commits intoAshChadha-iitg:mainfrom
noor05-creator:add-train-script
Open

Add QLoRA training script for GSM8K fine-tuningAdd train script#11
noor05-creator wants to merge 3 commits intoAshChadha-iitg:mainfrom
noor05-creator:add-train-script

Conversation

@noor05-creator
Copy link

@noor05-creator noor05-creator commented Feb 9, 2026

Summary

This PR adds a complete training script for fine-tuning Qwen2.5-Math-1.5B on GSM8K using QLoRA (4-bit quantization).

Closes #3

Changes

  • ✅ Added train.py with QLoRA 4-bit quantization
  • ✅ Updated README with training instructions and CLI arguments
  • ✅ Supports all hyperparameters from adapter_config.json
  • ✅ Optimized for free Colab T4 GPU (12GB+ VRAM)

Features

  • QLoRA: 4-bit quantization using bitsandbytes
  • Dataset: GSM8K integration via Hugging Face datasets
  • Loss Masking: Trains only on answer portion for better reasoning
  • CLI Arguments: 13 configurable parameters for flexibility
  • Memory Efficient: ~11GB VRAM usage on default settings

Configuration Match

All hyperparameters match existing adapter_config.json:

  • LoRA rank (r): 16 ✅
  • LoRA alpha: 32 ✅
  • LoRA dropout: 0.05 ✅
  • Target modules: ["q_proj", "k_proj", "v_proj", "o_proj"]
  • Task type: CAUSAL_LM

Usage

# Install dependencies
pip install torch transformers peft datasets bitsandbytes accelerate

# Reproduce original results (1000 samples, 6 epochs)
python train.py --num_samples 1000 --num_epochs 6

# Train on more data
python train.py --num_samples 5000 --num_epochs 4

# Customize hyperparameters
python train.py --lora_rank 32 --learning_rate 3e-4

Testing

  • ✅ Tested on Google Colab T4 GPU
  • ✅ Training completes successfully (verified with 10 sample quick test)
  • ✅ Model loads in 4-bit quantization
  • ✅ LoRA adapters save correctly to ./checkpoints/
  • ✅ Output format compatible with existing adapter weights

Training Performance

  • Test run: 10 samples, 1 epoch completed in 9.5 seconds on T4
  • Expected full run: 1000 samples, 6 epochs = ~1.5 hours on T4
  • Memory usage: ~11GB VRAM (fits on free Colab T4)
  • Trainable parameters: 4.36M (0.28% of total)

Implementation Details

  • Uses BitsAndBytesConfig for NF4 quantization
  • Implements gradient checkpointing for memory efficiency
  • Uses paged_adamw_8bit optimizer
  • Includes warmup and cosine learning rate scheduling
  • Masks question tokens in loss calculation (focus on reasoning)

This makes the project fully reproducible as requested in the issue. Users can now train their own models and experiment with different configurations.

Contributing to OScG'26

This contribution is part of OScG'26


- Implement complete training pipeline with 4-bit quantization
- Support QLoRA fine-tuning on Qwen2.5-Math-1.5B model
- Add 13 configurable CLI arguments for flexibility
- Match adapter_config.json hyperparameters (rank=16, alpha=32)
- Include loss masking to train only on answer portions
- Optimize memory usage for free Colab T4 GPU (~11GB VRAM)
- Use BitsAndBytesConfig for NF4 quantization
- Implement gradient checkpointing and paged optimizer
- Enable reproducibility of 41% GSM8K accuracy results
- Add Training section with setup and usage guide
- Document all CLI arguments in table format
- Include installation instructions for dependencies
- Provide example commands for different training scenarios
- Specify hardware requirements (12GB+ VRAM, T4 tested)
- Add training time estimates for different configurations
- Enable users to reproduce and extend the original results
@AshChadha-iitg
Copy link
Owner

@noor05-creator Thanks for the detailed contribution.

Before merging, I need to ensure reproducibility with the original OpenMath results (41% on 100-question GSM8K subset). Right now, the training pipeline differs from the original in a few key ways:

  1. Prompt format ("Question/Answer") is different from OpenMath’s format ("### Instruction / ### Problem / ### Solution").
  2. Loss masking logic does not exactly match the original implementation.
  3. Evaluation setup differs from how results were measured in the repo.
  4. Model saving format may not exactly match existing adapter weights.

Could you please:

  • Align the prompt template with the repo’s existing format,
  • Match the original loss-masking logic (mask everything before "### Solution:"),
  • Ensure adapters are saved in the same format as current weights?

Once these are aligned, I’m happy to merge. Thanks again for your work!

- Switch prompt format to Instruction / Problem / Solution
- Mask loss for all tokens before "### Solution:" to match original training
- Save LoRA adapters only (adapter_model + adapter_config) for repo compatibility
@noor05-creator
Copy link
Author

@AshChadha-iitg I have done the asked changes.Please Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Training Script for Model Fine-tuning

2 participants