Fine-tuning a Small Language Model (SLM) for Step-by-Step Math Reasoning
OpenMath is an open-source project focused on fine-tuning a small language model (SLM) to solve math word problems with clear, step-by-step reasoning.
The project uses LoRA/QLoRA fine-tuning on popular math reasoning datasets and provides a benchmarking pipeline to compare performance against other open-source SLMs/LLMs.
The project is designed to be reproducible on a free Colab T4 GPU.
- QLoRA fine-tuning code (4-bit)
- GSM8K subset training (example: 1k samples)
- GSM8K evaluation script (accuracy)
- Saved LoRA adapter weights
- Qwen2.5-Math-1.5B
- GSM8K (Grade School Math 8K)
- Training used: 1000 samples
- Evaluation: GSM8K test split
- Samples: 1000
- Epochs: 6
- Max length: 1024
- LoRA rank: 16
- Loss masking: loss computed only on the solution portion (prompt tokens are masked) to improve reasoning
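As a rough illustration of how the settings above might map onto a QLoRA setup, here is a minimal sketch using `transformers` and `peft`. Only the sample count, epochs, max length, rank, and 4-bit loading come from this README; the quantization type, compute dtype, `lora_alpha`, `lora_dropout`, and `target_modules` are assumptions chosen for illustration and may differ from the project's actual training code.

```python
# Hyperparameters stated in this README.
HPARAMS = {"train_samples": 1000, "epochs": 6, "max_length": 1024, "lora_rank": 16}


def build_qlora_configs(hparams=HPARAMS):
    """Return (quantization config, LoRA config) for a 4-bit QLoRA run.

    Imports are local so this module can be read without a GPU stack installed.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # 4-bit base weights (from the README)
        bnb_4bit_quant_type="nf4",             # assumption: NF4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute on a T4
        bnb_4bit_use_double_quant=True,        # assumption
    )
    lora_config = LoraConfig(
        r=hparams["lora_rank"],                # rank 16 (from the README)
        lora_alpha=32,                         # assumption
        lora_dropout=0.05,                     # assumption
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
        task_type="CAUSAL_LM",
    )
    return quant_config, lora_config
```

The values actually used for a given adapter are recorded in `adapter_config.json`, which is the authoritative source.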
- GSM8K Accuracy (100-sample test subset): 41%
Note: The 41% score was measured on a 100-question subset of the GSM8K test set for faster evaluation on Colab.
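Because the score comes from 100 questions, it carries noticeable sampling uncertainty. The sketch below computes a 95% Wilson score interval for 41 correct answers out of 100; the choice of the Wilson interval is ours for illustration and is not part of the project's evaluation script.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


low, high = wilson_interval(41, 100)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

For 41/100 this gives roughly 32% to 51%, which is why a full test-set run would give a tighter estimate.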
The model was trained using a structured prompt format designed to encourage step-by-step reasoning:
### Instruction:
Solve the math problem step by step and give the final answer.
### Problem:
{question}
### Solution:
{answer}
{question} and {answer} represent dataset content placeholders.
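A minimal sketch of filling this template in Python follows. The function name `build_prompt` and the exact newline placement are assumptions; the project's own formatting code may differ in whitespace.

```python
from typing import Optional

INSTRUCTION = "Solve the math problem step by step and give the final answer."


def build_prompt(question: str, answer: Optional[str] = None) -> str:
    """Fill the template. With answer=None the solution part is left empty."""
    prompt = (
        "### Instruction:\n"
        f"{INSTRUCTION}\n"
        "### Problem:\n"
        f"{question.strip()}\n"
        "### Solution:\n"
    )
    if answer is not None:
        prompt += answer.strip()
    return prompt


# Training example: question plus reference solution.
print(build_prompt("What is 2 + 3?", "2 + 3 = 5\n#### 5"))
```

Calling `build_prompt(question)` without an answer yields the open-ended form used at inference time.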
To improve reasoning quality, loss was computed only on the solution portion:
- All tokens before `### Solution:` were masked with `-100`
- Only tokens belonging to the solution contributed to training loss
This encourages the model to focus on generating accurate reasoning steps rather than memorizing prompt structure.
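The masking step can be sketched on plain token-ID lists. Here `prompt_len` is assumed to be the number of tokens produced by tokenizing everything up to and including the `### Solution:` header; how the project locates that boundary is not stated, so treat the helper below as hypothetical. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so those positions contribute nothing to the gradient.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids: List[int], prompt_len: int) -> List[int]:
    """Copy input_ids into labels, replacing the first prompt_len entries."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must lie within the sequence")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy IDs: three prompt tokens followed by two solution tokens.
labels = mask_prompt_labels([11, 12, 13, 21, 22], prompt_len=3)
print(labels)  # [-100, -100, -100, 21, 22]
```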
- No additional special tokens were introduced
- Default tokenizer EOS token used as padding
- Template headers serve as separators
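One consequence of reusing EOS as the padding token is that padding cannot be identified by token ID alone, since the genuine end-of-sequence token has the same ID. A hedged sketch of padding by position instead is shown below; right-padding, the helper name `pad_batch`, and the toy IDs are assumptions, not the project's collator.

```python
from typing import Dict, List

IGNORE_INDEX = -100


def pad_batch(sequences: List[List[int]], labels: List[List[int]],
              eos_id: int, max_length: int = 1024) -> Dict[str, List[List[int]]]:
    """Right-pad to a common width, using eos_id as the padding token."""
    width = min(max(len(s) for s in sequences), max_length)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids, lab in zip(sequences, labels):
        ids, lab = ids[:width], lab[:width]  # truncate to the max length
        pad = width - len(ids)
        batch["input_ids"].append(ids + [eos_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Padding positions are ignored by the loss; the real EOS label is kept.
        batch["labels"].append(lab + [IGNORE_INDEX] * pad)
    return batch


batch = pad_batch([[5, 6, 7, 2], [8, 2]], [[-100, 6, 7, 2], [-100, 2]], eos_id=2)
print(batch["labels"])
```

In the second toy sequence the real EOS (ID 2) keeps its label while the two padded positions are ignored, so the model can still learn when to stop.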
During inference, the same prompt structure is used, but the solution portion is left empty:
### Instruction: ...
### Problem: {question}
### Solution:
The model then generates the reasoning and final answer.
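To score a generated solution, the final number has to be pulled out of the text and compared with the reference. GSM8K reference solutions end with a line of the form `#### <answer>`; the sketch below uses that marker when present and otherwise falls back to the last number in the text. The fallback rule and the function names are assumptions, and the project's evaluation script may extract answers differently.

```python
import re
from typing import Optional

# A number with optional sign, thousands separators, and decimal part.
_NUMBER = re.compile(r"(?<!\d)-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Return the number after '####' if present, else the last number found."""
    if "####" in text:
        matches = _NUMBER.findall(text.split("####", 1)[1])
        chosen = matches[0] if matches else None
    else:
        matches = _NUMBER.findall(text)
        chosen = matches[-1] if matches else None
    return float(chosen.replace(",", "")) if chosen is not None else None


def is_correct(generated: str, reference: str) -> bool:
    """Compare the extracted numbers from model output and reference answer."""
    pred, gold = extract_final_number(generated), extract_final_number(reference)
    return pred is not None and gold is not None and abs(pred - gold) < 1e-6


print(is_correct("48 / 2 = 24, 48 + 24 = 72. The answer is 72.", "... #### 72"))
```

Accuracy is then the fraction of test questions for which `is_correct` returns `True`.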
| Model | Params | GSM8K Accuracy (%) |
|---|---|---|
| LLaMA 2 | 13B | 28.7 |
| Gemma 2 (PT) | 2B | 23.9 |
| Mistral (Base) | 7B | 36.5 |
| ERNIE 4.5 | 21B | 25.2 |
| Baichuan (Base) | 13B | 26.6 |
| Gemma | 7B | 46.4 |
| Zephyr-7b-gemma-v0.1 | 7B | 45.56 |
| LLaMA 3.2 Instruct (CoT) | 1B | 39.04 |
| Gemma 3 IT | 1B | 42.15 |
| Qwen 3 (Instruct mode) | 1.7B | 33.66 |
| OpenMath (Qwen2.5-Math-1.5B + LoRA) | 1.5B | 41.0 |
This project provides the fine-tuned adapter weights:
- `adapter_model.safetensors` → LoRA weights
- `adapter_config.json` → LoRA configuration
Note: This is not a full model.
You must load the base model and then attach the adapter.
An example script (inference.py) is provided to demonstrate how to:
- Load the Qwen2.5-Math-1.5B base model
- Attach the fine-tuned LoRA adapter
- Run step-by-step math inference
Note: Running the script requires downloading the base model from Hugging Face.
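For orientation, the three steps might look like the following sketch built on `transformers` and `peft`. It is not a copy of `inference.py`: the Hub ID `Qwen/Qwen2.5-Math-1.5B`, the adapter directory, fp16 loading, `device_map="auto"` (which needs `accelerate`), and greedy decoding are all assumptions.

```python
BASE_MODEL_ID = "Qwen/Qwen2.5-Math-1.5B"  # assumed Hugging Face Hub ID
ADAPTER_DIR = "."  # assumed: folder with adapter_config.json and adapter_model.safetensors


def format_inference_prompt(question: str) -> str:
    """Same template as training, with the solution left empty."""
    return (
        "### Instruction:\n"
        "Solve the math problem step by step and give the final answer.\n"
        "### Problem:\n"
        f"{question.strip()}\n"
        "### Solution:\n"
    )


def load_model(base_id: str = BASE_MODEL_ID, adapter_dir: str = ADAPTER_DIR):
    """Load the base model, then attach the LoRA adapter on top of it."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return tokenizer, model


def solve(question: str, tokenizer, model, max_new_tokens: int = 256) -> str:
    """Generate a step-by-step solution with greedy decoding."""
    import torch

    inputs = tokenizer(format_inference_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_model()` followed by `print(solve("A shop sells 3 pens for $2. How much do 12 pens cost?", tokenizer, model))`.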
OpenMath/
├── adapter_config.json         # LoRA configuration
├── adapter_model.safetensors   # Fine-tuned LoRA weights
├── CONTRIBUTING.md             # Contribution guidelines
├── inference.py                # Script for step-by-step math inference
├── LICENSE                     # Apache 2.0 license
└── README.md                   # OpenMath project documentation
inference.py
- Loads the base model (Qwen2.5-Math-1.5B)
- Attaches the fine-tuned LoRA adapter
- Generates step-by-step reasoning for math problems
- Serves as the main script for testing the fine-tuned model
adapter_model.safetensors
- Contains the trained LoRA adapter weights.
- This is not a full model checkpoint.
adapter_config.json
Stores the LoRA configuration (rank, alpha, target modules, etc.).
CONTRIBUTING.md
Provides guidelines for contributors who want to improve the project.
LICENSE
Apache 2.0 license defining usage and distribution rights.
- Download the base model (Qwen2.5-Math-1.5B) from Hugging Face.
- Load the saved LoRA adapter (adapter_model.safetensors).
- Run inference.py.
- Provide a math problem using the structured prompt format.
- The model generates step-by-step reasoning and a final answer
Base Model (Qwen2.5-Math-1.5B)
+
LoRA Adapter (Fine-tuned weights)
↓
inference.py
↓
Step-by-step math reasoning output
OpenMath is an educational/research project.
The fine-tuned model may produce incorrect, incomplete, or misleading answers.
Always verify solutions independently before using them for exams, assignments, or real-world decisions.
This project does not guarantee correctness and should not be used as a substitute for professional advice.
Contributions are welcome! 🎉
If you’d like to contribute:
- Fork the repository
- Create a new branch (`feature/your-feature-name`)
- Commit your changes
- Open a Pull Request
- Run full GSM8K test evaluation (1319 samples) and report results
- Train on larger GSM8K subsets (3k–5k samples)
- Add SVAMP / ASDiv datasets for better generalization
- Improve decoding to reduce repetition
- Add a Streamlit demo for interactive testing
- Benchmark against more open-source SLMs/LLMs
- Improve evaluation scripts and metrics
OpenMath is a fun and practical side project built to explore efficient fine-tuning (QLoRA) and math reasoning evaluation on limited compute.
The goal is to learn, experiment, and share reproducible results while keeping the code clean and open for community improvements.
This project is licensed under the Apache License 2.0.
See the LICENSE file for details.