OpenMath

Fine-tuning a Small Language Model (SLM) for Step-by-Step Math Reasoning

Overview

OpenMath is an open-source project focused on fine-tuning a small language model (SLM) to solve math word problems with clear, step-by-step reasoning.
The project uses LoRA/QLoRA fine-tuning on popular math reasoning datasets and provides a benchmarking pipeline to compare performance against other open-source SLMs/LLMs.

This project is designed to be reproducible on free Colab (T4) GPU.

What’s Included

QLoRA fine-tuning code (4-bit)
GSM8K subset training (example: 1k samples)
GSM8K evaluation script (accuracy)
Saved LoRA adapter weights

Base Model

Qwen2.5-Math-1.5B

Dataset

GSM8K (Grade School Math 8K)
Training used: 1000 samples
Evaluation: GSM8K test split

Results

Training Setup (Current)

Samples: 1000
Epochs: 6
Max length: 1024
LoRA rank: 16
Loss masking: trained mainly on the solution portion to improve reasoning

Accuracy

GSM8K Accuracy (100-sample test subset): 41%

Note: The 41% score was measured on a 100-question subset of the GSM8K test set for faster evaluation on Colab.

Training Format & Loss Masking

Training Prompt Template

The model was trained using a structured prompt format designed to encourage step-by-step reasoning:

### Instruction:
Solve the math problem step by step and give the final answer.

### Problem:
{question}

### Solution:
{answer}

{question} and {answer} represent dataset content placeholders.

Loss Masking Strategy

To improve reasoning quality, loss was computed only on the solution portion:

All tokens before ### Solution: were masked with -100
Only tokens belonging to the solution contributed to training loss

This encourages the model to focus on generating accurate reasoning steps rather than memorizing prompt structure.

Special Tokens

No additional special tokens were introduced
Default tokenizer EOS token used as padding
Template headers serve as separators

Inference Format

During inference, the same prompt structure is used, but the solution portion is left empty:

### Instruction: ...
### Problem: {question}
### Solution:

The model then generates the reasoning and final answer.

GSM8K Leaderboard (Baseline)

Model	Params	GSM8K Accuracy (%)
LLaMA 2	13B	28.7
Gemma 2 (PT)	2B	23.9
Mistral (Base)	7B	36.5
ERNIE 4.5	21B	25.2
Baichuan (Base)	13B	26.6
Gemma	7B	46.4
Zephyr-7b-gemma-v0.1	7B	45.56
LLaMA 3.2 Instruct (CoT)	1B	39.04
Gemma 3 IT	1B	42.15
Qwen 3 (Instruct mode)	1.7B	33.66
OpenMath (Qwen2.5-Math-1.5B + LoRA)	1.5B	41.0

Repository Files

LoRA Adapter Folder

This project provides the fine-tuned adapter weights:

adapter_model.safetensors → LoRA weights
adapter_config.json → LoRA configuration

Note: This is not a full model.
You must load the base model and then attach the adapter.

Inference Example

An example script (inference.py) is provided to demonstrate how to:

Load the Qwen2.5-Math-1.5B base model
Attach the fine-tuned LoRA adapter
Run step-by-step math inference

Note: Running the script requires downloading the base model from Hugging Face.

Project Structure & Workflow

Repository Structure

OpenMath/ ├── adapter_config.json #LoRA configuration ├── adapter_model.safetensors #Fine-tuned LoRA weights ├── CONTRIBUTING.md #Contribution guidelines ├── inference.md #Script for step-by-step math inference ├── LICENSE #Apache 2.0 license └── README.md #OpenMath project Documentation

Key Files Explanation :

inference.py

This script Loads the base model (Qwen2.5-Math-1.5B)
The Script Attaches the fine-tuned LoRA adapter
Also it Generates step-by-step reasoning for math problems
This is the main script used to test the fine-tuned model.

adapter_model.safetensors

Contains the trained LoRA adapter weights.
This is not a full model checkpoint.

adapter_config.json

It helps to Stores the LoRA configuration (rank, alpha, target modules, etc.).

CONTRIBUTING.md

It Provides guidelines for contributors who want to improve the project.

LICENSE

Apache 2.0 license defining usage and distribution rights.

Current Project Workflow

Firstly Download the base model (Qwen2.5-Math-1.5B) from Hugging Face.
Load the saved LoRA adapter (adapter_model.safetensors).
Run inference.py.
Provide a math problem using the structured prompt format.
The model generates step-by-step reasoning and a final answer

Simplified Workflow Diagram

Base Model (Qwen2.5-Math-1.5B)
            +
LoRA Adapter (Fine-tuned weights)
            ↓
      inference.py
            ↓
Step-by-step math reasoning output

Disclaimer

OpenMath is an educational/research project.
The fine-tuned model may produce incorrect, incomplete, or misleading answers.
Always verify solutions independently before using them for exams, assignments, or real-world decisions.

This project does not guarantee correctness and should not be used as a substitute for professional advice.

Contributing

Contributions are welcome! 🎉

If you’d like to contribute:

Fork the repository
Create a new branch (feature/your-feature-name)
Commit your changes
Open a Pull Request

Contribution Ideas

Run full GSM8K test evaluation (1319 samples) and report results
Train on larger GSM8K subsets (3k–5k samples)
Add SVAMP / ASDiv datasets for better generalization
Improve decoding to reduce repetition
Add a Streamlit demo for interactive testing
Benchmark against more open-source SLMs/LLMs
Improve evaluation scripts and metrics

Note

OpenMath is a fun and practical side project built to explore efficient fine-tuning (QLoRA) and math reasoning evaluation on limited compute.
The goal is to learn, experiment, and share reproducible results — while keeping the code clean and open for community improvements.

License

This project is licensed under the Apache License 2.0.
See the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenMath

Overview

What’s Included

Base Model

Dataset

Results

Training Setup (Current)

Accuracy

Training Format & Loss Masking

Training Prompt Template

Loss Masking Strategy

Special Tokens

Inference Format

GSM8K Leaderboard (Baseline)

Repository Files

LoRA Adapter Folder

Inference Example

Project Structure & Workflow

Repository Structure

Key Files Explanation :

Current Project Workflow

Simplified Workflow Diagram

Disclaimer

Contributing

Contribution Ideas

Note

License

About

Uh oh!

Releases

Packages

Contributors 4

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
adapter_config.json		adapter_config.json
adapter_model.safetensors		adapter_model.safetensors
inference.py		inference.py

License

AshChadha-iitg/OpenMath

Folders and files

Latest commit

History

Repository files navigation

OpenMath

Overview

What’s Included

Base Model

Dataset

Results

Training Setup (Current)

Accuracy

Training Format & Loss Masking

Training Prompt Template

Loss Masking Strategy

Special Tokens

Inference Format

GSM8K Leaderboard (Baseline)

Repository Files

LoRA Adapter Folder

Inference Example

Project Structure & Workflow

Repository Structure

Key Files Explanation :

Current Project Workflow

Simplified Workflow Diagram

Disclaimer

Contributing

Contribution Ideas

Note

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Uh oh!

Languages

Packages