# Model Card: OpenMath

## Model Details

### Model Description

OpenMath is a fine-tuned small language model (SLM) specialized in solving math word problems with step-by-step reasoning. It was fine-tuned from the Qwen2.5-Math-1.5B base model using QLoRA (Quantized Low-Rank Adaptation).

- **Developed by:** OpenMath Project Contributors
- **Model type:** Causal Language Model (Math Reasoning)
- **Language:** English
- **License:** Apache License 2.0
- **Base Model:** Qwen/Qwen2.5-Math-1.5B
- **Fine-tuning Method:** QLoRA (4-bit quantization with LoRA adapters)
- **Parameters:** 1.5B (base model) + LoRA adapters

### Model Sources

- **Repository:** [OpenMath GitHub Repository](https://github.com/AshChadha-iitg/OpenMath)
- **Base Model:** [Qwen/Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B)

## Uses

### Direct Use

This model is designed for educational and research purposes to:
- Solve grade-school level math word problems
- Generate step-by-step mathematical reasoning
- Demonstrate efficient fine-tuning techniques on limited compute resources

### Downstream Use

The model can be used as a starting point for:
- Further fine-tuning on additional math datasets
- Integration into educational applications
- Research on small language model capabilities in mathematical reasoning

### Out-of-Scope Use

- **Production systems requiring high accuracy:** The model achieves 41% accuracy and should not be used for critical applications
- **Advanced mathematics:** The model is trained on grade-school level problems only
- **Homework/exam solving without verification:** Always verify solutions independently
- **Professional mathematical advice or calculations**

## Bias, Risks, and Limitations

### Known Limitations

- **Accuracy:** 41% on a 100-sample GSM8K test subset; the model answers the majority of problems incorrectly
- **Training data size:** Only trained on 1,000 samples from GSM8K, limiting generalization
- **Repetition issues:** May generate repetitive text during inference
- **Domain specificity:** Limited to grade-school math problems similar to GSM8K
- **Incomplete reasoning:** May produce incomplete or misleading step-by-step solutions

### Recommendations

Users should:
- Always verify model outputs independently
- Not rely on this model for educational assessments or real-world decisions
- Understand this is a research/educational project, not a production-ready system
- Use appropriate repetition penalties and decoding strategies to improve output quality

## Training Details

### Training Data

- **Dataset:** GSM8K (Grade School Math 8K)
- **Training samples:** 1,000 samples from the GSM8K training set
- **Data format:** Math word problems with step-by-step solutions
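
The exact preprocessing code lives in the project notebooks; a minimal sketch of pulling a 1,000-sample training subset with the Hugging Face `datasets` library might look like the following (the shuffle seed is an illustrative assumption):

```python
# Minimal sketch: load GSM8K and keep a 1,000-sample training subset.
# The shuffle seed below is an illustrative assumption, not the project's setting.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main")  # columns: "question", "answer"
train_subset = gsm8k["train"].shuffle(seed=42).select(range(1000))

print(train_subset[0]["question"])
print(train_subset[0]["answer"])  # step-by-step solution ending in "#### <number>"
```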

### Training Procedure

#### Training Hyperparameters

- **Training regime:** 4-bit QLoRA fine-tuning
- **Epochs:** 6
- **Max sequence length:** 1024 tokens
- **LoRA rank (r):** 16
- **LoRA alpha:** 32
- **LoRA dropout:** 0.05
- **Target modules:** q_proj, o_proj, k_proj, v_proj
- **Quantization:** 4-bit NF4 with double quantization
- **Compute dtype:** float16
- **Loss masking:** The training loss is computed primarily on the solution portion of each example, so optimization focuses on the reasoning steps (see the sketch below)
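
The hyperparameters above map directly onto `BitsAndBytesConfig` and `LoraConfig` objects from Transformers and PEFT. The following is a minimal sketch, assuming a standard QLoRA setup; the repository's actual training notebook may differ in details, and the loss-masking helper is an illustrative assumption of how prompt tokens can be excluded from the loss:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization with double quantization and float16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-1.5B")

# LoRA adapters with the hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Illustrative loss masking: compute the loss only on solution tokens by
# labelling the prompt portion as -100 (ignored by the cross-entropy loss).
def tokenize_and_mask(example, max_length=1024):
    prompt = (
        "### Instruction:\n"
        "Solve the math problem step by step and give the final answer.\n\n"
        f"### Problem:\n{example['question']}\n\n### Solution:\n"
    )
    full = prompt + example["answer"] + tokenizer.eos_token
    tokens = tokenizer(full, truncation=True, max_length=max_length)
    prompt_len = len(tokenizer(prompt)["input_ids"])
    labels = list(tokens["input_ids"])
    for i in range(min(prompt_len, len(labels))):
        labels[i] = -100
    tokens["labels"] = labels
    return tokens
```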

#### Hardware

- **GPU:** NVIDIA T4 (free Google Colab tier)
- **Training time:** Short enough to be reproduced on the free Colab tier

#### Software

- **Framework:** PyTorch, Transformers, PEFT
- **Quantization:** BitsAndBytes (4-bit)
- **Fine-tuning:** LoRA/QLoRA

## Evaluation

### Testing Data & Metrics

#### Testing Data

- **Dataset:** GSM8K test split
- **Evaluation samples:** 100-question subset (for faster evaluation on Colab)

#### Metrics

- **Primary metric:** Accuracy (exact match)
- **GSM8K Accuracy:** 41.0% (on 100-sample test subset)
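
Exact match here means comparing the final numeric answer extracted from the model's generation against the reference answer that follows the `####` marker in GSM8K. A minimal scoring sketch is shown below; the regex and normalization heuristics are assumptions, not necessarily what the project's evaluation script uses:

```python
# Sketch of exact-match scoring against GSM8K references.
# Extraction heuristics below are illustrative assumptions.
import re

def extract_final_number(text):
    """Return the last number in the text, stripping '$', commas, and a trailing dot."""
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", text.replace("$", ""))
    return numbers[-1].replace(",", "").rstrip(".") if numbers else None

def exact_match(prediction, reference):
    """GSM8K references end with '#### <answer>'; compare final numbers."""
    ref = reference.split("####")[-1].strip().replace(",", "")
    return extract_final_number(prediction) == ref

# Toy usage
print(exact_match("Natalia sold 48 + 24 = 72 clips. The answer is 72.", "... #### 72"))  # True
```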

### Results

| Model | Parameters | GSM8K Accuracy (%) |
|-------|-----------|-------------------|
| LLaMA 2 | 13B | 28.7 |
| Gemma 2 (PT) | 2B | 23.9 |
| Mistral (Base) | 7B | 36.5 |
| LLaMA 3.2 Instruct (CoT) | 1B | 39.04 |
| **OpenMath (Qwen2.5-Math-1.5B + LoRA)** | **1.5B** | **41.0** |
| Gemma 3 IT | 1B | 42.15 |
| Zephyr-7b-gemma-v0.1 | 7B | 45.56 |
| Gemma | 7B | 46.4 |

#### Benchmark Comparison Graph

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#4CAF50', 'primaryTextColor':'#000', 'primaryBorderColor':'#2E7D32', 'lineColor':'#1976D2', 'secondaryColor':'#FFC107', 'tertiaryColor':'#fff'}}}%%
graph LR
subgraph "GSM8K Accuracy Comparison (%)"
A["Gemma 2 PT<br/>2B: 23.9%"]
B["ERNIE 4.5<br/>21B: 25.2%"]
C["Baichuan<br/>13B: 26.6%"]
D["LLaMA 2<br/>13B: 28.7%"]
E["Qwen 3 IT<br/>1.7B: 33.66%"]
F["Mistral<br/>7B: 36.5%"]
G["LLaMA 3.2 IT<br/>1B: 39.04%"]
H["OpenMath<br/>1.5B: 41.0%"]
I["Gemma 3 IT<br/>1B: 42.15%"]
J["Zephyr-7b<br/>7B: 45.56%"]
K["Gemma<br/>7B: 46.4%"]
end

style H fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
```

**Performance Visualization:**

```
Gemma 2 PT (2B) ████████████ 23.9%
ERNIE 4.5 (21B) █████████████ 25.2%
Baichuan (13B) ██████████████ 26.6%
LLaMA 2 (13B) ███████████████ 28.7%
Qwen 3 IT (1.7B) █████████████████ 33.66%
Mistral (7B) ███████████████████ 36.5%
LLaMA 3.2 IT (1B) ████████████████████ 39.04%
OpenMath (1.5B) █████████████████████ 41.0% ⭐
Gemma 3 IT (1B) █████████████████████ 42.15%
Zephyr-7b (7B) ███████████████████████ 45.56%
Gemma (7B) ████████████████████████ 46.4%
|----|----|----|----|----|----|
0 10 20 30 40 50
```

OpenMath achieves accuracy competitive with other small language models despite being trained on only 1,000 samples, while remaining fully reproducible on free Colab resources.

## Technical Specifications

### Model Architecture

- **Base architecture:** Qwen2.5-Math-1.5B (Transformer-based causal LM)
- **Adapter type:** LoRA (Low-Rank Adaptation)
- **Quantization:** 4-bit NF4 quantization

### Compute Infrastructure

- **Training:** Google Colab (T4 GPU, free tier)
- **Inference:** Compatible with T4 GPU or similar (requires ~6-8GB VRAM with 4-bit quantization)

### Input Format

The model expects prompts in the following format:

```
### Instruction:
Solve the math problem step by step and give the final answer.

### Problem:
[Your math problem here]

### Solution:
```

### Generation Parameters

Recommended inference settings:
- `max_new_tokens`: 200
- `do_sample`: False (deterministic for math)
- `repetition_penalty`: 1.1
- `no_repeat_ngram_size`: 3
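
Putting the prompt format and the recommended settings together, a minimal inference sketch looks like the following; the adapter path is a placeholder assumption, and the sample problem is taken from GSM8K:

```python
# Sketch: load the 4-bit base model, attach the LoRA adapters, and generate
# with the recommended settings. The adapter path is a placeholder assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base = "Qwen/Qwen2.5-Math-1.5B"
adapters = "path/to/openmath-lora-adapters"  # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapters)

problem = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
prompt = (
    "### Instruction:\n"
    "Solve the math problem step by step and give the final answer.\n\n"
    f"### Problem:\n{problem}\n\n"
    "### Solution:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=False,
    repetition_penalty=1.1,
    no_repeat_ngram_size=3,
)
# Print only the newly generated solution tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```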

## Environmental Impact

- **Hardware Type:** NVIDIA T4 GPU
- **Hours used:** Minimal (reproducible on free Colab)
- **Cloud Provider:** Google Colab
- **Carbon Emitted:** Minimal due to efficient QLoRA training on limited samples

## Citation

```bibtex
@software{openmath2024,
  title={OpenMath: Fine-tuning Small Language Models for Math Reasoning},
  author={OpenMath Contributors},
  year={2024},
  license={Apache-2.0}
}
```

## Model Card Authors

OpenMath Project Contributors

## Model Card Contact

Questions and issues can be raised on the [OpenMath repository issues page](https://github.com/AshChadha-iitg/OpenMath/issues).