Efficient LLM fine-tuning with LoRA/QLoRA adapters, configurable training pipeline, and production-ready serving
A complete LLM fine-tuning framework using Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) via the PEFT library. Designed for learning and experimentation — every module includes educational comments explaining the underlying ML concepts.
The framework runs end-to-end: from data preparation through training, evaluation, and deployment. All GPU-dependent components gracefully fall back to mock implementations, so you can explore the full pipeline on any machine.
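The mock-fallback behavior follows a common pattern: probe for CUDA at import time and substitute a lightweight stand-in when it is absent. A minimal sketch of that pattern, assuming hypothetical names (`MockGenerator`, `get_generator` are illustrative, not the repo's actual API):

```python
# Illustrative sketch of the graceful-fallback pattern described above.
# MockGenerator / get_generator are hypothetical names, not the repo's API.
try:
    import torch
    HAS_CUDA = torch.cuda.is_available()
except ImportError:  # torch not installed at all
    torch = None
    HAS_CUDA = False

class MockGenerator:
    """CPU-only stand-in that mimics the real generator's interface."""
    def generate(self, prompt: str, max_new_tokens: int = 32) -> str:
        # Deterministic placeholder so tests and the dashboard still run.
        return f"[mock output for: {prompt}]"

def get_generator():
    if not HAS_CUDA:
        return MockGenerator()
    # On a GPU machine, the real quantized model would be loaded here.
    raise NotImplementedError("real model loading is GPU-only in this sketch")
```

Because the mock implements the same interface as the real generator, downstream code (tests, API server, dashboard) never needs to branch on hardware.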
- 🔧 Configurable LoRA — Rank, alpha, dropout, and target module selection via builder pattern
- 📝 Data Pipeline — HuggingFace Hub loading, instruction formatting, quality filtering with deduplication, and reproducible train/test splits
- 🏋️ Training Pipeline — Epoch-level loss tracking with overfitting detection and convergence diagnostics
- 🎯 Inference Engine — Text generation with parameter validation, repetition penalty, and batch processing
- 📊 Evaluation Suite — Perplexity, BLEU, ROUGE, and lexical diversity metrics with unified runner
- 🚀 REST API — FastAPI server with single/batch/chat endpoints and request timing
- 📈 5-Page Dashboard — Interactive Streamlit app for training, evaluation, chat, and configuration
- 🧪 70+ Tests — Comprehensive test suite covering all modules with mock fallbacks
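The rank and alpha knobs above map directly onto the low-rank update a LoRA adapter learns: the frozen weight `W` is adjusted by `(alpha / r) * B @ A`, where `B` and `A` are small trainable matrices. A minimal NumPy sketch with toy sizes (not the repo's implementation):

```python
import numpy as np

# LoRA in a nutshell: instead of updating the full weight matrix W (d x k),
# train two small matrices B (d x r) and A (r x k) with rank r << min(d, k).
# The adapted weight is W + (alpha / r) * B @ A.
d, k, r, alpha = 64, 64, 8, 16           # toy sizes; rank 8 / alpha 16 are common defaults
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, k))  # trainable, initialized small
B = np.zeros((d, r))                     # trainable, zero-init so the delta starts at 0

delta = (alpha / r) * B @ A
W_adapted = W + delta                    # identical to W before any training step

# Trainable parameter count drops from d*k to r*(d+k):
full, lora = d * k, r * (d + k)
print(full, lora)  # 4096 vs 1024 for this toy example
```

The zero-initialized `B` is why a freshly attached adapter leaves the base model's behavior unchanged; training then moves only the `r*(d+k)` adapter parameters.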
```bash
git clone https://github.com/mohamed-elkholy95/finetune-llm-lora.git
cd finetune-llm-lora
pip install -r requirements.txt

# Run tests (no GPU needed)
python -m pytest tests/ -v

# Launch dashboard
streamlit run streamlit_app/app.py

# Start API server
python -m src.deploy.api_server
```

```bash
# Full training on Dolly-15k with Qwen2.5-0.5B
python train_qlora_qwen.py

# Quick test run
python train_qlora_qwen.py --dry-run

# Experiment with higher rank
python train_qlora_qwen.py --lora-rank 16 --max-samples 1000
```

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /model/info | Model metadata and LoRA config |
| POST | /generate | Single-prompt text generation |
| POST | /chat | Chat-style message completion |
| POST | /batch | Batch generation for multiple prompts |
See docs/ARCHITECTURE.md for the full architecture guide, including how LoRA works, data flow diagrams, and hyperparameter selection guidelines.
```bash
# Run all tests
python -m pytest tests/ -v

# Run specific module tests
python -m pytest tests/test_trainer.py -v
python -m pytest tests/test_inference.py -v
python -m pytest tests/test_diversity.py -v
```

Mohamed Elkholy — GitHub · melkholy@techmatrix.com