# 🧠 LoRA Fine-Tuning LLM

Efficient LLM fine-tuning with LoRA/QLoRA adapters, configurable training pipeline, and production-ready serving


## Overview

A complete LLM fine-tuning framework using Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) via the PEFT library. Designed for learning and experimentation — every module includes educational comments explaining the underlying ML concepts.
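As a reminder of the core idea (an illustrative, dependency-free sketch, not code from this repository): LoRA freezes the base weight matrix W and learns a low-rank update ΔW = (alpha / r) · B @ A, where A is r × d_in and B is d_out × r, so only r · (d_in + d_out) parameters are trained instead of d_in · d_out.

```python
# Minimal plain-Python illustration of the LoRA update rule
# (alpha / r) * B @ A -- not taken from this repository's source.

def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_delta(A, B, alpha, r):
    """Effective weight update: scale B @ A by alpha / r."""
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

# Toy shapes: d_out = 2, r = 1, d_in = 3
A = [[1.0, 2.0, 3.0]]   # r x d_in
B = [[1.0], [0.5]]      # d_out x r
delta = lora_delta(A, B, alpha=2.0, r=1)
# delta has shape d_out x d_in, matching the frozen weight it adjusts
```

With rank r much smaller than the weight dimensions, the adapter holds only a tiny fraction of the base model's parameters, which is what makes LoRA fine-tuning cheap.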

The framework runs end-to-end: from data preparation through training, evaluation, and deployment. All GPU-dependent components gracefully fall back to mock implementations, so you can explore the full pipeline on any machine.
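The graceful-fallback pattern can be sketched like this (hypothetical names; the repository's actual module layout may differ):

```python
def select_backend():
    """Pick 'cuda' when PyTorch sees a GPU, otherwise fall back to 'mock'."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # PyTorch absent entirely: still fine, use the mock path
    return "mock"

class MockTrainer:
    """CPU-only stand-in exposing the same interface as a real trainer."""
    def train(self, steps):
        # Emit a fake, strictly decreasing loss curve so downstream
        # plotting and diagnostics code can run without a GPU.
        return [round(1.0 / (i + 1), 4) for i in range(steps)]

backend = select_backend()
losses = MockTrainer().train(3)  # works on any machine
```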

## Features

- 🔧 **Configurable LoRA**: rank, alpha, dropout, and target-module selection via a builder pattern
- 📝 **Data Pipeline**: HuggingFace Hub loading, instruction formatting, quality filtering with deduplication, and reproducible train/test splits
- 🏋️ **Training Pipeline**: epoch-level loss tracking with overfitting detection and convergence diagnostics
- 🎯 **Inference Engine**: text generation with parameter validation, repetition penalty, and batch processing
- 📊 **Evaluation Suite**: perplexity, BLEU, ROUGE, and lexical diversity metrics with a unified runner
- 🚀 **REST API**: FastAPI server with single/batch/chat endpoints and request timing
- 📈 **5-Page Dashboard**: interactive Streamlit app for training, evaluation, chat, and configuration
- 🧪 **70+ Tests**: comprehensive test suite covering all modules with mock fallbacks
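A fluent builder for LoRA hyperparameters might look roughly like this (illustrative class names, not this repository's actual API; in practice the result would feed PEFT's `LoraConfig`):

```python
from dataclasses import dataclass, field

@dataclass
class LoRASettings:
    """Hypothetical settings container; defaults are common starting points."""
    rank: int = 8
    alpha: int = 16
    dropout: float = 0.05
    target_modules: list = field(default_factory=lambda: ["q_proj", "v_proj"])

class LoRABuilder:
    """Fluent builder: each setter returns self so calls can be chained."""
    def __init__(self):
        self._s = LoRASettings()
    def rank(self, r):
        self._s.rank = r
        return self
    def alpha(self, a):
        self._s.alpha = a
        return self
    def dropout(self, p):
        self._s.dropout = p
        return self
    def targets(self, mods):
        self._s.target_modules = list(mods)
        return self
    def build(self):
        return self._s

cfg = (LoRABuilder()
       .rank(16)
       .alpha(32)
       .targets(["q_proj", "k_proj", "v_proj"])
       .build())
```

The builder keeps experiment scripts readable: only the hyperparameters being varied appear at the call site, while everything else keeps its default.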

## Quick Start

```bash
git clone https://github.com/mohamed-elkholy95/finetune-llm-lora.git
cd finetune-llm-lora
pip install -r requirements.txt

# Run tests (no GPU needed)
python -m pytest tests/ -v

# Launch dashboard
streamlit run streamlit_app/app.py

# Start API server
python -m src.deploy.api_server
```

### QLoRA Training (GPU Required)

```bash
# Full training on Dolly-15k with Qwen2.5-0.5B
python train_qlora_qwen.py

# Quick test run
python train_qlora_qwen.py --dry-run

# Experiment with higher rank
python train_qlora_qwen.py --lora-rank 16 --max-samples 1000
```
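Under the hood, QLoRA combines 4-bit quantization of the frozen base model with trainable LoRA adapters. A typical configuration with `transformers` and `peft` looks like this (a generic sketch of those libraries' public APIs, not necessarily what `train_qlora_qwen.py` does internally):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base weights (the QLoRA recipe)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Trainable low-rank adapters on the attention projections
lora_config = LoraConfig(
    r=16,                      # e.g. what --lora-rank 16 above would set
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The `target_modules` list here is a common choice for Qwen-style attention blocks; the script's actual defaults may differ.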

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check |
| GET | `/model/info` | Model metadata and LoRA config |
| POST | `/generate` | Single-prompt text generation |
| POST | `/chat` | Chat-style message completion |
| POST | `/batch` | Batch generation for multiple prompts |
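A call to the generation endpoint might look like this (the payload field names and response shape are assumptions based on typical FastAPI generation servers; check the server's OpenAPI schema at `/docs` for the real ones):

```python
import json
from urllib import request

# Hypothetical payload for POST /generate -- field names are assumed,
# not taken from this repository's schema.
payload = {"prompt": "Explain LoRA in one sentence.", "max_new_tokens": 64}
req = request.Request(
    "http://localhost:8000/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With the API server running, the request would be sent like:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```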

## Project Structure

See `docs/ARCHITECTURE.md` for the full architecture guide, including how LoRA works, data-flow diagrams, and hyperparameter selection guidelines.

## Testing

```bash
# Run all tests
python -m pytest tests/ -v

# Run specific module tests
python -m pytest tests/test_trainer.py -v
python -m pytest tests/test_inference.py -v
python -m pytest tests/test_diversity.py -v
```
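As an example of what the diversity tests exercise: type-token ratio (TTR), a common lexical diversity statistic, is the number of unique words divided by the total word count (a generic sketch, not this repository's implementation):

```python
def type_token_ratio(text):
    """Unique tokens / total tokens; higher means more varied vocabulary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)

ratio = type_token_ratio("the cat sat on the mat")
# "the" appears twice, so 5 unique tokens out of 6 total
```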

## Author

Mohamed Elkholy · [GitHub](https://github.com/mohamed-elkholy95) · melkholy@techmatrix.com
