Efficient LLM fine-tuning with LoRA/QLoRA adapters, configurable training pipeline, and production-ready serving
A complete LLM fine-tuning framework using Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) via the PEFT library. Designed for learning and experimentation — every module includes educational comments explaining the underlying ML concepts.
The framework runs end-to-end: from data preparation through training, evaluation, and deployment. All GPU-dependent components gracefully fall back to mock implementations, so you can explore the full pipeline on any machine.
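The mock-fallback behavior follows a common pattern: probe for CUDA at import time and substitute a lightweight stand-in when it is absent. A minimal sketch of that pattern, assuming hypothetical names (`MockGenerator`, `get_generator` are illustrative, not the repo's actual API):

```python
# Illustrative sketch of the graceful-fallback pattern described above.
# MockGenerator / get_generator are hypothetical names, not the repo's API.
try:
    import torch
    HAS_CUDA = torch.cuda.is_available()
except ImportError:  # torch not installed at all
    torch = None
    HAS_CUDA = False

class MockGenerator:
    """CPU-only stand-in that mimics the real generator's interface."""
    def generate(self, prompt: str, max_new_tokens: int = 32) -> str:
        # Deterministic placeholder so tests and the dashboard still run.
        return f"[mock output for: {prompt}]"

def get_generator():
    if not HAS_CUDA:
        return MockGenerator()
    # On a GPU machine, the real quantized model would be loaded here.
    raise NotImplementedError("real model loading is GPU-only in this sketch")
```

Because the mock implements the same interface as the real generator, downstream code (tests, API server, dashboard) never needs to branch on hardware.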
- 🔧 Configurable LoRA — Rank, alpha, dropout, and target module selection via builder pattern
- 📝 Data Pipeline — HuggingFace Hub loading, instruction formatting, quality filtering with deduplication, and reproducible train/test splits
- 🏋️ Training Pipeline — Epoch-level loss tracking with overfitting detection and convergence diagnostics
- 🎯 Inference Engine — Text generation with parameter validation, repetition penalty, and batch processing
- 📊 Evaluation Suite — Perplexity, BLEU, ROUGE, and lexical diversity metrics with unified runner
- 🚀 REST API — FastAPI server with single/batch/chat endpoints and request timing
- 📈 5-Page Dashboard — Interactive Streamlit app for training, evaluation, chat, and configuration
- 🧪 70+ Tests — Comprehensive test suite covering all modules with mock fallbacks
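The rank and alpha knobs above map directly onto the low-rank update a LoRA adapter learns: the frozen weight `W` is adjusted by `(alpha / r) * B @ A`, where `B` and `A` are small trainable matrices. A minimal NumPy sketch with toy sizes (not the repo's implementation):

```python
import numpy as np

# LoRA in a nutshell: instead of updating the full weight matrix W (d x k),
# train two small matrices B (d x r) and A (r x k) with rank r << min(d, k).
# The adapted weight is W + (alpha / r) * B @ A.
d, k, r, alpha = 64, 64, 8, 16           # toy sizes; rank 8 / alpha 16 are common defaults
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, k))  # trainable, initialized small
B = np.zeros((d, r))                     # trainable, zero-init so the delta starts at 0

delta = (alpha / r) * B @ A
W_adapted = W + delta                    # identical to W before any training step

# Trainable parameter count drops from d*k to r*(d+k):
full, lora = d * k, r * (d + k)
print(full, lora)  # 4096 vs 1024 for this toy example
```

The zero-initialized `B` is why a freshly attached adapter leaves the base model's behavior unchanged; training then moves only the `r*(d+k)` adapter parameters.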
```bash
git clone https://github.com/mohamed-elkholy95/finetune-llm-lora.git
cd finetune-llm-lora
pip install -r requirements.txt

# Run tests (no GPU needed)
python -m pytest tests/ -v

# Launch dashboard
streamlit run streamlit_app/app.py

# Start API server
python -m src.deploy.api_server
```

```bash
# Full training on Dolly-15k with Qwen2.5-0.5B
python train_qlora_qwen.py

# Quick test run
python train_qlora_qwen.py --dry-run

# Experiment with higher rank
python train_qlora_qwen.py --lora-rank 16 --max-samples 1000
```

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /model/info | Model metadata and LoRA config |
| POST | /generate | Single-prompt text generation |
| POST | /chat | Chat-style message completion |
| POST | /batch | Batch generation for multiple prompts |
See docs/ARCHITECTURE.md for the full architecture guide, including how LoRA works, data flow diagrams, and hyperparameter selection guidelines.
```bash
# Run all tests
python -m pytest tests/ -v

# Run specific module tests
python -m pytest tests/test_trainer.py -v
python -m pytest tests/test_inference.py -v
python -m pytest tests/test_diversity.py -v
```

Mohamed Elkholy — GitHub · melkholy@techmatrix.com