SmoLoRA: Edge Language Model Fine-Tuning & Inference Toolkit

A lightweight, developer-friendly Python package for fine-tuning small language models using LoRA adapters and running on-device inference. Built for flexibility and rapid prototyping, SmoLoRA lets you train, save, and load language models and generate text from them through a clean, modular architecture.

🔧 Installation

Quick Install

NOTE: This path is not yet supported. Please use the development setup for now.

pip install smolora

Development Setup

This setup is for developers who want to contribute to or modify the code. Please review the Contributing section for guidelines, then follow these steps to set up your development environment:

# Clone the repository
git clone https://github.com/thatrandomfrenchdude/smolora.git
cd smolora

# Run the development setup script
chmod +x scripts/setup-dev.sh
./scripts/setup-dev.sh

This will create a virtual environment, install all dependencies, and set up pre-commit hooks.

🚀 Quick Start

from smolora import SmoLoRA

# Initialize the trainer
trainer = SmoLoRA(
    base_model_name="microsoft/Phi-1.5", # or any HuggingFace model
    dataset_name="yelp_review_full", # HuggingFace dataset
    text_field="text", # Field containing text data
    output_dir="./output_model" # Directory to save the fine-tuned model
)

# Fine-tune the model
trainer.train()

# Save the adapter and merge it with the base model
trainer.save()

# Load the merged model for inference
model, tokenizer = trainer.load_model("./output_model/final_merged")

# Generate text
prompt = "Write a review about a great coffee shop."
result = trainer.inference(prompt)
print("Generated output:", result)

📂 Custom Datasets

SmoLoRA supports multiple data formats through the smolora.dataset module. You can use HuggingFace datasets, local text files, CSV, or JSONL files for training.

You can use the prepare_dataset.py tool to convert your raw text, CSV, or JSONL data into a HuggingFace Dataset ready for fine-tuning.
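
To try the loaders below without real data, it can help to generate a few small sample files first. The following is a minimal sketch using only the Python standard library. The file names, the `text` and `content` field names, and the directory layout simply mirror the examples in this README; they are assumptions about what your own data might look like, not requirements confirmed by the package, and `write_sample_data` is a hypothetical helper.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_sample_data(root: Path) -> dict:
    """Create tiny sample inputs in the three local formats this README mentions."""
    reviews = [
        "The espresso was rich and the staff were friendly.",
        "Slow service, but the pastries made up for it.",
    ]

    # A directory of plain-text files, one document per file.
    text_dir = root / "text_directory"
    text_dir.mkdir(parents=True, exist_ok=True)
    for i, review in enumerate(reviews):
        (text_dir / f"review_{i}.txt").write_text(review + "\n", encoding="utf-8")

    # JSONL: one JSON object per line, with the text under a "text" key.
    jsonl_path = root / "data.jsonl"
    with jsonl_path.open("w", encoding="utf-8") as f:
        for review in reviews:
            f.write(json.dumps({"text": review}) + "\n")

    # CSV: a header row, with the text under a "content" column.
    csv_path = root / "data.csv"
    with csv_path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["content"])
        writer.writeheader()
        writer.writerows({"content": review} for review in reviews)

    return {"text_dir": text_dir, "jsonl": jsonl_path, "csv": csv_path}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_sample_data(Path(tmp))
        for name, path in paths.items():
            print(name, "->", path)
```

The returned paths can then be passed to the loading functions shown in the sections that follow.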

Text Files

from smolora import SmoLoRA
from smolora.dataset import load_text_data

# Load all .txt files from a directory
dataset = load_text_data("./text_directory/")

# Use with SmoLoRA
trainer = SmoLoRA(
    base_model_name="microsoft/Phi-1.5",
    dataset_name=dataset,  # Use the prepared dataset directly
    output_dir="./custom_model"
)

JSONL Files

from smolora.dataset import prepare_dataset

# Prepare JSONL data
dataset = prepare_dataset(
    source="data.jsonl",
    text_field="text",  # Field containing the text data
    chunk_size=100      # Optional: words per chunk
)

# Use with SmoLoRA: pass `dataset` as dataset_name, as in the Text Files example
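
The `chunk_size` comment above describes chunks measured in words. How SmoLoRA splits text internally is not shown in this README, so the following is only an illustrative sketch of word-based chunking. `chunk_words` is a hypothetical helper, not part of the package.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 100) -> List[str]:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    words = text.split()  # split on any run of whitespace
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


if __name__ == "__main__":
    sample = "one two three four five six seven"
    print(chunk_words(sample, chunk_size=3))
    # ['one two three', 'four five six', 'seven']
```

Smaller chunks yield more, shorter training examples from the same source text; the last chunk may be shorter than `chunk_size`.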

CSV Files

from smolora.dataset import prepare_dataset

# Prepare CSV data
dataset = prepare_dataset(
    source="data.csv",
    text_field="content",
    file_type="csv"  # Explicitly specify format
)

# Use with SmoLoRA: pass `dataset` as dataset_name, as in the Text Files example

🛠️ Knobs and Levers

SmoLoRA Configuration

The SmoLoRA class accepts several parameters for customization:

trainer = SmoLoRA(
    base_model_name="microsoft/Phi-1.5",  # Any HuggingFace model
    dataset_name="yelp_review_full",      # HF dataset or custom Dataset object
    text_field="text",                    # Field containing text data
    output_dir="./fine_tuned_model"       # Output directory
)

LoRA Configuration

You can customize the LoRA adapter settings by modifying the peft_config after initialization:

trainer = SmoLoRA(...)
trainer.peft_config.r = 16              # Rank
trainer.peft_config.lora_alpha = 32     # Alpha scaling
trainer.peft_config.lora_dropout = 0.1  # Dropout
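
For intuition about what these settings trade off: in the standard LoRA formulation, an adapter on a weight matrix of shape `d_out x d_in` adds two low-rank matrices with `r * (d_in + d_out)` trainable parameters in total, and the low-rank update is scaled by `lora_alpha / r`. The sketch below works through that arithmetic for the values above. The 2048 x 2048 layer shape is an assumed example, not a figure taken from this project, and `lora_stats` is a hypothetical helper.

```python
def lora_stats(d_in: int, d_out: int, r: int, lora_alpha: int) -> dict:
    """Parameter count and scaling factor for one LoRA-adapted linear layer."""
    full_params = d_in * d_out          # parameters in the frozen weight matrix
    lora_params = r * (d_in + d_out)    # A is (r x d_in), B is (d_out x r)
    return {
        "full_params": full_params,
        "lora_params": lora_params,
        "trainable_fraction": lora_params / full_params,
        "scaling": lora_alpha / r,      # multiplier applied to the low-rank update
    }


if __name__ == "__main__":
    stats = lora_stats(d_in=2048, d_out=2048, r=16, lora_alpha=32)
    print(stats)
```

Doubling `r` doubles the adapter's parameter count, while keeping the ratio `lora_alpha / r` constant keeps the scale of the update unchanged.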

🧪 Testing

Run the comprehensive test suite:

# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=src/smolora --cov-report=html

# Run specific test categories
pytest tests/ -m unit        # Unit tests only
pytest tests/ -m integration # Integration tests only

The test suite includes:

Unit tests for core functionality
Dataset loading and preparation tests
Mock-based training pipeline tests
Integration tests with sample data

📁 Project Structure

smolora/
├── src/smolora/           # Main package source
│   ├── __init__.py        # Package initialization
│   ├── core.py            # Main SmoLoRA class
│   └── dataset.py         # Dataset handling utilities
├── examples/              # Usage examples
│   └── usage.py           # Basic usage example
├── tests/                 # Test suite
│   └── test_smolora.py    # Comprehensive tests
├── scripts/               # Development scripts
│   └── setup-dev.sh       # Development environment setup
├── docs/                  # Documentation
│   ├── api-reference.md   # API documentation
│   ├── architecture.md    # Architecture overview
│   └── ...                # Additional documentation
├── pyproject.toml         # Project configuration
├── requirements.txt       # Production dependencies
├── dev-requirements.txt   # Development dependencies
└── README.md              # This file

📚 Documentation

Comprehensive documentation is available in the docs/ directory:

User Guide: Quick start and basic usage
Customization Guide: Advanced configuration options
Developer Guide: Architecture and contribution guidelines

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
