# GRPO: Guided Reinforcement Policy Optimization for LLM Fine-tuning
A comprehensive guide and toolkit for fine-tuning language models using reinforcement learning techniques on the Hanzo AI platform.
## Overview
GRPO (Guided Reinforcement Policy Optimization) enables you to transform general language models into domain-specific experts using custom reward signals and reinforcement learning. This repository provides both the original implementation and Hanzo AI cloud platform integration.
## Quick Start
### Using Hanzo AI Cloud Platform
```bash
# Install Hanzo CLI
pip install hanzoai-cli
# Login to Hanzo Cloud
hanzo auth login
# Create a new GRPO project
hanzo ml create grpo-project --type=reinforcement-learning
# Upload your dataset
hanzo ml dataset upload --file=skippy_knowledge_base.csv --project=grpo-project
# Start fine-tuning
hanzo ml train --config=config/grpo_config.yaml --project=grpo-project
```
### Local Development
```bash
# Clone the repository
git clone https://github.com/hanzoai/grpo
cd grpo
# Install dependencies
pip install -r requirements.txt
# Prepare your dataset
python scripts/prepare_dataset.py --input=data/raw --output=data/processed
# Run training
python src/train.py --config=config/local_config.yaml
```
## Features
- **Custom Reward Functions**: Define domain-specific reward signals
- **Parameter-Efficient Fine-Tuning**: Support for LoRA and QLoRA
- **Multi-Model Support**: Compatible with Zen and other transformer models
- **Hanzo AI Integration**: Seamless deployment on Hanzo cloud infrastructure
- **Kubeflow ML Platform**: Enterprise-grade ML operations support
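
To make the idea of a custom reward function concrete, here is a minimal sketch of a domain-specific reward signal in plain Python. It is illustrative only: the function name `keyword_reward`, its parameters, and the "list of completions in, list of floats out" shape are assumptions made for this example, not the interface defined in `src/grpo/` or `examples/custom_rewards/`.

```python
def keyword_reward(completions, keywords, max_words=120):
    """Score each completion in [0, 1].

    Hypothetical reward: fraction of domain keywords mentioned,
    minus a small penalty for answers longer than `max_words`.
    """
    rewards = []
    for text in completions:
        lowered = text.lower()
        # Coverage: how many of the expected domain terms appear.
        hits = sum(1 for kw in keywords if kw.lower() in lowered)
        coverage = hits / len(keywords) if keywords else 0.0
        # Length penalty: 0.01 per extra word, capped at 0.5.
        extra_words = max(0, len(text.split()) - max_words)
        penalty = min(0.5, 0.01 * extra_words)
        rewards.append(max(0.0, coverage - penalty))
    return rewards


if __name__ == "__main__":
    samples = ["LoRA adapters reduce memory", "hello"]
    print(keyword_reward(samples, ["lora", "memory"]))  # [1.0, 0.0]
```

A reward of this kind is cheap to compute and easy to inspect, which helps when debugging early training runs. Real reward signals for a domain expert would likely combine several such checks.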
## Documentation
- [Complete Guide](docs/guide.md) - Detailed walkthrough of GRPO concepts
- [Hanzo Integration](docs/hanzo_integration.md) - Using GRPO with Hanzo AI platform
- [API Reference](docs/api_reference.md) - Complete API documentation
- [Examples](examples/) - Sample implementations and use cases
## Requirements
### Core Libraries
```text
datasets>=2.14.0
transformers>=4.35.0
trl>=0.7.0
torch>=2.0.0
peft>=0.6.0
accelerate>=0.24.0
```
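
As a convenience, the minimum versions above can be checked against the current environment with a short script. The sketch below uses only the standard library (`importlib.metadata`). The helper names `parse_version` and `check_requirements` are made up for this example and are not part of the repository. The version comparison is deliberately rough: it compares only the leading numeric components and ignores pre-release tags.

```python
from importlib import metadata

# Minimum versions copied from the Core Libraries list above.
MINIMUMS = {
    "datasets": "2.14.0",
    "transformers": "4.35.0",
    "trl": "0.7.0",
    "torch": "2.0.0",
    "peft": "0.6.0",
    "accelerate": "0.24.0",
}


def parse_version(text):
    """Turn '2.1.0+cu118' or '4.35.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2' and '2.0.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return {package: status}; status is 'ok', 'too old (x)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```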
### Hanzo AI Platform Libraries
```text
hanzoai>=0.1.0
hanzoai-ml>=0.1.0
hanzoai-cli>=0.1.0
```
## Repository Structure
```
grpo/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── setup.py                  # Package setup
├── config/                   # Configuration files
│   ├── grpo_config.yaml      # Default GRPO configuration
│   └── kubeflow/             # Kubeflow pipeline configs
├── src/                      # Source code
│   ├── grpo/                 # Core GRPO implementation
│   ├── hanzo/                # Hanzo AI integrations
│   └── utils/                # Utility functions
├── scripts/                  # Helper scripts
│   ├── prepare_dataset.py
│   └── deploy_to_hanzo.py
├── examples/                 # Example implementations
│   ├── skippy/               # Skippy platform example
│   └── custom_rewards/       # Custom reward functions
├── docs/                     # Documentation
│   ├── guide.md              # Complete GRPO guide
│   ├── hanzo_integration.md
│   └── api_reference.md
└── tests/                    # Unit tests
```
## License
MIT License - see [LICENSE](LICENSE) file for details.
## Contributing
Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on how to contribute to this project.
## Support
- [Documentation](https://docs.hanzo.ai/grpo)
- [GitHub Issues](https://github.com/hanzoai/grpo/issues)
- [Hanzo AI Community](https://community.hanzo.ai)