This project demonstrates a minimal RLHF loop for aligning language models with human preferences. It covers supervised fine-tuning, preference data collection, reward model training, and policy optimization with PPO or DPO, and is designed for clarity, reproducibility, and scalability.
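
The final stage, policy optimization with DPO, can be summarized by a single loss: it widens the gap between the policy's log-probabilities of preferred and rejected completions, measured relative to a frozen reference model. The sketch below is a minimal illustration of that objective; the function name, argument names, and `beta` default are assumptions for illustration, not code taken from this repository.

```python
# Hypothetical sketch of the DPO objective (not this repository's implementation).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss.

    Each argument is a batch of summed log-probabilities of the chosen or
    rejected completion under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: scaled log-ratios of policy to reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss that pushes chosen completions above rejected ones.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Because the loss only needs log-probabilities of fixed preference pairs, DPO avoids the separate reward model and on-policy sampling that PPO requires, which is why it is often the simpler entry point into the loop described above.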
