
DPO – CS552 Modern NLP (Spring 2025, EPFL)

This repository contains my personal work on Direct Preference Optimization (DPO) for the final project in Modern Natural Language Processing (CS552) at EPFL, Spring 2025.

Course Deliverables

📄 FINALREPORT.pdf – Final report, revised independently to better reflect my contributions (shared here for archival purposes only).

📝 proposal.pdf – Initial project proposal outlining the goals, plan, and methodology.

📈 progress_report.pdf – Mid-project progress report describing partial results and individual contributions.


🔍 Overview

The goal of this project was to implement and evaluate Direct Preference Optimization (DPO) on top of Qwen3-0.6B-Base, using diverse pairwise preference data.
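
Concretely, the pairwise preference data consists of records that pair a prompt with a preferred and a dispreferred response. The snippet below shows the standard TRL-style layout; the field names and example text are illustrative and may differ from the exact schema of the datasets used here.

# Illustrative pairwise preference record (TRL-style DPO format; actual column names may differ).
example = {
    "prompt":   "Why is the time complexity of binary search O(log n)?",
    "chosen":   "Each comparison halves the remaining interval, so roughly log2(n) steps suffice ...",
    "rejected": "Binary search inspects every element once, so it runs in linear time ...",
}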

This repo includes:

  • DPO, pre-DPO, and SFT training scripts
  • Dataset analysis and preprocessing utilities
  • Evaluation modules (LightEval-compatible)
  • Configs and helper scripts

🤗 Hugging Face Collections


🚀 Quick Start (Training Scripts)

Install dependencies:

pip install transformers peft trl accelerate

🔧 Run SFT Training

# Example values:
#   --config_path       MNLP/config/sft_base.json
#   --sft_dataset_name  koreankiwi99/mnlp_stem_curriculum
python train_sft.py \
  --config_path <path_to_config_json> \
  --sft_dataset_name <sft_dataset_name>
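
For orientation, an SFT run built on the libraries installed above typically reduces to TRL's SFTTrainer. The sketch below is a minimal illustration, not a transcription of train_sft.py; the dataset column name and hyperparameters are assumptions.

# Minimal SFT sketch using TRL (illustrative only; not the repo's train_sft.py).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("koreankiwi99/mnlp_stem_curriculum", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B-Base",        # base model named in the Overview; loaded from the Hub
    args=SFTConfig(
        output_dir="sft_out",
        dataset_text_field="text",       # assumes the dataset exposes a plain "text" column
        per_device_train_batch_size=2,
    ),
    train_dataset=dataset,
)
trainer.train()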

🔧 Run DPO Training

# Example values:
#   --config_path      MNLP/config/lower_beta.json
#   --hf_dataset_name  koreankiwi99/mnlp_aggregate
#   --model_name       koreankiwi99/sft_model_sft_base_mnlp_stem_balanced_plus
python train_dpo.py \
  --config_path <path_to_config_json> \
  --hf_dataset_name <dataset_name> \
  --model_name <base_or_sft_model>
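
As a rough mental model, a DPO run with TRL centers on DPOTrainer, with beta as the main knob (the lower_beta.json config name suggests it is set there). The sketch below is illustrative only: the dataset schema, hyperparameters, and model handling are assumptions, not the contents of train_dpo.py.

# Minimal DPO sketch using TRL (illustrative only; not the repo's train_dpo.py).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "koreankiwi99/sft_model_sft_base_mnlp_stem_balanced_plus"   # SFT checkpoint from the previous step
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("koreankiwi99/mnlp_aggregate", split="train")     # assumed prompt/chosen/rejected schema

trainer = DPOTrainer(
    model=model,
    ref_model=None,                                   # None -> TRL freezes a copy of the policy as the reference
    args=DPOConfig(output_dir="dpo_out", beta=0.1),   # beta controls the KL-penalty strength (TRL default 0.1)
    train_dataset=dataset,
    processing_class=tokenizer,                       # called `tokenizer=` in older TRL releases
)
trainer.train()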

🔧 Run Pre-DPO Preference Model Training

# Example values:
#   --config_path      MNLP/config/predpo_lower_beta.json
#   --hf_dataset_name  koreankiwi99/mnlp_aggregate
#   --model_name       koreankiwi99/sft_model_sft_base_mnlp_stem_balanced
#   --ref_model_name   koreankiwi99/sft_model_sft_base_mnlp_stem_balanced_lower_beta_mnlp_aggregate
python train_predpo.py \
  --config_path <path_to_config_json> \
  --hf_dataset_name <dataset_name> \
  --model_name <base_or_sft_model> \
  --ref_model_name <dpo_or_simpo_model>
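
The distinguishing argument here is --ref_model_name: instead of freezing a copy of the policy as the DPO reference, an already preference-optimized checkpoint (a DPO or SimPO model, per the placeholder) serves as the reference. A hedged TRL-level sketch of that idea, reusing the example model ids above, might look like this (again, illustrative only, not the actual train_predpo.py):

# Illustrative only: pre-DPO passes an explicit, preference-optimized reference model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

policy_name = "koreankiwi99/sft_model_sft_base_mnlp_stem_balanced"
ref_name = "koreankiwi99/sft_model_sft_base_mnlp_stem_balanced_lower_beta_mnlp_aggregate"  # DPO/SimPO checkpoint

trainer = DPOTrainer(
    model=AutoModelForCausalLM.from_pretrained(policy_name),
    ref_model=AutoModelForCausalLM.from_pretrained(ref_name),     # explicit guiding reference, not a frozen copy
    args=DPOConfig(output_dir="predpo_out", beta=0.1),
    train_dataset=load_dataset("koreankiwi99/mnlp_aggregate", split="train"),
    processing_class=AutoTokenizer.from_pretrained(policy_name),  # `tokenizer=` in older TRL releases
)
trainer.train()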

📁 Repo Structure

.
├── config/              # Training configs
├── data/                # Subsets of the EPFL dataset
├── evaluation/          # Converts public benchmarks to LightEval format
├── legacy/              # Deprecated or testing code
├── preprocessing/       # Scripts for combining SFT datasets
├── check_overlap.py     # Utility for overlap analysis
├── dataset_stats.py     # Dataset statistics and diagnostics
├── train_dpo.py         # Main DPO training script
├── train_predpo.py      # Pre-DPO preference modeling
├── train_sft.py         # Supervised fine-tuning script
├── FINALREPORT.pdf
├── proposal.pdf
├── progress_report.pdf
└── README.md

👩🏻‍💻 About

Kyuhee Kim – Master's in Data Science @ EPFL
🌐 GitHub | 🤗 Hugging Face
