⚡ Group Contrastive Policy Optimization (GCPO)
Official repository of the paper: GCPO: When Contrast Fails, Go Gold
GCPO (Group Contrastive Policy Optimization) is a novel reinforcement learning algorithm designed to enhance the reasoning capabilities of language models, especially in scenarios where the model fails to generate correct responses. Unlike previous methods such as GRPO, which rely solely on the model's own rollouts, GCPO introduces Golden Answers (GAs), external reference answers that guide the model's updates when all sampled responses are incorrect (a minimal sketch follows the list below).
This approach ensures:
- ✅ Full sample utilization: no training data is wasted
- 🧠 Knowledge transfer: small models learn reasoning strategies from larger models
- 🚀 Faster convergence and better generalization
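The golden-answer mechanism described above can be summarized in a few lines. Below is a minimal sketch, assuming binary correctness rewards and a GRPO-style group-normalized advantage; the function and variable names are illustrative and are not the released training code.

```python
# Minimal sketch of Golden Answer (GA) injection, assuming binary correctness
# rewards and a GRPO-style group-normalized advantage. Names are illustrative,
# not the official GCPO training code.
from statistics import mean, pstdev

def build_group(rollouts, rewards, golden_answer, eps=1e-6):
    """rollouts: list of sampled responses; rewards: 1.0 if correct else 0.0."""
    # When every rollout in the group is wrong, contrast carries no signal:
    # inject the external golden answer as a positive sample ("go gold").
    if all(r == 0.0 for r in rewards):
        rollouts = rollouts[:-1] + [golden_answer]
        rewards = rewards[:-1] + [1.0]

    # Group-relative advantage: standardize rewards within the group so the
    # golden answer is pushed up and incorrect rollouts are pushed down.
    mu, sigma = mean(rewards), pstdev(rewards)
    advantages = [(r - mu) / (sigma + eps) for r in rewards]
    return rollouts, advantages
```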
- ✅ Golden Answer Injection: handles failure rollouts by injecting correct reference solutions
- ⚖️ Sequence-Level Importance Sampling: stabilizes training under sparse reward settings (see the sketch after this list)
- 🔥 Contrastive Optimization: enhances separation between good and bad reasoning traces
- ✨ No KL Penalty Needed: encourages diverse yet effective reasoning behaviors
- 🌍 Generalizable: works on math, code, and logical QA tasks
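For the sequence-level importance sampling and the KL-free clipped objective listed above, here is a minimal PyTorch sketch. The length-normalized sequence ratio, the clip range, and the tensor layout are assumptions made for illustration rather than the exact formulation in the paper.

```python
# Sketch of a sequence-level, clipped policy-gradient loss without a KL term.
# Per-token log-prob differences are aggregated into one importance ratio per
# sequence before clipping; shapes and the clip range are assumptions.
import torch

def gcpo_style_loss(logp_new, logp_old, mask, advantages, clip_eps=0.2):
    """
    logp_new, logp_old: [batch, seq_len] token log-probs under the current
                        and behavior policies; mask marks response tokens.
    advantages: [batch] group-normalized sequence-level advantages.
    """
    # Sequence-level importance ratio: exponentiate the mean (masked) token
    # log-ratio so one ratio is assigned per sequence, not per token.
    log_ratio = ((logp_new - logp_old) * mask).sum(-1) / mask.sum(-1).clamp(min=1)
    ratio = log_ratio.exp()

    # PPO-style clipping applied at the sequence level; no KL penalty term.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```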
| Item | Status |
|---|---|
| Paper | ✅ Released |
| Model Checkpoints | ✅ Released |
| GCPO Dataset | ⏳ Coming soon |
| Code (Training + Evaluation) | ⏳ Coming soon |
We provide the model weights of GCPO-R1-1.5B, trained from DeepSeek-R1-Distill-Qwen-1.5B with the GCPO algorithm. The model is available at https://huggingface.co/Ach0/GCPO-R1-1.5B.
To evaluate the model on AIME 2024, run:

```bash
python3 vllm_eval.py \
  --model_path Ach0/GCPO-R1-1.5B \
  --test_file dataset/AIME24/aime_2024.jsonl \
  --output_path aime2024_result.jsonl \
  --tensor_parallel_size 4 \
  --mode all
```
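As a quick sanity check before running the full evaluation, you can also load the checkpoint directly with vLLM and generate a single completion. The prompt and sampling settings below are placeholders, not the configuration used in the paper.

```python
# Optional sanity check with vLLM (not the official eval script): load the
# released checkpoint and generate one completion. Sampling settings here
# are placeholders, not the values used in the paper.
from vllm import LLM, SamplingParams

llm = LLM(model="Ach0/GCPO-R1-1.5B", tensor_parallel_size=1)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

prompt = "Solve: If 3x + 5 = 20, what is x? Think step by step."
output = llm.generate([prompt], params)[0]
print(output.outputs[0].text)
```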
In our experiments, GCPO consistently outperforms DAPO.
If you find this work useful, please cite:
@article{wu2025gcpo,
  title={GCPO: When Contrast Fails, Go Gold},
  author={Hao Wu and Wei Liu},
  journal={arXiv preprint arXiv:2510.07790},
  year={2025}
}
