kxfan2002/R1-Collection

A collection of R1-like reproductions

Official repo by DeepSeek-AI

Reproduction of R1 by Hugging Face

Reproduction of R1-Zero by Jiayi-Pan

R1 in VLMs.

  • Item counting & GeoQA.

Reproduction of R1-Zero and R1 with small models and limited data.

Reproduction of R1-Zero.

  • Logic puzzle

R1 paradigm in a multimodal model.

  • Math reasoning (data created by GPT-4o, based on Math360k & Geo170k)

R1-style LVLM. Referring Expression Comprehension (REC).

  • RefCOCO for in-domain and RefGTA for out-of-distribution (OOD).

R1(-Zero) methods for training agentic models.

Reproduction of R1 in a multimodal setting. Based on OpenRLHF. Currently supports PPO/REINFORCE++/RLOO training for LMMs.

  • MATH dataset.

Reproduction of DeepSeek R1-Zero

  • 57k curated training samples

Based on veRL; supports RL for VLMs.
