Official repo by Deepseek-ai
Reproduction of R1 by Hugging Face
Reproduction of R1-Zero by Jiayi-Pan
R1 in VLMs.
- Item counting & GeoQA.
Reproduction of R1-Zero and R1 with small models and limited data.
Reproduction of R1-Zero.
- Logic puzzle
R1 paradigm in multimodal models.
- Math reasoning (data created by GPT-4o, based on Math360k & Geo170k)
R1-style LVLM for Referring Expression Comprehension (REC).
- RefCOCO for in-domain and RefGTA for OOD.
R1(-Zero) methods for training agentic models.
Reproduction of R1 in the multimodal setting. Based on OpenRLHF. Currently supports PPO/REINFORCE++/RLOO training for LMMs.
- MATH dataset.
Reproduction of DeepSeek R1-Zero
- 57k curated training examples
Based on veRL, supporting VLM RL