Commits
934 commits
b1cdef8
[recipe] feat: Move entropy reward to the entropy recipe (#2118)
Raf-Chen Jun 20, 2025
92f9381
[ci] test: enforce API docstring checks (#2114)
eric-haibin-lin Jun 20, 2025
c87e91b
[ci] test: inspect the type annotation of newly added code, focusing …
eric-haibin-lin Jun 20, 2025
0fd4d0f
[cfg, perf] refactor: add omega_conf_to_dataclass API, rename WorkerP…
eric-haibin-lin Jun 20, 2025
9bc360a
[worker] feat: add support for dynamic batch size of multimodal data …
wang-zerui Jun 21, 2025
76f63cf
[fsdp] refactor: set actor's strategy as default for critic and ref (…
0x404 Jun 21, 2025
9b7bb69
[BREAKING][ci] feat: add CI request channel & improve PR template (#2…
tongyx361 Jun 22, 2025
ade658f
[doc] fix: fix index rendering (#2127)
eric-haibin-lin Jun 22, 2025
dff6b96
[ray] feat: add a test to demonstrate how to perform p2p communicatio…
vermouth1992 Jun 22, 2025
c7aa5e8
[sglang] feat: Support async multi-turn rollout with simulation feedb…
kinza99 Jun 22, 2025
e67ee86
[tool] feat: Add memory limit configuration for sandbox fusion (#2105)
plutoZZZZ Jun 22, 2025
644aaa7
[sglang] feat: add multimodal input to multiturn async rollout (#2014)
nanjiangwill Jun 22, 2025
2ac410f
[fsdp] feat: support fsdp2 save hugging face model (#2138)
0x404 Jun 23, 2025
d69528f
[rollout]fix: vllm_rollout_spmd.py when return_raw_chat=True (#2156)
zyfzjsc988 Jun 23, 2025
bcda81d
debug script
haonan-li Jun 23, 2025
2a62123
[rollout] feat: Support Multi-stage Awake for SGLang (#1911)
hebiao064 Jun 23, 2025
08be380
[worker] feat: allow dist shared file-system initialization (#2154)
Cccei000 Jun 24, 2025
e1039ae
[model] feat: Add MiniCPM-o 2.6 support (#1833)
RanchiZhao Jun 24, 2025
24707f6
[model] fix: Revert "[model] feat: Add MiniCPM-o 2.6 support" (#2176)
hiyouga Jun 24, 2025
68d6251
[misc] fix: fix timer importance error in split_placement (#2169)
FightingZhen Jun 24, 2025
29857aa
feature:appending pass rate after each gradient updates
glorgao Jun 24, 2025
0b11244
7b-inst 4node script
haonan-li Jun 24, 2025
1c16fec
Merge branch 'diff_aware' of github.com:LLM360/Reasoning360 into diff…
haonan-li Jun 24, 2025
d707c21
bugfix: add modifications for the pass rate appending code
glorgao Jun 24, 2025
d406a13
Merge branch 'diff_aware' of https://github.com/LLM360/Reasoning360 i…
glorgao Jun 24, 2025
dc805c7
[ci] fix: enable e2e ppo trainer test (#2174)
hiyouga Jun 24, 2025
556a7c5
bug fix and launch script update: move the train_dataset.dataframe ch…
glorgao Jun 24, 2025
fc6ebc9
[megatron,vllm] fix: megatron vllm async rollout server (#2122)
Yangruipis Jun 25, 2025
c5d4d90
[doc] fix: Fix a typo in the profiler's document (#2141)
YangWang92 Jun 25, 2025
a922689
[Feature] Finalize the code and scripts forpriority sampling
glorgao Jun 25, 2025
411751e
[model] feat: Add MiniCPM-o 2.6 support (#2178)
hiyouga Jun 25, 2025
3b3e597
[megatron] feat: Support of dist checkpoint (#2125)
ETOgaosion Jun 25, 2025
9a4c15d
Add Logs and Evaluation: Add logs for monitoring the training such as…
glorgao Jun 25, 2025
c14fdf8
[Fix] switch the training to 7b-dist model, which is stronger than th…
glorgao Jun 25, 2025
02549d9
[data] fix: fix the type of parquet_files in SFTDataset (#2203)
xuuHuang Jun 26, 2025
43a5ab3
[trainer] fix: add missing qwen2_moe flops counter (#2190)
ETOgaosion Jun 26, 2025
85bacb1
[trainer] fix: Add __init__.py to verl.trainer.config (#2214)
ultmaster Jun 26, 2025
a9e3a8f
[model] fix: make vlm patch forward compatible (#2215)
hiyouga Jun 26, 2025
4f1ece8
[recipe] fix: parameter order in RayPRIMETrainer super().__init__() c…
xxnpark Jun 26, 2025
7b82407
[misc] feat: support ValidationGenerationsLogger in vemlp_wandb (#2191)
chenhaiq Jun 26, 2025
ff750e2
[trainer] fix: indentation error leading to critic_output.get() failu…
xingyunjohn1 Jun 26, 2025
ed0f308
[ckpt] fix: conditionally import fsdp/megatron backend (#2224)
jvmncs Jun 26, 2025
b2235f0
[recipe] fix: unsupported operand type(s) for |: 'dict' and 'DictConf…
xichengpro Jun 27, 2025
790a8a2
[rollout] feat: add agent loop (#2124)
wuxibin89 Jun 27, 2025
466ef1a
[misc] fix: add license (#2230)
vermouth1992 Jun 27, 2025
d8ecba3
[ckpt] feat: support esi execution environment (#2192)
plutoZZZZ Jun 27, 2025
b816d17
[sglang] feat: Add multi-interaction registry support and testing (#2…
SwordFaith Jun 27, 2025
e96f0fb
[model] fix: separate minicpmo data (#2212)
hiyouga Jun 27, 2025
8ba2f27
[misc] chore: pin transformers under 4.53 (#2241)
hiyouga Jun 27, 2025
ce6a7b8
[rollout] fix: use flashattn3 backend in sglang to avoid error in too…
chenhaiq Jun 28, 2025
a306434
[doc] chore: version bumped to v0.4.1.dev and doc fixes (#2226)
eric-haibin-lin Jun 28, 2025
fda87b8
[worker] fix: OOM on first iteration in multi-turn RL (#2253)
zTonyZhao Jun 28, 2025
bd1be62
[ci] fix: fix cpu dataset git download error (#2256)
eric-haibin-lin Jun 28, 2025
7559a6a
[doc] fix: add time info for each doc, assert sphinx warning in CI (#…
eric-haibin-lin Jun 29, 2025
072725c
[trainer, recipe] feat: add support for external generative reward mo…
yyDing1 Jun 29, 2025
afee3ac
[rollout] fix: Make `free_cache_engine` option workable in latest vLL…
HollowMan6 Jun 29, 2025
86ef66e
[trainer] fix: fix split placement (#2227)
vermouth1992 Jun 29, 2025
2805ce9
[doc, ci] fix: fix sandbox doc and enhance CI trigger filter and doc …
eric-haibin-lin Jun 30, 2025
72429f2
[rollout] feat: add zeromq vllm distributed executor (#2246)
wuxibin89 Jun 30, 2025
52065c6
[BREAKING][rollout] refactor: drop vllm v0.5.4 and v0.6.3 support (#2…
eric-haibin-lin Jun 30, 2025
7ac0d98
[trainer, vllm] feat: add lora exclude_modules to support VL model lo…
Cccei000 Jun 30, 2025
6d9ac2f
[algo] fix: correctly aggregate kl metrics in PPO actor (#2259)
0x404 Jun 30, 2025
8b33abd
[megatron] feat: add megatron memory log (#2272)
ETOgaosion Jun 30, 2025
024a8b8
[ckpt, doc] chore: add backward compatibility for model merger and sy…
0x404 Jun 30, 2025
2136049
diff aware length control
haonan-li Jun 30, 2025
0508af2
[doc] feat: more resources (#2284)
tongyx361 Jun 30, 2025
00a10a8
[ci] refactor: reduce ruff line-length from 300 to 120 (#2287)
eric-haibin-lin Jul 1, 2025
b669015
[doc] chore: add contribution guide (#2290)
eric-haibin-lin Jul 1, 2025
ba026ef
[ci, doc] fix: fix transformers version dependency on Ascend NPU (#2291)
FightingZhen Jul 1, 2025
211984b
[doc] fix: Update ascend_quick_start.rst (#2293)
vermouth1992 Jul 1, 2025
becdb56
[CI] fix: replace private model in CI test (#2295)
yyDing1 Jul 1, 2025
be87b76
add easy&hard drop ratio and val length control
haonan-li Jul 1, 2025
82d1ef5
[sglang] feat: Repeat sampling parameter n into requests of GRPO in S…
zhaochenyang20 Jul 2, 2025
1bdf4d2
[hardware, recipe, ci] feat: Support fsdp peft sft on npu (#2240)
zheliuyu Jul 2, 2025
2a25e31
[doc] feat: FSDP forward prefetch and entropy memory optimizations (#…
CurryRice233 Jul 2, 2025
29f50e7
[recipe] feat: add retool recipe (#2233)
wuxibin89 Jul 2, 2025
1a4b977
[cfg] fix: Security Enhancement Block Dangerous Modules in Sandbox En…
none0663 Jul 2, 2025
4a846aa
[hardward] chore: Enable Generation of Wheel File During Docker Build…
rhiremat Jul 2, 2025
0ea96a2
[cfg] chore: add non-negative expected_len assertion (#2330)
LeavesLei Jul 3, 2025
433544f
[megatron] feat: use mbridge as megatron adaptor (#2064)
ISEEKYAN Jul 3, 2025
bc2cc6b
[rollout] feat: Allow customization of async server class (#2326)
ultmaster Jul 3, 2025
4546155
[Fix] clipping ratio bug fix
glorgao Jul 3, 2025
e3174b8
[Fix] clipping ratio bug fix
glorgao Jul 3, 2025
7db7f32
[megatron, fsdp, doc] feat: implement GPG loss. Add GPG advantage est…
diqiuzhuanzhuan Jul 3, 2025
11ee512
[ci] chore: add gemini code assistant config (#2349)
eric-haibin-lin Jul 3, 2025
0332866
[algo] feat: mask out observation token in GAE (#2337)
wuxibin89 Jul 4, 2025
aba2684
[tool] fix: avoid exception when sandbox return None (#2346)
chenhaiq Jul 4, 2025
212d814
[perf] feat: support entropy checkpointing without rmpad or sp (#2342)
FightingZhen Jul 4, 2025
18c6ffc
[megatron] fix: optimizer scheduler misalignment with FSDP (#2303)
ETOgaosion Jul 4, 2025
a53fb30
[ckpt] fix: edit esi doc (#2354)
plutoZZZZ Jul 4, 2025
ebb21b7
[docker] refactor: Migrate images to verlai, support latest flash att…
ETOgaosion Jul 4, 2025
0d2af47
[rollout] fix: #1646 stop words for sglang rollout (#1991)
linxxx3 Jul 4, 2025
8883b29
[trainer] fix: pre-commit broken by #2354 (#2358)
ETOgaosion Jul 4, 2025
5c39b51
[hardware] feat: support ray actor sharing situation on ASCEND NPU (#…
FightingZhen Jul 4, 2025
c936ec7
[trainer, cfg] feat: add BaseConfig for all dataclass configs. Introd…
eric-haibin-lin Jul 4, 2025
dbd4ff1
[data] feat: add interface for user-defined curriculum sampler (#2314)
frrad Jul 4, 2025
0c8fbf2
instance level length control
haonan-li Jul 4, 2025
1b891dc
[cfg] fix: pickleing error in multiprocessing in the reward_fn (#2239)
none0663 Jul 4, 2025
715724c
[tool] feat: Add support for tools that generate multimodal data (#2146)
nanjiangwill Jul 4, 2025
9b0e327
[doc] fix: add show source option (#2370)
eric-haibin-lin Jul 5, 2025
9cc3077
[ray] refactor: Seperate the constants into different file (#2025)
YeonwooSung Jul 5, 2025
50ba712
[misc] fix: invalid escape sequence '\*' (#2375)
HollowMan6 Jul 5, 2025
cbeb3f4
[rollout] fix: fix hf rollout and add single gpu test (#2371)
eric-haibin-lin Jul 5, 2025
e9b38dc
Revert "[misc] fix: invalid escape sequence '\*'" (#2376)
vermouth1992 Jul 5, 2025
281ecd4
[doc] fix: Fix document config.rst (#2369)
shuyhere Jul 5, 2025
c71fa39
[doc] feat: add July events (#2382)
eric-haibin-lin Jul 6, 2025
2a01b21
[ci] fix: PR title check supports module names with underscore (`trai…
HollowMan6 Jul 6, 2025
1f9e475
per_instance pass_rate, max_length, etc
haonan-li Jul 6, 2025
891c873
[sglang, rollout] refactor: use torch.Tensor in async rollout schemas…
nanjiangwill Jul 6, 2025
5cbad83
[trainer] fix: Use safe masked mean/sum to handle NaN values outside …
Yangruipis Jul 6, 2025
4c37c97
[rollout] fix: sglang async fail with Multi-stage Awake feature (#2365)
chenhaiq Jul 7, 2025
fc35956
[BREAKING][rollout] feat: repeat DataProto when n>1 in driver instead…
wuxibin89 Jul 7, 2025
26d3a03
[misc] refactor: replace pkg_resources with importlib.metadata (#2392)
askender Jul 7, 2025
26e26d1
[sglang, rollout, doc] fix: update sglang rollout generate doc (#2385)
Tavish9 Jul 7, 2025
cb3dcc6
[ci] feat: use action (#2393)
plutoZZZZ Jul 7, 2025
926ecb7
remove priority_sampling scripts
haonan-li Jul 7, 2025
3ad12f9
add resume function
haonan-li Jul 7, 2025
1e7c545
[tool] fix: Add MCP usage documentation (#2261)
AlecHenx Jul 7, 2025
3f929af
[cfg] refactor: make actor config more modular (#2379)
eric-haibin-lin Jul 7, 2025
ee65422
[sglang] fix: only wake up weights on infer_tp 0 (#2403)
zhaochenyang20 Jul 7, 2025
578501e
[sglang] fix: Import Error in the latest sglang (#2275)
yyDing1 Jul 8, 2025
ec4433c
[misc] feat: trace rollout generation and tool calls using weave (#2345)
chenhaiq Jul 8, 2025
588f972
[ci] fix: forbid ci on forks (#2412)
plutoZZZZ Jul 8, 2025
a4033af
[ci] feat: add docstring checker script and comprehensive docstrings …
eric-haibin-lin Jul 8, 2025
004da73
[rollout] fix: huggingface model config max_position_embeddings asser…
Wangmerlyn Jul 9, 2025
4def91d
[data] refactor: move sampler api to experimental (#2381)
eric-haibin-lin Jul 9, 2025
ad33564
[sglang] fix: Bug in megatron+sglang TP16 update_weights. (#2336)
SuperCB Jul 9, 2025
cccc2ef
[cfg] refactor: make the rollout & ref configs more modular (#2410)
eric-haibin-lin Jul 9, 2025
a5004c2
debug vary_length
haonan-li Jul 9, 2025
b5e711e
[perf] feat: add npu profiler for FSDP backend (#2194)
tongtong0613 Jul 9, 2025
526098d
[Hardware] feat: Support AMD (ROCMm Kernel) - Update Dockerfile/Docke…
yushengsu-thu Jul 9, 2025
b3aed0d
[sglang] fix: Fix qwen2vl weight keys issue (#2434)
hebiao064 Jul 9, 2025
ab11fff
[trainer, data] feat: Dynamic Data Generation (#2312)
jwong8314 Jul 9, 2025
fc8acdc
[cfg] refactor: split fsdp/megatron specific configs, consolidate sha…
eric-haibin-lin Jul 10, 2025
ce4273c
debug dataloader
haonan-li Jul 10, 2025
c26b0f2
[misc] refactor: Replace deepcopy with tensor.clone (#2442)
ji-huazhong Jul 10, 2025
7b52366
[hardware] fix: enable sleep mode on ASCEND NPU (#2459)
sunyi0505 Jul 10, 2025
269bb4a
[doc] chore: add ICML meetup and upcoming feat (#2431)
eric-haibin-lin Jul 10, 2025
1f3f0a5
[misc] fix: add *.yaml to pyproject due to modular config (#2468)
nanjiangwill Jul 10, 2025
a8d9d25
[misc] feat: add py.typed file to `verl/` (#2467)
frrad Jul 10, 2025
de38ed4
[env] feat: upgrade tensordict version (#2460)
vermouth1992 Jul 10, 2025
01624c6
[doc] fix: colocation documentation updates (#2465)
eric-haibin-lin Jul 11, 2025
49fe461
[doc] chore: add documentation for truncation: middle option (#2462)
Wangmerlyn Jul 11, 2025
1dfc135
[perf] feat: add range tag to start/stop profile; clean actor_rollout…
davidmlw Jul 11, 2025
ada82bb
[doc] feat: update documentation of nsight profiling (#2470)
davidmlw Jul 11, 2025
d9a6a31
[megatron] feat: fused kernel lightweight (#2210)
ISEEKYAN Jul 11, 2025
c3e953c
[docker] feat: provide images with deepep (#2480)
ETOgaosion Jul 11, 2025
590a62a
[training_utils] feat: log_generations_to_swanlab use table (#2489)
Zeyi-Lin Jul 12, 2025
f0b4aba
[fsdp] fix: Change the data in the update_actor function from to.('cp…
Keilo001 Jul 12, 2025
75f2abf
[sglang] fix: Only flush cache on TP rank=0. (#2455)
SuperCB Jul 12, 2025
6519220
[trainer] fix: use .keys() to check 'response_mask' in TensorDict (#2…
askender Jul 12, 2025
1aa3bcd
ifbench
Jul 12, 2025
f180dfc
ifbench training shell
Jul 12, 2025
eac4863
[env] feat: safely bump py version to 3.10 (#2421)
Tavish9 Jul 12, 2025
4aa02fe
[trainer] fix: Allow FSDP2 when doing strategy check (#2497)
HollowMan6 Jul 12, 2025
8e0b9bd
[recipe] chore: Remove the duplicate definition of `class Role` (#2503)
kevin85421 Jul 13, 2025
11e0cf7
[misc] refactor: remove deprecated codes (#2494)
ji-huazhong Jul 13, 2025
92758d6
[env] fix: Change the permissions of `install_vllm_sglang_mcore.sh` f…
kevin85421 Jul 13, 2025
4d0f4d0
[doc] feat: update npu profiler doc and script (#2514)
tongtong0613 Jul 14, 2025
a31a8f2
[doc] fix: quickstart example can't work on zsh (#2509)
kevin85421 Jul 14, 2025
fbec86d
[BUG] fix bug for #2506, when passing as response_mask to policy_loss…
none0663 Jul 14, 2025
0b508ab
[single_controller] fix: replace unittest.mock.patch with context man…
PeterSH6 Jul 14, 2025
141b1d3
[recipe] fix: DAPO rewards using sandbox fusion (#2496)
HollowMan6 Jul 14, 2025
def5b28
[rollout] feat: support mlflow in rollout trace (#2440)
chenhaiq Jul 14, 2025
d0c7bbb
[cfg] refactor: support +extra.any_key usage for the base dataclass c…
eric-haibin-lin Jul 15, 2025
53ec813
[ray] refactor: Use public method to get node IP (#2521)
kevin85421 Jul 15, 2025
517cc23
[megatron] feat: allow override DistributedDataParallelConfig (#2523)
ETOgaosion Jul 15, 2025
2c407f2
[cfg] fix: fix _generated_ppo_trainer.yaml pre-commit error on main (…
eric-haibin-lin Jul 15, 2025
83d6a80
[fsdp] fix: vlm dynamic batch & unify dynamic batch api (#2524)
hiyouga Jul 15, 2025
bbd1288
[data, megatron] feat: add dynamic batching computational workload ba…
conver334 Jul 15, 2025
473d8ff
[env] fix: bump tensordict to 0.9.1 (#2541)
ultmaster Jul 15, 2025
10f4eb8
[misc] chore: fix typo in function name (#2525)
ShareLer Jul 15, 2025
2dea259
[data] fix: Add missing init files in verl experimental data folders …
JoostvDoorn Jul 15, 2025
2c0ae78
[ray] fix: strip [] for ipv6 address (#2545)
wuxibin89 Jul 15, 2025
ab18551
update effrl
haonan-li Jul 15, 2025
166d91a
[trainer] refactor: minor code cleanup (#2537)
eric-haibin-lin Jul 15, 2025
a63243b
[fsdp] fix: change geo3k model name from non-vl to vl (#2555)
nanjiangwill Jul 15, 2025
1fe5daf
[sglang, megatron, perf] feat: speed up megatron sglang weight update…
Yangruipis Jul 15, 2025
f0d4c76
[sglang] feat: update weights in batch with FSDP (#2559)
zhaochenyang20 Jul 15, 2025
2182987
[ci] chore: add single-controller reviewer (#2554)
eric-haibin-lin Jul 16, 2025
5f687b2
[sglang] fix: adding missing param for sgl async unit test (#2561)
zhaochenyang20 Jul 16, 2025
3f07732
[tool] fix: correctly convert 'None' to null in sandbox fusion _proce…
mathewjhan Jul 16, 2025
e300d0f
[doc] feat: add document for agentic RL related features (#2563)
chenhaiq Jul 16, 2025
1a89141
[training_utils] fix: uneven support in split (#2560)
ultmaster Jul 16, 2025
6e21c0a
[megatron] feat: support distributed megatron model converter and mer…
Yangruipis Jul 16, 2025
7aabfc4
[rollout] feat: add ReactAgentLoop based on LangGraph (#2463)
wuxibin89 Jul 16, 2025
152c599
[perf] feat: Clip gsm8k solution string to optimize reward calculatio…
PopSoda2002 Jul 16, 2025
da2ab08
[doc] fix: correct link in agentic RL doc (#2567)
chenhaiq Jul 16, 2025
96b730b
[megatron] fix: wrong response_mask for megatron + sglang mutli-turn …
Yangruipis Jul 16, 2025
3f63715
[doc] fix: fix non-existing tag of base image in docs (#2569)
rudeigerc Jul 16, 2025
8a787b0
debug
haonan-li Jul 16, 2025
f0964b6
[rollout] fix: fix bug for remax when the rollout mode is async (#2574)
none0663 Jul 16, 2025
9cc3199
Merge remote-tracking branch 'official/main' into pr-upstream-verl
ZYHowell Jul 17, 2025
c61e228
remove redundant workflow files
ZYHowell Jul 17, 2025
17cf8be
align with workflow format
ZYHowell Jul 17, 2025
a82b207
remove redundant files and add missing files
ZYHowell Jul 17, 2025
740faec
remove redundant code and fix formatting nuances
ZYHowell Jul 17, 2025
3325749
final script
haonan-li Jul 17, 2025
a3527eb
Add missing []
BlankCheng Jul 17, 2025
b37290c
Remove arg for vllm generation to be compatible
BlankCheng Jul 21, 2025
d6cd345
Revery to dataframe to support multiple domains
BlankCheng Jul 22, 2025
3051df1
Rename dapo reward manager to async_mp
BlankCheng Jul 22, 2025
6d85b21
Change deprecatedfunctions
BlankCheng Jul 25, 2025
810e04d
Remove deprecated dapo recipe files. Now please switch to ./recipe/dapo
BlankCheng Jul 25, 2025
fb98f28
add reasoning_gym feature
Jul 28, 2025
cbb760d
delete reasoning_gym/test
Jul 28, 2025
6e4c71e
merge init.py
Jul 28, 2025
38f1600
Merge branch 'main' into reasoning_gym
Jianshu-She Jul 28, 2025
cedad60
update scripts
haonan-li Jul 31, 2025
fdf78b2
merge
haonan-li Jul 31, 2025
cfc321d
Merge remote-tracking branch 'origin/reasoning_gym' into diff_aware
haonan-li Jul 31, 2025
96a5131
update script
haonan-li Aug 5, 2025
0095c1d
Update example traiin scripts for latest verl (fsdp and megatron).
BlankCheng Aug 6, 2025
d0e2c1b
Update scripts
BlankCheng Aug 7, 2025
9b96bc2
update pass_rate and stats script
haonan-li Aug 8, 2025
2be6cdc
local_mkdir missing from previous version of Reasoning360
nightlessbaron Aug 12, 2025
e9975f4
Upload performance tuning guide; Upload fsdp and megatron training e…
BlankCheng Aug 12, 2025
0e9d7f0
Upload performance tuning guide; Upload fsdp and megatron training e…
BlankCheng Aug 12, 2025
01c4492
[data] Add IFBench dataset (#113)
Jianshu-She Jul 25, 2025
0fe16df
Reasoning gym (#119)
Jianshu-She Aug 6, 2025
3a830ea
add synlogic
LiqunMa Aug 12, 2025
fe2e413
del
LiqunMa Aug 12, 2025
9bcf1d4
add all synlogic verifier
LiqunMa Aug 12, 2025
de34d41
add train scripts
LiqunMa Aug 12, 2025
c85d51e
Upload 70b long-cot fsdp example [tested]
BlankCheng Aug 14, 2025
b0b6347
del the code for debug
LiqunMa Aug 14, 2025
0450b70
nemotron stem
Aug 15, 2025
c9e7b3e
init dapo
haonan-li Aug 21, 2025
1e5e5cd
add quantile statistics and address the data type warning on logger
glorgao Aug 21, 2025
0494be5
[Bug fix and improve] Avoid the seoncd time repeating for vllm rollou…
glorgao Aug 21, 2025
ef4a8ce
correct dataset name
LiqunMa Aug 27, 2025
d0cfea9
fix names of test data
LiqunMa Aug 28, 2025
924d94f
merge syn
LiqunMa Aug 28, 2025
7f998e0
merge nemotron_stem
LiqunMa Aug 28, 2025
ab00656
fix the data dir
LiqunMa Aug 28, 2025
e13c7ab
fix bugs
LiqunMa Sep 1, 2025
52f9c09
small
haonan-li Sep 1, 2025
4ee85d4
Merge branch 'pr-upstream-verl-merge-diffaware' of github.com:LLM360/…
haonan-li Sep 1, 2025
ae4a18a
dalu recipe
haonan-li Sep 26, 2025
adf641b
update data_process
haonan-li Sep 28, 2025
249d242
revise testset
haonan-li Sep 28, 2025
b5daf34
[1]Fix generation length to 32k and [2]logging for responses length
glorgao Sep 29, 2025
c84c707
m1 script & better dalu data shuffle
Sep 29, 2025
a4ceab1
update ifbench test
Sep 29, 2025
7c02165
debug length
Sep 30, 2025
863cb25
Merge branch 'pr-upstream-verl-merge-diffaware' of github.com:LLM360/…
haonan-li Sep 30, 2025
1f98532
support wandb resume by adding +trainer.run_id={RUNIT} in recipe
haonan-li Sep 30, 2025
a630dab
update 32b training script
haonan-li Oct 1, 2025
c25c64d
update data_process script
haonan-li Oct 6, 2025
1afe759
update scripts
haonan-li Oct 7, 2025
10 changes: 10 additions & 0 deletions .gemini/config.yaml
@@ -0,0 +1,10 @@
have_fun: false
code_review:
  disable: false
  comment_severity_threshold: HIGH
  max_review_comments: -1
  pull_request_opened:
    help: false
    summary: false
    code_review: true
ignore_patterns: []
18 changes: 18 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,18 @@
/docs @eric-haibin-lin @zhaochenyang20 @hongpeng-guo
/docs/amd_tutorial @yushengsu-thu
/docs/sglang_multiturn @zhaochenyang20 @SwordFaith

/recipe/dapo @tongyx361 @PeterSH6
/recipe/spin @zhaochenyang20
/recipe/sppo @zhaochenyang20

/third_party/sglang @zhaochenyang20 @SwordFaith
/third_party/vllm @PeterSH6 @wuxibin89
/verl/single_controller @zw0610 @wuxibin89 @hongpeng-guo
/verl/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
/verl/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq
/verl/workers/rollout/sglang_rollout @zhaochenyang20 @SwordFaith @chenhaiq

/tests/single_controller @zw0610 @wuxibin89
/tests/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
/tests/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq
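
To illustrate how entries like the ones above resolve to reviewers, here is a minimal Python sketch. It rests on simplifying assumptions: only leading-slash directory patterns of the kind used in this file are handled (no wildcards), and the last matching rule wins, which is the precedence GitHub documents for CODEOWNERS. The names `parse_codeowners`, `owners_for`, and `SAMPLE` are hypothetical and are not part of the PR.

```python
def parse_codeowners(text):
    """Parse CODEOWNERS text into an ordered list of (pattern, owners) rules."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def owners_for(path, rules):
    """Return the owners of `path`; later rules override earlier ones."""
    path = "/" + path.lstrip("/")
    matched = []
    for pattern, owners in rules:
        prefix = pattern.rstrip("/")
        # Simplification: treat each pattern as a directory (or exact file) prefix.
        if path == prefix or path.startswith(prefix + "/"):
            matched = owners
    return matched


# Hypothetical sample: a subset of the rules listed above.
SAMPLE = """
/docs @eric-haibin-lin @zhaochenyang20 @hongpeng-guo
/docs/amd_tutorial @yushengsu-thu
/recipe/dapo @tongyx361 @PeterSH6
"""
RULES = parse_codeowners(SAMPLE)

if __name__ == "__main__":
    print(owners_for("docs/amd_tutorial/intro.rst", RULES))
    print(owners_for("docs/index.rst", RULES))
```

Because the more specific `/docs/amd_tutorial` rule appears after `/docs`, it takes over ownership of that subtree in this sketch; a path that matches no rule gets an empty owner list.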
54 changes: 24 additions & 30 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,46 +1,40 @@
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.
> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes
### Checklist Before Starting

> List the specific changes.
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API
### Test

> Demonstrate how the API changes if any.
> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### Usage Example
### API and Usage Example

> Provide usage example(s) for easier usage.
> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
# Add code snippet or script demonstrating how to use this
```

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc.
### Design & Code Changes

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]
> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
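
The title convention described in the template, `[{modules}] {type}: {description}` with an optional `[BREAKING]` prefix, can be checked mechanically. The following Python sketch shows one way to do it. It is not the repository's `tests/special_sanity/check_pr_title.py`, whose implementation is not shown in this diff; the regular expression, the function name `check_pr_title`, and the error messages are assumptions made for illustration. The module and type lists are copied from the template text.

```python
import re

# Module and type names copied from the PR template.
ALLOWED_MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Assumed shape: optional [BREAKING], then [modules], a space, "type: description".
TITLE_RE = re.compile(r"^(\[BREAKING\])?\[([a-z_, ]+)\] (\w+): (.+)$")


def check_pr_title(title):
    """Return (ok, message) for a PR title (hypothetical helper)."""
    match = TITLE_RE.match(title)
    if not match:
        return False, "title does not match `[{modules}] {type}: {description}`"
    modules = [part.strip() for part in match.group(2).split(",")]
    unknown = [name for name in modules if name not in ALLOWED_MODULES]
    if unknown:
        return False, f"unknown module(s): {', '.join(unknown)}"
    if match.group(3) not in ALLOWED_TYPES:
        return False, f"unknown type: {match.group(3)}"
    return True, "ok"


if __name__ == "__main__":
    for title in (
        "[BREAKING][fsdp, megatron] feat: dynamic batching",
        "debug script",
    ):
        print(title, "->", check_pr_title(title))
```

In a CI job the title would typically arrive through an environment variable such as `PR_TITLE`, so a wrapper script could read `os.environ["PR_TITLE"]` and exit non-zero when the check fails.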
69 changes: 69 additions & 0 deletions .github/workflows/README.md
@@ -0,0 +1,69 @@
### Adding a New Workflow

When adding a new workflow for continuous integration (CI), you have two runner options: a fixed runner or a Vemlp machine.

- **Fixed Runner**: To use a fixed runner, specify it in your workflow using the `runs-on` keyword, like `runs-on: [L20x8]`.
- **Vemlp Runner**: Opting for a Vemlp machine allows you to launch tasks elastically.

Here is a template to assist you. This template is designed for using Vemlp machines. Currently, for each workflow, you need to create a `setup` and a `cleanup` job. When using this template, the main parts you need to modify are the `IMAGE` environment variable and the specific `job steps`.

```yaml
name: Your Default Workflow

on:
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - ".github/workflows/template.yml"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

permissions:
  contents: read

env:
  IMAGE: "your vemlp image" # e.g. "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.4-vllm0.8.5-mcore0.12.1"
  DYNAMIC_RUNNER_URL: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner" # public veFaas api

jobs:
  setup:
    if: github.repository_owner == 'volcengine'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      task-id: ${{ steps.create-runner.outputs.task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_URL }}"
          image: "${{ env.IMAGE }}"

  your_job:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'default-runner' }}"]
    steps:
      xxxx # your jobs

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, your_job]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_URL }}"
          task-id: "${{ needs.setup.outputs.task-id }}"
```
58 changes: 58 additions & 0 deletions .github/workflows/check-pr-title.yml
@@ -0,0 +1,58 @@
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.


on:
  pull_request:
    types: [opened, edited, synchronize]

jobs:
  check-title:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run PR title checker
        run: python3 tests/special_sanity/check_pr_title.py
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}

      - name: Run PR description checker
        run: python3 tests/special_sanity/check_pr_description.py
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}
          GITHUB_EVENT_PATH: ${{ github.event_path }}
47 changes: 42 additions & 5 deletions .github/workflows/checkpoint_converter.yml
@@ -1,3 +1,36 @@
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.



name: checkpoint_converter
# latest version: Megatron-LM core_r0.11.0 https://github.com/NVIDIA/Megatron-LM/tree/core_r0.11.0

@@ -27,7 +60,7 @@ on:
- ".github/workflows/checkpoint_converter.yml"
- ".github/workflows/e2e_ppo_trainer_megatron.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/e2e/run_ppo_trainer_megatron.sh"
- "tests/special_e2e/run_ppo_trainer_megatron.sh"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_megatron_trainer.yaml"

@@ -51,7 +84,7 @@ jobs:
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
container:
image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3
image: verlai/verl:app-verl0.4-sglang0.4.6.post5-vllm0.8.5-mcore0.12.1
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
@@ -63,11 +96,11 @@ jobs:
- name: Running Huggingface to Megatron dist_ckpt converter (Qwen/Qwen2.5-0.5B)
run: |
ray stop --force
python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/Qwen/Qwen2.5-0.5B
python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/Qwen/Qwen2.5-0.5B --test
- name: Running Huggingface to Megatron dist_ckpt converter (deepseek-ai/deepseek-coder-1.3b-instruct)
run: |
ray stop --force
python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/deepseek-ai/deepseek-coder-1.3b-instruct --output_path checkpoints/deepseek-ai/deepseek-coder-1.3b-instruct
python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/deepseek-ai/deepseek-coder-1.3b-instruct --output_path checkpoints/deepseek-ai/deepseek-coder-1.3b-instruct --test
- name: Clean up
run: |
rm -rf checkpoints
@@ -81,7 +114,7 @@ jobs:
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
HF_ENDPOINT: "https://hf-mirror.com"
container:
image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3
image: verlai/verl:app-verl0.4-sglang0.4.6.post5-vllm0.8.5-mcore0.12.1
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
@@ -98,6 +131,10 @@ jobs:
run: |
ray stop --force
python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat --output_path checkpoints/Qwen/Qwen1.5-MoE-A2.7B-Chat --use_cpu_initialization
- name: Running distributed Huggingface to Megatron dist_ckpt CPU converter (Qwen/Qwen1.5-MoE-A2.7B-Chat)
run: |
ray stop --force
torchrun --nproc_per_node 8 --nnodes 1 scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat --output_path checkpoints/Qwen/Qwen1.5-MoE-A2.7B-Chat_dist --use_cpu_initialization
- name: clean up
run: |
rm -rf checkpoints