Releases · ByteDance-Seed/VeOmni
v0.1.4
Highlights
- CI now runs on both GPU and NPU, with an initial set of tests enabled on each. We welcome contributions of additional tests!
- VeOmni's model, dataset, dataloader, checkpointer, chat_template, and preprocess components are now registry-based, making it easier to add new ones and customize existing ones (see the sketch below)
- Updated `pyproject.toml` for `uv`-based environment management. We now recommend installing VeOmni only through `uv`.
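
For readers who want to plug in their own components, the sketch below shows the general decorator-based registry pattern such extension points typically follow. The names (`DATASET_REGISTRY`, `register_dataset`, `build_dataset`, `MyJsonlDataset`) are illustrative assumptions for this example, not VeOmni's actual API; refer to the repository for the real registration entry points.

```python
# Minimal, generic sketch of a decorator-based registry (illustrative only;
# these names are assumptions, not VeOmni's actual API).
from typing import Callable, Dict, Type

DATASET_REGISTRY: Dict[str, Type] = {}

def register_dataset(name: str) -> Callable[[Type], Type]:
    """Register a dataset class under `name` so it can be looked up from config."""
    def decorator(cls: Type) -> Type:
        if name in DATASET_REGISTRY:
            raise KeyError(f"dataset '{name}' is already registered")
        DATASET_REGISTRY[name] = cls
        return cls
    return decorator

@register_dataset("my_jsonl_dataset")
class MyJsonlDataset:
    def __init__(self, path: str):
        self.path = path

def build_dataset(name: str, **kwargs):
    """Instantiate a registered dataset by name, e.g. from a YAML config entry."""
    return DATASET_REGISTRY[name](**kwargs)

if __name__ == "__main__":
    ds = build_dataset("my_jsonl_dataset", path="train.jsonl")
    print(type(ds).__name__)  # MyJsonlDataset
```

The same pattern extends naturally to models, dataloaders, checkpointers, chat templates, and preprocessors: each component family gets its own registry, and a config string selects the implementation at build time.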
What's Changed
- update README.md by @Fazziekey in #188
- [dist] fix: refactor fsdp2 grad norm clipping by @Luosuu in #185
- [misc] fix: reset hf init flag for random init by @Luosuu in #176
- Fix Qwen3-Moe MFU by @zhihaofang1017 in #125
- [model] fix: avoid cpu-device sync for qwenvl on npu by @wey-code in #190
- [task] fix: replace DataArguments with MyDataArguments and remove duplicated step2token saving by @MuyaoLi-jimo in #189
- [ci] test: CI env test by @FoolPlayer in #201
- [ckpt] fix: release cuda mem after dcp sync save by @EricOlivier in #207
- [misc] feat: update uv support for aarch platform for Ascend+Kunpeng … by @pjgao in #148
- [data] fix: fix exception raised when fetching current_device on NPU by @ji-huazhong in #211
- [ci] test: fix data_ci by @Coach257 in #222
- [ci] test: add npu ci env by @FoolPlayer in #219
- [data] feat: Implement extensible data preprocessor registry by @TimYangst in #203
- [ci] Add NPU support to data and model test by @Crystal-jiang in #224
- [ci] Add Ascend NPU native support to the unit test code by @Crystal-jiang in #208
- [ci] chore: add gemini config & test by @FoolPlayer in #229
- [test] ci: add device api check for tests by @onehaitao in #213
- [core] feat: registry for dataset & dataloader & checkpointer & ckpt_to_state_dict & chat_template & preprocess by @Coach257 in #230
- [dist] fix: make OptimizerState EP-dim aware to fix its dcp saving by @Luosuu in #228
- Automatically add the "ascend" label by @Crystal-jiang in #234
- [helper] fix: npu profiling by @Feng0w0 in #214
- helper: degrade veomni_patch functions to warnings/no-op by @iqiancheng in #197
- [ci] fix: dataloader in e2e ckpt test by @Luosuu in #233
- [feat] nccl_timeout by @brook-cpp in #217
- Automatically apply "ascend" label to issues and PRs by @Crystal-jiang in #239
- chore: Upgrade PyTorch dependencies to 2.8.0 and flash-attention to 2.8.3 by @TimYangst in #242
- [version] update transformers version to 4.57.0 by @phdddd in #243
- feat: distributed checkpointer support customized backend by @Ziyi-Wang in #182
- [ckpt] refactor: remove unused output_dir parameter from ckpt_to_state_dict by @TimYangst in #248
- [config, omni, dis] fix: quick fix for sft of Wan2.1-I2V-14B-480P by @zbian99 in #240
- [model] fix: Update `@check_model_inputs` decorator for transformers 4.57+ compatibility by @TimYangst in #252
- [core] fix is_x_backend by @brook-cpp in #251
- [data] fix: quick fix for exception raised when building dit dataloader on NPU by @zbian99 in #246
- upgrade: Upgrade transformers from v4.57.0 to v4.57.3 by @yiwzhao in #249
- [core] feat: model registry by @Coach257 in #258
- [dist] feat: unified veomni grad norm clipping by @Luosuu in #205
- [task] fix: fix train.sh NPROC_PER_NODE calculation logic on the NPU by @Crystal-jiang in #227
- [chore]: cache ep group by @heidongxianhua in #231
New Contributors
- @zhihaofang1017 made their first contribution in #125
- @MuyaoLi-jimo made their first contribution in #189
- @FoolPlayer made their first contribution in #201
- @EricOlivier made their first contribution in #207
- @pjgao made their first contribution in #148
- @ji-huazhong made their first contribution in #211
- @TimYangst made their first contribution in #203
- @Crystal-jiang made their first contribution in #224
- @onehaitao made their first contribution in #213
- @Feng0w0 made their first contribution in #214
- @iqiancheng made their first contribution in #197
- @brook-cpp made their first contribution in #217
- @phdddd made their first contribution in #243
- @zbian99 made their first contribution in #240
- @yiwzhao made their first contribution in #249
Full Changelog: v0.1.3...v0.1.4
v0.1.3
Highlights
- Qwen3-VL (both dense and MoE) series support by @Juntian777
- DeepSeek performance restoration by @Luosuu
What's Changed
- [dist] feat: enable EP-aware optimizer for FSDP2-based MoE-VLM training. by @Juntian777 in #145
- [model] enable deepseek ulysses and fix deepseek transpose by @Luosuu in #152
- [dist] fix: add alltoall async by @heidongxianhua in #146
- [ckpt] fix: merge ckpt to hf script by @Luosuu in #156
- [model] fix: remove npu flash attention sync by @wey-code in #154
- [model] feat: support qwen3-vl dense by @Juntian777 in #164
- [logging] fix: the logging lineno of log_rank0 by @ValMystletainn in #160
- fix: repeat kv bug in flash attention forward with ulysses by @HaoyiZhu in #162
- fix: correct code formatting for PR162 by @Juntian777 in #165
- [model] perf: eliminate per-layer CPU-GPU sync in Qwen3-VL vision attention by @Juntian777 in #169
- [model] feat: add dummy forward for video input by @Juntian777 in #177
- perf: set reshard_after_forward to False for modules without MixedPrecision by @Luosuu in #153
- [doc] feat: how to enable new models in veomni by @Juntian777 in #179
- [model] feat: support qwen3 vl moe by @Juntian777 in #178
- [doc] feat: update README with new support for Qwen3-VL and Qwen3-VL-MoE by @Juntian777 in #180
- fix: workaround duplicated AllGather for EP+FSDP2 by @Luosuu in #173
New Contributors
- @wey-code made their first contribution in #154
- @ValMystletainn made their first contribution in #160
- @HaoyiZhu made their first contribution in #162
Full Changelog: v0.1.2...v0.1.3
v0.1.2
What's Changed
- [misc] shift bytecheckpoint to optional dependency by @Luosuu in #92
- [misc] revert ckpt default to avoid internal exceptions by @Luosuu in #93
- [dist] minor fixes by @Luosuu in #94
- [misc] feat: add GitHub issue template by @Fazziekey in #95
- [data] feat: support megatron-energon dataset by @ziqi-wlb in #62
- [data] add interleaved dataset by @Coach257 in #90
- fix: remove a failing assertion by @KaijingOfficial in #97
- [config] clean gitignore by @Luosuu in #99
- [dist] fix: DCP auto load by @Luosuu in #106
- [model] fix: Switch qwen3 and seed_oss to veomni defined GradientCheckpointingLayer by @piyifan123 in #109
- [misc] feat: Add uv support to allow simple `uv sync`-based python package management by @piyifan123 in #110
- [BREAKING][dist] feat: Unified dcp saving for model and optimizer by @Luosuu in #107
- [misc] feat: add skip_ulysses flag to bypass Ulysses logic in flash_attention_forward by @Juntian777 in #111
- [dist] fix: remove unnecessary assert by @Luosuu in #112
- [misc] feat: option to profile rank0 only or all the ranks by @Luosuu in #113
- [misc] fix: remove buggy memory timeline export by @Luosuu in #114
- [config] feat: add allow_cuda_launch_blocking by @Luosuu in #115
- [ckpt] fix: remove unnecessary path joining for dcp by @Luosuu in #121
- [ckpt][BREAKING] fix unnecessary wrapping for model and optimizer states by @Luosuu in #122
- fix: qwen2 vl yaml by @Ziyi-Wang in #127
- [data] fix: fix data collator for sp with cu_seq_lens_q and max_length_q by @Fazziekey in #126
- [data] fix: dataset call hdfs api by @Ziyi-Wang in #128
- [misc] fix: update asomeworks by @Fazziekey in #135
- [model] fix: remove patch for npu by @heidongxianhua in #134
- [dist] feat: faster weight loading through broadcasting from rank0 by @Luosuu in #123
- [data] feat: support correct cu_seqlens handling for SP and non-SP by @Juntian777 in #136
- [ckpt] fix: rank for get last iteration for non-dcp path by @Luosuu in #140
- [model] fix: deepseek-v3 by @Luosuu in #139
- [model] fix: remove Qwen3-MoE redundant flashattention prep and fix input_ids access bug by @Juntian777 in #141
- [data] fix: remove hf dependency on prepare_fa_kwargs_from_position_ids by @Juntian777 in #144
- fix: wan_attnetion_missing_config_issue by @JeffryLee in #133
- [fsdp] feat: support broadcast large weight by chunk. by @ZZWHU in #142
- [core] fix: use flash_attention_2 backend by @KKZ20 in #124
New Contributors
- @ziqi-wlb made their first contribution in #62
- @KaijingOfficial made their first contribution in #97
- @Juntian777 made their first contribution in #111
- @Ziyi-Wang made their first contribution in #127
- @heidongxianhua made their first contribution in #134
- @JeffryLee made their first contribution in #133
- @ZZWHU made their first contribution in #142
Full Changelog: v0.1.1...v0.1.2
v0.1.1: NPU support, Flexible mixed precision, DCP async, and more bug fixes
Pre-release
New features
- NPU support @FightingZhen
- DCP async support @Luosuu
- Flexible mixed precision control in FSDP2 @Luosuu (see the sketch after this list)
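
To illustrate what flexible mixed-precision control looks like at the FSDP2 level, here is a minimal sketch built on the public PyTorch APIs `fully_shard` and `MixedPrecisionPolicy` (exported from `torch.distributed.fsdp` in recent PyTorch releases). This is not VeOmni's own configuration surface; the toy model and dtype choices are assumptions made only for the example.

```python
# Minimal sketch: per-module mixed-precision policies with PyTorch FSDP2.
# Launch with torchrun; the toy model and dtypes are illustrative assumptions,
# not VeOmni's actual configuration.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard

class TinyModel(nn.Module):
    def __init__(self, dim: int = 256, n_layers: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))
        self.head = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            x = torch.relu(blk(x))
        return self.head(x)

def main() -> None:
    dist.init_process_group("nccl")
    torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))
    model = TinyModel().cuda()

    # Compute in bf16, but keep gradient reductions in fp32 for stability.
    bf16_policy = MixedPrecisionPolicy(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.float32,
    )
    # Each sharded module carries its own policy, which is what makes
    # FSDP2 mixed precision "flexible" compared to one global setting.
    for blk in model.blocks:
        fully_shard(blk, mp_policy=bf16_policy)
    fully_shard(model, mp_policy=bf16_policy)

    out = model(torch.randn(8, 256, device="cuda"))
    print(out.dtype)  # torch.bfloat16 on each rank
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```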
What's Changed
- [feat] fix dcp async save by @Luosuu in #80
- Add Ascend NPU native support by @FightingZhen in #65
- hot fix npu by @Fazziekey in #81
- update README by @Fazziekey in #82
- fix qwen3_moe & qwen2_vl inference bug by @Coach257 in #84
- [dist] fix: FSDP2 with flexible mixed policy control by @Luosuu in #86
- fix: use argument global_rank to check is rank 0 in get_checkpoint_path by @piyifan123 in #87
- [misc] test & code comments improvements by @Luosuu in #88
New Contributors
- @FightingZhen made their first contribution in #65
- @Coach257 made their first contribution in #84
- @piyifan123 made their first contribution in #87
Full Changelog: v0.1.0.post1...v0.1.1
v0.1.0.post1
We are excited to publish the first release of VeOmni. From now on, we will actively develop VeOmni on GitHub and work to stabilize its features. We welcome bug reports and feature requests!
What's Changed
- [model] feat: ds-v3 liger-kernel & convert ckpt by @ZiyueHuang in #3
- [docs] feat: add docs by @KKZ20 in #4
- [misc] feat: add logging.py by @Fazziekey in #7
- [misc] fix: remove hdfs requirements by @Fazziekey in #15
- [data] fix: remove streaming data by @Fazziekey in #19
- [ci] feat: Create pre-commit.yml by @Fazziekey in #20
- [ci] fix: update ruff flow by @Fazziekey in #23
- [dist] feat: support hsdp by @Fazziekey in #22
- [core] chore: update model and fsdp by @Fazziekey in #24
- [model] feat: add wan by @plorrrrrrr in #25
- [dist] feat: support async ulysses for dit by @plorrrrrrr in #26
- [core] feat: refactor attention interface and fix model loader by @KKZ20 in #27
- [misc] fix: add all_gather_into_tensor by @KKZ20 in #29
- [misc] feat: update wechat and paper by @Fazziekey in #30
- [dist] feat: Reconstructing fused MoE by @Fazziekey in #33
- [misc] feat: update Wechat by @Fazziekey in #34
- [model] feat: add flux by @yuyu5333 in #28
- [doc] update readme by @KKZ20 in #42
- [doc] correct qwen3-moe.yaml in README.md by @feifeibear in #39
- [model] feat: support seed_oss by @KKZ20 in #54
- update wechat by @Fazziekey in #55
- chore: fix bad link for wan_sft.yaml in README by @c8ef in #48
- [release] feat: release v0.1.0 by @Fazziekey in #75
- [doc] feat: EP+FSDP2 by @Luosuu in #78
- fix: import error - hdfs_io & VideoInput by @airlsyn in #77
- [model] fix: correct output tensor shape in Qwen3MoeSparseFusedMoeBlock by @RDShi in #76
New Contributors
- @ZiyueHuang made their first contribution in #3
- @Fazziekey made their first contribution in #7
- @plorrrrrrr made their first contribution in #25
- @yuyu5333 made their first contribution in #28
- @feifeibear made their first contribution in #39
- @c8ef made their first contribution in #48
- @Luosuu made their first contribution in #78
- @airlsyn made their first contribution in #77
- @RDShi made their first contribution in #76
Full Changelog: https://github.com/ByteDance-Seed/VeOmni/commits/v0.1.0.post1