Releases · ByteDance-Seed/VeOmni
v0.1.4
Highlights
- CI now runs on both GPU and NPU, with an initial set of tests enabled on each. We welcome contributions of additional tests!
- VeOmni's model, dataset, dataloader, checkpointer, chat_template, and preprocess components are now registry-based, making it easier to add new ones and customize existing ones (see the sketch below)
- Updated `pyproject.toml` for `uv`-based environment management. We now recommend installing VeOmni only through `uv`.
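
For readers who want to plug in their own components, the sketch below shows the general decorator-based registry pattern such extension points typically follow. The names (`DATASET_REGISTRY`, `register_dataset`, `build_dataset`, `MyJsonlDataset`) are illustrative assumptions for this example, not VeOmni's actual API; refer to the repository for the real registration entry points.

```python
# Minimal, generic sketch of a decorator-based registry (illustrative only;
# these names are assumptions, not VeOmni's actual API).
from typing import Callable, Dict, Type

DATASET_REGISTRY: Dict[str, Type] = {}

def register_dataset(name: str) -> Callable[[Type], Type]:
    """Register a dataset class under `name` so it can be looked up from config."""
    def decorator(cls: Type) -> Type:
        if name in DATASET_REGISTRY:
            raise KeyError(f"dataset '{name}' is already registered")
        DATASET_REGISTRY[name] = cls
        return cls
    return decorator

@register_dataset("my_jsonl_dataset")
class MyJsonlDataset:
    def __init__(self, path: str):
        self.path = path

def build_dataset(name: str, **kwargs):
    """Instantiate a registered dataset by name, e.g. from a YAML config entry."""
    return DATASET_REGISTRY[name](**kwargs)

if __name__ == "__main__":
    ds = build_dataset("my_jsonl_dataset", path="train.jsonl")
    print(type(ds).__name__)  # MyJsonlDataset
```

The same pattern extends naturally to models, dataloaders, checkpointers, chat templates, and preprocessors: each component family gets its own registry, and a config string selects the implementation at build time.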
What's Changed
- update README.md by @Fazziekey in #188
- [dist] fix: refactor fsdp2 grad norm clipping by @Luosuu in #185
- [misc] fix: reset hf init flag for random init by @Luosuu in #176
- Fix Qwen3-Moe MFU by @zhihaofang1017 in #125
- [model] fix: avoid cpu-device sync for qwenvl on npu by @wey-code in #190
- [task] fix: replace DataArguments with MyDataArguments and remove duplicated step2token saving by @MuyaoLi-jimo in #189
- [ci] test: CI env test by @FoolPlayer in #201
- [ckpt] fix: release cuda mem after dcp sync save by @EricOlivier in #207
- [misc] feat: update uv support for aarch platform for Ascend+Kunpeng … by @pjgao in #148
- [data] fix: fix exception raised when fetching current_device on NPU by @ji-huazhong in #211
- [ci] test: fix data_ci by @Coach257 in #222
- [ci] test: add npu ci env by @FoolPlayer in #219
- [data] feat: Implement extensible data preprocessor registry by @TimYangst in #203
- [ci] Add NPU support to data and model test by @Crystal-jiang in #224
- [ci] Add Ascend NPU native support to the unit test code by @Crystal-jiang in #208
- [ci] chore: add gemini config & test by @FoolPlayer in #229
- [test] ci: add device api check for tests by @onehaitao in #213
- [core] feat: registry for dataset & dataloader & checkpointer & ckpt_to_state_dict & chat_template & preprocess by @Coach257 in #230
- [dist] fix: make OptimizerState EP-dim aware to fix its dcp saving by @Luosuu in #228
- Automatically add the "ascend" label by @Crystal-jiang in #234
- [helper] fix: npu profiling by @Feng0w0 in #214
- helper: degrade veomni_patch functions to warnings/no-op by @iqiancheng in #197
- [ci] fix: dataloader in e2e ckpt test by @Luosuu in #233
- [feat] nccl_timeout by @brook-cpp in #217
- Automatically apply "ascend" label to issues and PRs by @Crystal-jiang in #239
- chore: Upgrade PyTorch dependencies to 2.8.0 and flash-attention to 2.8.3 by @TimYangst in #242
- [version] update transformers version to 4.57.0 by @phdddd in #243
- feat: distributed checkpointer support customized backend by @Ziyi-Wang in #182
- [ckpt] refactor: remove unused output_dir parameter from ckpt_to_state_dict by @TimYangst in #248
- [config, omni, dis] fix: quick fix for sft of Wan2.1-I2V-14B-480P by @zbian99 in #240
- [model] fix: Update `@check_model_inputs` decorator for transformers 4.57+ compatibility by @TimYangst in #252
- [core] fix is_x_backend by @brook-cpp in #251
- [data] fix: quick fix for exception raised when building dit dataloader on NPU by @zbian99 in #246
- upgrade: Upgrade transformers from v4.57.0 to v4.57.3 by @yiwzhao in #249
- [core] feat: model registry by @Coach257 in #258
- [dist] feat: unified veomni grad norm clipping by @Luosuu in #205
- [task] fix: fix train.sh NPROC_PER_NODE calculation logic on the NPU by @Crystal-jiang in #227
- [chore]: cache ep group by @heidongxianhua in #231
New Contributors
- @zhihaofang1017 made their first contribution in #125
- @MuyaoLi-jimo made their first contribution in #189
- @FoolPlayer made their first contribution in #201
- @EricOlivier made their first contribution in #207
- @pjgao made their first contribution in #148
- @ji-huazhong made their first contribution in #211
- @TimYangst made their first contribution in #203
- @Crystal-jiang made their first contribution in #224
- @onehaitao made their first contribution in #213
- @Feng0w0 made their first contribution in #214
- @iqiancheng made their first contribution in #197
- @brook-cpp made their first contribution in #217
- @phdddd made their first contribution in #243
- @zbian99 made their first contribution in #240
- @yiwzhao made their first contribution in #249
Full Changelog: v0.1.3...v0.1.4
v0.1.3
Highlights
- Qwen3-VL (both dense and MoE) series support by @Juntian777
- DeepSeek performance restoration by @Luosuu
What's Changed
- [dist] feat: enable EP-aware optimizer for FSDP2-based MoE-VLM training. by @Juntian777 in #145
- [model] enable deepseek ulysses and fix deepseek transpose by @Luosuu in #152
- [dist] fix: add alltoall async by @heidongxianhua in #146
- [ckpt] fix: merge ckpt to hf script by @Luosuu in #156
- [model] fix: remove npu flash attention sync by @wey-code in #154
- [model] feat: support qwen3-vl dense by @Juntian777 in #164
- [logging] fix: the logging lineno of log_rank0 by @ValMystletainn in #160
- fix: repeat kv bug in flash attention forward with ulysses by @HaoyiZhu in #162
- fix: correct code formatting for PR162 by @Juntian777 in #165
- [model] perf: eliminate per-layer CPU-GPU sync in Qwen3-VL vision attention by @Juntian777 in #169
- [model] feat: add dummy forward for video input by @Juntian777 in #177
- perf: set reshard_after_forward to False for modules without MixedPrecision by @Luosuu in #153
- [doc] feat: how to enable new models in veomni by @Juntian777 in #179
- [model] feat: support qwen3 vl moe by @Juntian777 in #178
- [doc] feat: update README with new support for Qwen3-VL and Qwen3-VL-MoE by @Juntian777 in #180
- fix: workaround duplicated AllGather for EP+FSDP2 by @Luosuu in #173
New Contributors
- @wey-code made their first contribution in #154
- @ValMystletainn made their first contribution in #160
- @HaoyiZhu made their first contribution in #162
Full Changelog: v0.1.2...v0.1.3
v0.1.2
What's Changed
- [misc] shift bytecheckpoint to optional dependency by @Luosuu in #92
- [misc] revert ckpt default to avoid internal exceptions by @Luosuu in #93
- [dist] minor fixes by @Luosuu in #94
- [misc] feat: add GitHub issue template by @Fazziekey in #95
- [data] feat: support megatron-energon dataset by @ziqi-wlb in #62
- [data] add interleaved dataset by @Coach257 in #90
- fix: remove a failing assertion by @KaijingOfficial in #97
- [config] clean gitignore by @Luosuu in #99
- [dist] fix: DCP auto load by @Luosuu in #106
- [model] fix: Switch qwen3 and seed_oss to veomni defined GradientCheckpointingLayer by @piyifan123 in #109
- [misc] feat: Add uv support to allow simple `uv sync`-based python package management by @piyifan123 in #110
- [BREAKING][dist] feat: Unified dcp saving for model and optimizer by @Luosuu in #107
- [misc] feat: add skip_ulysses flag to bypass Ulysses logic in flash_attention_forward by @Juntian777 in #111
- [dist] fix: remove unnecessary assert by @Luosuu in #112
- [misc] feat: option to profile rank0 only or all the ranks by @Luosuu in #113
- [misc] fix: remove buggy memory timeline export by @Luosuu in #114
- [config] feat: add allow_cuda_launch_blocking by @Luosuu in #115
- [ckpt] fix: remove unnecessary path joining for dcp by @Luosuu in #121
- [ckpt][BREAKING] fix unnecessary wrapping for model and optimizer states by @Luosuu in #122
- fix: qwen2 vl yaml by @Ziyi-Wang in #127
- [data] fix: fix data collator for sp with cu_seq_lens_q and max_length_q by @Fazziekey in #126
- [data] fix: dataset call hdfs api by @Ziyi-Wang in #128
- [misc] fix: update asomeworks by @Fazziekey in #135
- [model] fix: remove patch for npu by @heidongxianhua in #134
- [dist] feat: faster weight loading through broadcasting from rank0 by @Luosuu in #123
- [data] feat: support correct cu_seqlens handling for SP and non-SP by @Juntian777 in #136
- [ckpt] fix: rank for get last iteration for non-dcp path by @Luosuu in #140
- [model] fix: deepseek-v3 by @Luosuu in #139
- [model] fix: remove Qwen3-MoE redundant flashattention prep and fix input_ids access bug by @Juntian777 in #141
- [data] fix: remove hf dependency on prepare_fa_kwargs_from_position_ids by @Juntian777 in #144
- fix: wan_attnetion_missing_config_issue by @JeffryLee in #133
- [fsdp] feat: support broadcast large weight by chunk. by @ZZWHU in #142
- [core] fix: use flash_attention_2 backend by @KKZ20 in #124
New Contributors
- @ziqi-wlb made their first contribution in #62
- @KaijingOfficial made their first contribution in #97
- @Juntian777 made their first contribution in #111
- @Ziyi-Wang made their first contribution in #127
- @heidongxianhua made their first contribution in #134
- @JeffryLee made their first contribution in #133
- @ZZWHU made their first contribution in #142
Full Changelog: v0.1.1...v0.1.2
v0.1.1: NPU support, Flexible mixed precision, DCP async, and more bug fixes
Pre-release
New features
- NPU support @FightingZhen
- DCP async support @Luosuu
- Flexible mixed precision control in FSDP2 @Luosuu (see the sketch after this list)
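
To illustrate what flexible mixed-precision control looks like at the FSDP2 level, here is a minimal sketch built on the public PyTorch APIs `fully_shard` and `MixedPrecisionPolicy` (exported from `torch.distributed.fsdp` in recent PyTorch releases). This is not VeOmni's own configuration surface; the toy model and dtype choices are assumptions made only for the example.

```python
# Minimal sketch: per-module mixed-precision policies with PyTorch FSDP2.
# Launch with torchrun; the toy model and dtypes are illustrative assumptions,
# not VeOmni's actual configuration.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard

class TinyModel(nn.Module):
    def __init__(self, dim: int = 256, n_layers: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))
        self.head = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            x = torch.relu(blk(x))
        return self.head(x)

def main() -> None:
    dist.init_process_group("nccl")
    torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))
    model = TinyModel().cuda()

    # Compute in bf16, but keep gradient reductions in fp32 for stability.
    bf16_policy = MixedPrecisionPolicy(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.float32,
    )
    # Each sharded module carries its own policy, which is what makes
    # FSDP2 mixed precision "flexible" compared to one global setting.
    for blk in model.blocks:
        fully_shard(blk, mp_policy=bf16_policy)
    fully_shard(model, mp_policy=bf16_policy)

    out = model(torch.randn(8, 256, device="cuda"))
    print(out.dtype)  # torch.bfloat16 on each rank
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```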
What's Changed
- [feat] fix dcp async save by @Luosuu in #80
- Add Ascend NPU native support by @FightingZhen in #65
- hot fix npu by @Fazziekey in #81
- update README by @Fazziekey in #82
- fix qwen3_moe & qwen2_vl inference bug by @Coach257 in #84
- [dist] fix: FSDP2 with flexible mixed policy control by @Luosuu in #86
- fix: use argument global_rank to check is rank 0 in get_checkpoint_path by @piyifan123 in #87
- [misc] test & code comments improvements by @Luosuu in #88
New Contributors
- @FightingZhen made their first contribution in #65
- @Coach257 made their first contribution in #84
- @piyifan123 made their first contribution in #87
Full Changelog: v0.1.0.post1...v0.1.1
v0.1.0.post1
We are excited to publish the first release of VeOmni. From now on, we will actively develop VeOmni on GitHub and work to stabilize its features. We welcome bug reports and feature requests!
What's Changed
- [model] feat: ds-v3 liger-kernel & convert ckpt by @ZiyueHuang in #3
- [docs] feat: add docs by @KKZ20 in #4
- [misc] feat: add logging.py by @Fazziekey in #7
- [misc] fix: remove hdfs requirements by @Fazziekey in #15
- [data] fix: remove streaming data by @Fazziekey in #19
- [ci] feat: Create pre-commit.yml by @Fazziekey in #20
- [ci] fix: update ruff flow by @Fazziekey in #23
- [dist] feat: support hsdp by @Fazziekey in #22
- [core] chore: update model and fsdp by @Fazziekey in #24
- [model] feat: add wan by @plorrrrrrr in #25
- [dist] feat: support async ulysses for dit by @plorrrrrrr in #26
- [core] feat: refactor attention interface and fix model loader by @KKZ20 in #27
- [misc] fix: add all_gather_into_tensor by @KKZ20 in #29
- [misc] feat: update wechat and paper by @Fazziekey in #30
- [dist] feat: Reconstructing fused MoE by @Fazziekey in #33
- [misc] feat: update Wechat by @Fazziekey in #34
- [model] feat: add flux by @yuyu5333 in #28
- [doc] update readme by @KKZ20 in #42
- [doc] correct qwen3-moe.yaml in README.md by @feifeibear in #39
- [model] feat: support seed_oss by @KKZ20 in #54
- update wechat by @Fazziekey in #55
- chore: fix bad link for wan_sft.yaml in README by @c8ef in #48
- [release] feat: release v0.1.0 by @Fazziekey in #75
- [doc] feat: EP+FSDP2 by @Luosuu in #78
- fix: import error - hdfs_io & VideoInput by @airlsyn in #77
- [model] fix: correct output tensor shape in Qwen3MoeSparseFusedMoeBlock by @RDShi in #76
New Contributors
- @ZiyueHuang made their first contribution in #3
- @Fazziekey made their first contribution in #7
- @plorrrrrrr made their first contribution in #25
- @yuyu5333 made their first contribution in #28
- @feifeibear made their first contribution in #39
- @c8ef made their first contribution in #48
- @Luosuu made their first contribution in #78
- @airlsyn made their first contribution in #77
- @RDShi made their first contribution in #76
Full Changelog: https://github.com/ByteDance-Seed/VeOmni/commits/v0.1.0.post1