
Releases: ByteDance-Seed/VeOmni

v0.1.4

07 Dec 01:44
f491995


Pre-release

Highlights

  • We now have GPU and NPU runners in CI and have enabled a number of tests on them. Contributions of additional tests are very welcome!
  • VeOmni's model, dataset, dataloader, checkpointer, chat_template, and preprocess components are now registry-based, making it easier to add new implementations and customize existing ones (see the sketch after this list).
  • Updated pyproject.toml for uv-based environment management. Installing VeOmni through uv is now the recommended method.

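The registry pattern lets a config string select which dataset, dataloader, checkpointer, or model builder to use without touching core code. The snippet below is a minimal sketch of a decorator-based registry of the kind this design implies; every name in it (`Registry`, `DATASET_REGISTRY`, `build_my_jsonl_dataset`) is an illustrative assumption, not VeOmni's actual interface.

```python
# Minimal sketch of a decorator-based registry (hypothetical names, not VeOmni's API).
from typing import Callable, Dict


class Registry:
    """Maps string keys to builder functions so components can be chosen by config name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: Dict[str, Callable] = {}

    def register(self, key: str) -> Callable:
        def decorator(fn: Callable) -> Callable:
            if key in self._entries:
                raise KeyError(f"{key!r} is already registered in {self.name}")
            self._entries[key] = fn
            return fn
        return decorator

    def get(self, key: str) -> Callable:
        return self._entries[key]


DATASET_REGISTRY = Registry("dataset")


@DATASET_REGISTRY.register("my_jsonl_dataset")
def build_my_jsonl_dataset(path: str):
    # Custom dataset construction logic would go here.
    return [{"source": path}]


# A config string can now pick the implementation at runtime:
builder = DATASET_REGISTRY.get("my_jsonl_dataset")
print(builder("/tmp/train.jsonl"))
```
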
What's Changed

  • update README.md by @Fazziekey in #188
  • [dist] fix: refactor fsdp2 grad norm clipping by @Luosuu in #185
  • [misc] fix: reset hf init flag for random init by @Luosuu in #176
  • Fix Qwen3-Moe MFU by @zhihaofang1017 in #125
  • [model] fix:avoid cpu-device sync for qwenvl on npu by @wey-code in #190
  • [task] fix: replace DataArguments with MyDataArguments and remove duplicated step2token saving by @MuyaoLi-jimo in #189
  • [ci] test: CI env test by @FoolPlayer in #201
  • [ckpt][fix]release cuda mem after dcp sync save by @EricOlivier in #207
  • [misc] feat: update uv support for aarch platform for Ascend+Kunpeng … by @pjgao in #148
  • [data] fix: fix exception raised when fetching current_device on NPU by @ji-huazhong in #211
  • [ci] test: fix data_ci by @Coach257 in #222
  • [ci] test: add npu ci env by @FoolPlayer in #219
  • [data] feat: Implement extensible data preprocessor registry by @TimYangst in #203
  • [ci]Add NPU support to data and model test by @Crystal-jiang in #224
  • [ci]Add Ascend NPU native support to the unit test code by @Crystal-jiang in #208
  • [ci] chore: add gemini config & test by @FoolPlayer in #229
  • [test] ci: add device api check for tests by @onehaitao in #213
  • [core] feat: registry for dataset & dataloader & checkpointer & ckpt_to_state_dict & chat_template & preprocess by @Coach257 in #230
  • [dist] fix: make OptimizerState EP-dim aware to fix its dcp saving by @Luosuu in #228
  • Automatically add the "ascend" label by @Crystal-jiang in #234
  • [helper]:fix npu profiling by @Feng0w0 in #214
  • helper: degrade veomni_patch functions to warnings/no-op by @iqiancheng in #197
  • [ci] fix: dataloader in e2e ckpt test by @Luosuu in #233
  • [feat] nccl_timeout by @brook-cpp in #217
  • Automatically apply "ascend" label to issues and PRs by @Crystal-jiang in #239
  • chore: Upgrade PyTorch dependencies to 2.8.0 and flash-attention to 2.8.3 by @TimYangst in #242
  • [version] update transformers version to 4.57.0 by @phdddd in #243
  • feat: distributed checkpointer support customized backend by @Ziyi-Wang in #182
  • [ckpt] refactor: remove unused output_dir parameter from ckpt_to_state_dict by @TimYangst in #248
  • [config, omni, dis] fix: quick fix for sft of Wan2.1-I2V-14B-480P by @zbian99 in #240
  • [model] fix: Update @check_model_inputs decorator for transformers 4.57+ compatibility by @TimYangst in #252
  • [core] fix is_x_backend by @brook-cpp in #251
  • [data] fix: quick fix for exception raised when building dit dataloader on NPU by @zbian99 in #246
  • upgrade: Upgrade transformers from v4.57.0 to v4.57.3 by @yiwzhao in #249
  • [core] feat: model registry by @Coach257 in #258
  • [dist] feat: unified veomni grad norm clipping by @Luosuu in #205
  • [task]fix: fix train.sh NPROC_PER_NODE calculation logic on the NPU by @Crystal-jiang in #227
  • [chore]: cache ep group by @heidongxianhua in #231

New Contributors

Full Changelog: v0.1.3...v0.1.4

v0.1.3

07 Nov 07:09
135bc45


Pre-release

Highlights

  • Qwen3-VL (both dense and MoE) series support by @Juntian777
  • DeepSeek performance restoration by @Luosuu

What's Changed

  • [dist] feat: enable EP-aware optimizer for FSDP2-based MoE-VLM training. by @Juntian777 in #145
  • [model] enable deepseek ulysses and fix deepseek transpose by @Luosuu in #152
  • [dist] fix: add alltoall async by @heidongxianhua in #146
  • [ckpt] fix: merge ckpt to hf script by @Luosuu in #156
  • [model] fix: remove npu flash attention sync by @wey-code in #154
  • [model] feat: support qwen3-vl dense by @Juntian777 in #164
  • [logging] fix: the logging lineno of log_rank0 by @ValMystletainn in #160
  • fix: repeat kv bug in flash attention forward with ulysses by @HaoyiZhu in #162
  • fix: correct code formatting for PR162 by @Juntian777 in #165
  • [model] perf: eliminate per-layer CPU-GPU sync in Qwen3-VL vision attention by @Juntian777 in #169
  • [model] feat: add dummy forward for video input by @Juntian777 in #177
  • perf: set reshard_after_forward to False for modules without MixedPrecision by @Luosuu in #153
  • [doc] feat: how to enable new models in veomni by @Juntian777 in #179
  • [model] feat: support qwen3 vl moe by @Juntian777 in #178
  • [doc] feat: update README with new support for Qwen3-VL and Qwen3-VL-MoE by @Juntian777 in #180
  • fix: workaround duplicated AllGather for EP+FSDP2 by @Luosuu in #173

New Contributors

Full Changelog: v0.1.2...v0.1.3

v0.1.2

17 Oct 18:08
600fe6d


Pre-release

What's Changed

  • [misc] shift bytecheckpoint to optional dependency by @Luosuu in #92
  • [misc] revert ckpt default to avoid internal exceptions by @Luosuu in #93
  • [dist] minor fixes by @Luosuu in #94
  • [misc] feat: add GITBUG ISSUE TEMPLETE by @Fazziekey in #95
  • [data] feat: support megatron-energon dataset by @ziqi-wlb in #62
  • [data] add interleaved dataset by @Coach257 in #90
  • fix:remove a failing assertion by @KaijingOfficial in #97
  • [config] clean gitignore by @Luosuu in #99
  • [dist] fix: DCP auto load by @Luosuu in #106
  • [model] fix: Switch qwen3 and seed_oss to veomni defined GradientCheckpointingLayer by @piyifan123 in #109
  • [misc] feat: Add uv support to allow simple uv sync based python package management by @piyifan123 in #110
  • [BREAKING][dist] feat: Unified dcp saving for model and optimizer by @Luosuu in #107
  • [misc] feat: add skip_ulysses flag to bypass Ulysses logic in flash_attention_forward by @Juntian777 in #111
  • [dist] fix: remove unnecessary assert by @Luosuu in #112
  • [misc] feat: option to profile rank0 only or all the ranks by @Luosuu in #113
  • [misc] fix: remove buggy memory timeline export by @Luosuu in #114
  • [config] feat: add allow_cuda_launch_blocking by @Luosuu in #115
  • [ckpt] fix: remove unnecessary path joining for dcp by @Luosuu in #121
  • [ckpt][BREAKING] fix unnecessary wrapping for model and optimizer states by @Luosuu in #122
  • fix: qwen2 vl yaml by @Ziyi-Wang in #127
  • [data] fix :fix data collator for sp with cu_seq_lens_q and max_length_q by @Fazziekey in #126
  • [data] fix: dataset call hdfs api by @Ziyi-Wang in #128
  • [misc] fix: update asomeworks by @Fazziekey in #135
  • [model] fix: remove patch for npu by @heidongxianhua in #134
  • [dist] feat: faster weight loading through broadcasting from rank0 by @Luosuu in #123
  • [data] feat: support correct cu_seqlens handling for SP and non-SP by @Juntian777 in #136
  • [ckpt] fix: rank for get last iteraton for non-dcp path by @Luosuu in #140
  • [model] fix: deepseek-v3 by @Luosuu in #139
  • [model] fix: remove Qwen3-MoE redundant flashattention prep and fix input_ids access bug by @Juntian777 in #141
  • [data] fix: remove hf dependency on prepare_fa_kwargs_from_position_ids by @Juntian777 in #144
  • fix: wan_attnetion_missing_config_issue by @JeffryLee in #133
  • [fsdp] feat: support broadcast large weight by chunk. by @ZZWHU in #142
  • [core] fix: use flash_attention_2 backend by @KKZ20 in #124

New Contributors

Full Changelog: v0.1.1...v0.1.2

v0.1.1: NPU support, Flexible mixed precision, DCP async, and more bug fixes

24 Sep 07:09
53fc5b5


New features

What's Changed

New Contributors

Full Changelog: v0.1.0.post1...v0.1.1

v0.1.0.post1

22 Sep 06:33
2ce14f2


Pre-release

We are excited to publish the first release of VeOmni. From now on, we will actively develop VeOmni on GitHub and strive to stabilize its features. Bug reports and feature requests are welcome!

New features

What's Changed

New Contributors

Full Changelog: https://github.com/ByteDance-Seed/VeOmni/commits/v0.1.0.post1