
Squash to main #46

Merged
tastelikefeet merged 829 commits into main from dev
Feb 13, 2026

Conversation

@tastelikefeet
Collaborator

No description provided.

tastelikefeet and others added 30 commits February 5, 2026 15:46
Move sequence-parallel strategy construction to a lazy method `_ensure_sp_strategy` to reduce side effects during model initialization. The strategy is now created only when needed, after the underlying Hugging Face model is fully initialized and before wrapping. This improves initialization performance and avoids unnecessary overhead when sequence parallelism is not enabled.
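The lazy-construction pattern this commit describes can be sketched as follows. The class bodies here are illustrative stand-ins (only the method name `_ensure_sp_strategy` and the `ulysses_size` parameter come from the commit message; everything else is an assumption about the surrounding code):

```python
class SequenceParallelStrategy:
    """Stand-in for the real strategy class; illustrative only."""

    def __init__(self, ulysses_size):
        self.ulysses_size = ulysses_size


class TransformersModel:
    """Sketch of the lazy-construction pattern (class shape is assumed)."""

    def __init__(self, ulysses_size=1):
        self.ulysses_size = ulysses_size
        self._sp_strategy = None  # deliberately NOT built during __init__

    def _ensure_sp_strategy(self):
        # Construct on first use, after the underlying HF model is fully
        # initialized and before wrapping; skip entirely when SP is off.
        if self._sp_strategy is None and self.ulysses_size > 1:
            self._sp_strategy = SequenceParallelStrategy(self.ulysses_size)
        return self._sp_strategy
```

Because the strategy is cached on first call, repeated calls return the same object, and models with `ulysses_size=1` never pay the construction cost.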
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
tastelikefeet and others added 28 commits February 13, 2026 14:25
* feat(sequence_parallel): refactor loss reduction using custom autograd functions

Replace manual gradient handling with `torch.autograd.Function` subclasses `_ReduceSequenceParallelLoss` and `_ReduceSequenceParallelSum` to compute global loss via autograd-aware all-reduce. This simplifies the logic for both sum and mean reductions, improves gradient correctness, and removes the need for separate metric scaling when `world_size > 1`.

* feat(sequence_parallel): compensate gradient scaling for FSDP averaging

Add `compensate_fsdp_avg` config flag to adjust loss reduction when sequence parallel (SP) is combined with FSDP or accelerate DDP/FSDP. This prevents gradient magnitude from being incorrectly scaled down by an extra factor of SP world size during data-parallel averaging.

- In `GatherLoss` backward, scale gradients by SP world size before splitting, so downstream FSDP averaging does not shrink this path.
- In `SequenceParallelStrategy.reduce_loss`, apply a compensation factor (ulysses_size) when `compensate_fsdp_avg` is enabled.
- Automatically set `compensate_fsdp_avg=True` in `TransformersModel` when using NativeFSDPStrategy or AccelerateStrategy with both SP and data parallelism active.
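The arithmetic behind the compensation can be checked with a back-of-envelope sketch (not library code): FSDP and accelerate DDP average gradients over all `dp * sp` ranks, but each data-parallel group's full-sequence gradient is already split across its `sp` sequence shards, so the average is shrunk by an extra factor of `sp` unless it is scaled back up by `ulysses_size`:

```python
def fsdp_averaged_grad(shard_grads, dp, sp, compensate=False):
    # FSDP / accelerate DDP average gradients over ALL dp * sp ranks.
    scale = sp if compensate else 1
    return scale * sum(shard_grads) / (dp * sp)


# dp=2 data-parallel groups, sp=4 sequence shards. Each dp group's
# full-sequence gradient G_d = 4.0 is split across its 4 SP ranks
# (1.0 per shard); the intended result is the dp-mean, i.e. 4.0.
shards = [1.0] * (2 * 4)
assert fsdp_averaged_grad(shards, dp=2, sp=4) == 1.0         # shrunk by sp
assert fsdp_averaged_grad(shards, dp=2, sp=4, compensate=True) == 4.0
```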

* delete unused unit test

* fix lint

* feat: add kernels optional dependency and refactor CI installation

- Add 'kernels' as an optional dependency group in pyproject.toml
- Refactor CI container test script to use a reusable installation function
- Install twinkle with kernels in both debug and release modes for consistency
- Improve maintainability by centralizing the installation command

* feat(kernel): add backward compatibility for kernels API changes

Update `_load_from_hub` function to handle API changes in `select_revision_or_version` and `get_kernel` calls. The changes introduce try-except blocks to catch `TypeError` exceptions, allowing the function to work with both modern keyword-based APIs and older positional argument variants. This ensures compatibility across different versions of the kernels module without breaking existing functionality.
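The fallback pattern can be illustrated with stand-ins (the two `get_kernel` variants below are hypothetical; they only model the keyword-vs-positional signature drift the commit describes, not the real kernels API):

```python
def load_from_hub(get_kernel, repo_id, revision):
    try:
        # Modern API: revision passed as a keyword argument.
        return get_kernel(repo_id, revision=revision)
    except TypeError:
        # Older API: revision was accepted positionally; retry that way.
        return get_kernel(repo_id, revision)


def modern_get_kernel(repo_id, *, revision=None):
    return (repo_id, revision, "modern")


def legacy_get_kernel(repo_id, revision, /):
    # Positional-only parameters reject the keyword spelling above,
    # raising the TypeError that triggers the fallback.
    return (repo_id, revision, "legacy")
```

The same wrapper then works against either vintage of the API without version sniffing.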
* feat(tests): replace manual sp_group retrieval with module attribute

Replace calls to `_get_sp_group_from_device_mesh` with direct access to `sequence_parallel._sp_group` in sequence parallel attention tests. This simplifies the test setup by using the already initialized group stored in the module, improving code clarity and reducing redundancy.

* feat(tests): improve kernel availability check in test_function_kernel

Add additional imports and a try-except block to verify that the 'kernels-test/flattened-build' kernel can be successfully loaded in the current environment before proceeding with the test. This prevents test failures due to environment-specific loading issues and provides a more informative skip message.
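The probe-then-skip shape described above can be sketched like this. The tests themselves use a pytest skip; this stdlib variant raises `unittest.SkipTest`, which pytest also honors, and `load_kernel` is a hypothetical loader argument standing in for the real entry point:

```python
import unittest


def require_kernel(load_kernel, name):
    # Try loading once up front; if the environment cannot provide the
    # kernel, skip with an informative message instead of failing.
    try:
        return load_kernel(name)
    except Exception as exc:  # environment-specific load failures
        raise unittest.SkipTest(
            f"kernel {name!r} could not be loaded in this environment: {exc}")
```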

* wip

* wip

* remove debug info

* feat: add ep/sp FSDP MoE finetuning entry and update script

- Add new entry for ep/sp FSDP MoE finetuning in README table
- Update ep_fsdp_qwen3_moe.py script to include ulysses_size parameter for enhanced parallelism configuration
@tastelikefeet merged commit 9aa5579 into main on Feb 13, 2026
3 of 4 checks passed
@tastelikefeet deleted the dev branch on February 13, 2026 at 09:40