[DO NOT MERGE] Temp Blackwell benchmark #248

Open
Strivin0311 wants to merge 50 commits into main from blackwell_benchmark

@Strivin0311
Contributor

DONE

  • Add temporary Blackwell benchmark preparation, including Docker files, configs, and scripts.

Strivin0311 and others added 30 commits February 6, 2026 11:31
* Modify the static solver so that each segment of input_split_size is divisible by the same number

* modify chunk logic in static solver
* add merge_with_split_alignment method in AttnRanges

* support split alignment in dynamic solver
* relaxed the buffer size up to INT_MAX limit for internode

* tested over INT_MAX buffer size in exp/grpcoll tests

* minor fixed

* added docstring for config funcs

* added minimum num bytes check for native grpcoll

* fixed tma bytes and num warps for internode cache notify kernel

* raised up default num_rdma_bytes

* further fixed internode cache notify kernel for group reduce

* removed the temp debug code to make benchmark mask split-aligned
* added num_heads_q,kv,group to comm meta for dynamic solver; added separate split alignment for kv/qo

* added num_heads_q/kv to comm meta for dynamic solver

* supported split alignment varying from dtype

* added native_grpcoll_split_alignment to test_pipeline/test_pipeline_sdpa

* tested through dynamic split alignment for pipeline ut; added world size offset for seed

* added some comments

* added MAGI_ATTENTION_NATIVE_GRPCOLL_SPLIT_ALIGNMENT to docs
* updated and polished api for required num_heads_q, num_heads_kv, head_dim

* adjusted the calls in ut for updated APIs

* adjusted the calls in examples for updated APIs

* adjusted the calls in exps for updated APIs

* adjusted the calls in docs and readme for updated APIs, as well as deleting the magi_attn_varlen_dipatch and magi_attn_flex_dispatch deprecated APIs

* minor updated tests/test_api/test_interface.py

* minor updated benchmark dockerfile
* added head dim to comm meta

* supported auto split alignment w/o varying from dtypes

* minor updated repr and utils

* added strategy for calc_split_alignment
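
The split-alignment commits above require every segment of input_split_size to be divisible by the same number. A minimal sketch of that constraint follows. It is not the repository's implementation: the helper names `is_split_aligned` and `aligned_chunk_sizes` are hypothetical, and it assumes the total length is already a multiple of the alignment.

```python
from typing import List


def is_split_aligned(split_sizes: List[int], alignment: int) -> bool:
    """Return True if every split size is a multiple of `alignment`."""
    return all(size % alignment == 0 for size in split_sizes)


def aligned_chunk_sizes(total: int, num_chunks: int, alignment: int) -> List[int]:
    """Hypothetical helper: split `total` into `num_chunks` sizes,
    each a multiple of `alignment`, that sum back to `total`."""
    if alignment <= 0 or num_chunks <= 0:
        raise ValueError("alignment and num_chunks must be positive")
    if total % alignment != 0:
        raise ValueError("total must be a multiple of alignment")
    # Work in units of `alignment`, so every resulting size is aligned by construction.
    units = total // alignment
    base, extra = divmod(units, num_chunks)
    # The first `extra` chunks take one additional unit to absorb the remainder.
    return [(base + (1 if i < extra else 0)) * alignment for i in range(num_chunks)]


sizes = aligned_chunk_sizes(total=1024, num_chunks=3, alignment=128)
print(sizes, is_split_aligned(sizes, 128))
```

Working in whole units of the alignment keeps the chunks as even as possible without ever producing an unaligned segment. A chunk can come out with size zero when there are fewer units than chunks.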
Big-TRex and others added 9 commits February 6, 2026 11:31
* updated run_grpcoll_test script for B300

* updated benchmark conf and script for blackwell

* added script to install custom nvshmem-3.4.5 on b300 roce
* support save last stage for bwd overlap policy

* rename backward_overlap_policy to backward_hide_tail_reduce

* rename several params about hide_tail_stage_reduce
* fixed missing split_alignment_kv for dist_attn_solver

* fixed missing arg for _hide_tail_stage_reduce_backward
@Strivin0311 Strivin0311 changed the title [WIP] Temp Blackwell benchmark [DO NOT MERGE] Temp Blackwell benchmark Feb 9, 2026
@Strivin0311 Strivin0311 force-pushed the native_grpcoll/per_split_token branch from c5103dd to 7891743 February 13, 2026 17:12
Base automatically changed from native_grpcoll/per_split_token to main February 28, 2026 03:02

4 participants