[Triton] add a16w8 gemm for DS-R1 for o_proj for decode, add rocm_aiter_triton… #788

Open
k50112113 wants to merge 1 commit into 355_wip from shaoclee/355_wip_ds_a16w8_reduce_rms_rope_rebase

@k50112113 commented on Nov 4, 2025

This PR includes optimizations for DS-R1 FP8 TP8:

  1. Add an a16w8 GEMM for o_proj during decode.
  2. Add rocm_aiter_triton_qkv_a_proj_layernorm.

This PR depends on ROCm/aiter#1328.
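
For readers unfamiliar with the term, "a16w8" means the GEMM keeps activations in 16-bit precision while reading weights stored in 8 bits plus a scale. The following is only a minimal pure-Python sketch of that arithmetic, not the Triton kernel from this PR or from ROCm/aiter. The function names `quantize_weight_rows` and `a16w8_gemm` are hypothetical, 8-bit integers stand in for FP8, Python floats stand in for 16-bit activations, and the per-output-channel scale layout is an assumption.

```python
def quantize_weight_rows(w, qmax=127):
    """Symmetric per-output-channel quantization (hypothetical helper).

    w is a list of rows, one row per output channel. Integers in
    [-qmax, qmax] stand in for FP8 values; this is an assumption made
    for illustration only.
    """
    q_rows, scales = [], []
    for row in w:
        amax = max(abs(v) for v in row)
        scale = amax / qmax if amax > 0 else 1.0
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def a16w8_gemm(x, w_q, w_scale):
    """Reference a16w8 GEMM: y[m][n] = (sum_k x[m][k] * w_q[n][k]) * w_scale[n].

    x stays in higher precision (the "a16" side); only the weights are
    8-bit (the "w8" side). The scale is applied once per output channel
    after accumulation.
    """
    out = []
    for x_row in x:
        out.append([
            sum(a * q for a, q in zip(x_row, q_row)) * s
            for q_row, s in zip(w_q, w_scale)
        ])
    return out


# Usage: one decode token (M = 1), K = 2 input features, N = 2 outputs.
x = [[1.0, 2.0]]
w = [[0.5, -1.0], [2.0, 0.0]]
w_q, w_scale = quantize_weight_rows(w)
y = a16w8_gemm(x, w_q, w_scale)
```

With a single decode token the weight matrix dominates the bytes read, which is the usual motivation for keeping weights in 8 bits while leaving activations unquantized; whether that is the exact rationale here is not stated in the PR.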

@k50112113 changed the title from "[355_wip] add a16w8 gemm for DS-R1 for o_proj for decode, add rocm_aiter_triton…" to "[Triton] add a16w8 gemm for DS-R1 for o_proj for decode, add rocm_aiter_triton…" on Nov 4, 2025
@k50112113 force-pushed the shaoclee/355_wip_ds_a16w8_reduce_rms_rope_rebase branch from 4b1dcc7 to ae9c305 on November 18, 2025 14:18
@github-actions

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions bot added the stale label on Feb 17, 2026