Optimize `group_index_select_or_add_2d_kernel` by adding a separate codepath for small embedding dimensions by aryaman-gupta · Pull Request #135 · ROCm/FBGEMM

aryaman-gupta · 2025-12-16T16:37:59Z

This PR optimizes the performance of `group_index_select_or_add_2d_kernel` on tables with small embedding dimensions (i.e., small `num_cols`).

For tables with small embedding dimensions, the code is refactored to process multiple rows within the same warp. Two files are changed:

- `fbgemm_gpu/src/sparse_ops/sparse_ops_gpu.cpp`: the calculation of `warp_offsets` is changed in the host-side code.
- `fbgemm_gpu/src/sparse_ops/sparse_group_index.cu`: `group_index_select_or_add_2d_kernel` is modified to process multiple rows within a warp for small embedding dimensions.

Benchmark results:

Benchmark 1:

| | Forward | Backward |
|---|---|---|
| Baseline | 10 ms | 14 ms |
| Optimized | 7 ms | 10.5 ms |

Benchmark 2:

| | Forward | Backward |
|---|---|---|
| Baseline | 3 ms | 6 ms |
| Optimized | 2 ms | 5 ms |

adds optimized path for small dimension sizes to group_index_select_or_add_2d_kernel

85caa29

aryaman-gupta assigned aryaman-gupta and liligwu and unassigned aryaman-gupta Dec 16, 2025

aryaman-gupta added 6 commits December 16, 2025 17:27

sparse_group_index.cu: edits some comments

ff1b9b6

adds USE_ROCM guards to subwarp optimizations for group_index_select_or_add_2d_kernel

439a51a

sparse_group_index: handle UNROLL_FACTOR for small dimensions in group_index_select_or_add_2d_kernel

2a85d73

sparse_group_index: handle fixed-column-size case correctly in optimized small embedding dims path

2f54140

group_index_select_or_add_2d_kernel: when num_cols < UNROLL_FACTOR, disable optimized smallEmbD path

e0edc40

sparse_group_index: use const auto where possible

bbdc17d

aryaman-gupta wants to merge 7 commits into main_12162025_upstream from aryaman/group-index-subwarp
