forked from pytorch/FBGEMM
Pull requests: ROCm/FBGEMM
Implement pre-sorting, caching and contiguous warp processing in group_index_select
#144
opened Mar 3, 2026 by
avbokovoy
Split Kernel Optimization for group_index_select_or_add_2d_kernel
enhancement
New feature or request
#142
opened Feb 2, 2026 by
aryaman-gupta
Implement cached member_id upper bound search
enhancement
New feature or request
#141
opened Feb 2, 2026 by
avbokovoy
Implement asynchronous LDS loads for MI350
enhancement
New feature or request
#138
opened Dec 19, 2025 by
avbokovoy
Optimizations for index_select_scalar_cumsum_kernel
#137
opened Dec 16, 2025 by
amd-wsung102
1 task
Optimize group_index_select_or_add_2d_kernel by adding a separate codepath for small embedding dimensions
#135
opened Dec 16, 2025 by
aryaman-gupta
Fixes bug in one specialized HIP instantiation of the warp-per-row kernel
#134
opened Dec 5, 2025 by
aryaman-gupta
Tuned grid size by reducing num_warps_per_threadblock to 4
#117
opened Aug 26, 2025 by
kudomcho
1 task