[BUGFIX] Fix performance hit for grouped convolutions #216

sdatkinson · 2026-01-29T08:06:02Z

The current implementation of grouped convolutions is slower than simply running the non-grouped operation on the full weight matrix, with zeros in the off-block-diagonal entries.

I tried storing the weights as an array of per-group blocks, but the per-call GEMM overhead appears to dominate.
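For reference, here is a minimal sketch of the two formulations compared above. It is not the code in this PR: `Matrix`, `pack_block_diagonal`, `apply_dense`, and `apply_grouped` are hypothetical names, and plain loops stand in for the real GEMM calls. It only shows that a single product over a zero-padded block-diagonal matrix gives the same output as one small product per group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix in a flat vector (hypothetical helper, not from the PR).
struct Matrix {
  std::size_t rows, cols;
  std::vector<float> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Pack per-group blocks into one dense matrix.
// Entries outside the diagonal blocks keep their initial value of zero.
Matrix pack_block_diagonal(const std::vector<Matrix>& blocks) {
  std::size_t rows = 0, cols = 0;
  for (const Matrix& b : blocks) {
    rows += b.rows;
    cols += b.cols;
  }
  Matrix full(rows, cols);
  std::size_t r0 = 0, c0 = 0;
  for (const Matrix& b : blocks) {
    for (std::size_t i = 0; i < b.rows; ++i)
      for (std::size_t j = 0; j < b.cols; ++j)
        full.at(r0 + i, c0 + j) = b.at(i, j);
    r0 += b.rows;
    c0 += b.cols;
  }
  return full;
}

// Non-grouped path: one product over the full matrix, zeros included.
std::vector<float> apply_dense(const Matrix& w, const std::vector<float>& x) {
  assert(x.size() == w.cols);
  std::vector<float> y(w.rows, 0.0f);
  for (std::size_t i = 0; i < w.rows; ++i)
    for (std::size_t j = 0; j < w.cols; ++j)
      y[i] += w.at(i, j) * x[j];
  return y;
}

// Grouped path: one small product per block, each on its own input slice.
std::vector<float> apply_grouped(const std::vector<Matrix>& blocks,
                                 const std::vector<float>& x) {
  std::vector<float> y;
  std::size_t c0 = 0;
  for (const Matrix& b : blocks) {
    assert(c0 + b.cols <= x.size());
    for (std::size_t i = 0; i < b.rows; ++i) {
      float acc = 0.0f;
      for (std::size_t j = 0; j < b.cols; ++j)
        acc += b.at(i, j) * x[c0 + j];
      y.push_back(acc);
    }
    c0 += b.cols;
  }
  return y;
}
```

The dense path does extra multiplications by zero but issues a single call. The grouped path does less arithmetic but pays the call overhead once per group, which is the cost that seems to dominate for small blocks.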

To do better, the approach needs to reduce that overhead. Compile-time optimizations and/or implementations specialized for specific sizes are likely to help.
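As a rough illustration of what a size-specialized implementation could look like, here is a sketch of a product whose dimensions are template parameters. `matvec_fixed` is a hypothetical name and this is not code from the PR. The assumption is that with loop bounds known at compile time, the compiler is free to unroll and inline the loops instead of going through a general-purpose GEMM call; whether that actually helps would need benchmarking.

```cpp
#include <array>
#include <cstddef>

// Fixed-size matrix-vector product over a row-major weight array.
// Rows and Cols are compile-time constants, so the loop bounds are
// known to the compiler (hypothetical helper, not from the PR).
template <std::size_t Rows, std::size_t Cols>
std::array<float, Rows> matvec_fixed(const std::array<float, Rows * Cols>& w,
                                     const std::array<float, Cols>& x) {
  std::array<float, Rows> y{};  // value-initialized to zero
  for (std::size_t i = 0; i < Rows; ++i)
    for (std::size_t j = 0; j < Cols; ++j)
      y[i] += w[i * Cols + j] * x[j];
  return y;
}
```

The sizes have to be spelled out at the call site, for example `matvec_fixed<2, 2>(w, x)`, so a real implementation would also need a runtime dispatch from the group size to the matching instantiation, with a general fallback for sizes that are not specialized.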

This is better than what's currently on main, but it's not good enough. "Little steps".

This reverts commit e78e191.

sdatkinson added 5 commits January 28, 2026 23:23

Zero out conv weight matrices after resize

c20fb86

Improve speed of small grouped convolutions with single GEMM

546f820

Implement std::vector grouped_weights

e78e191

Revert "Implement std::vector grouped_weights"

e3be255

This reverts commit e78e191.

Improve grouped convolutions for Conv1D by...ignoring them for now.

2ad9dec

sdatkinson merged commit 12f93a2 into main Jan 29, 2026
2 checks passed
