I'm profiling the models here and I'm getting poor results when grouped convolutions are used.
For example, `1x1_groups` looks at scaling up the number of groups in an 8-channel 1x1 convolution:
This plot suggests that per-group overhead is dominating the calculation: the total FLOPs actually shrink as 1/groups (each output channel sees fewer input channels), yet the measured compute time scales linearly with the number of groups.
This is bad because it makes grouped convolutions basically useless here: it would be faster to run the full dense convolution with the off-block weights set to zero.
I suspect that compile-time optimizations may solve this, but I'm surprised that it's this bad.
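To make the "full matrix with a ton of zeroes" equivalence concrete, here is a minimal NumPy sketch (not the benchmark code above; all names are illustrative). A 1x1 grouped convolution over flattened pixels is just a block-diagonal matrix multiply, so embedding the per-group weights into a dense block-diagonal matrix reproduces the grouped result exactly, while the dense path does groups-times more FLOPs:

```python
import numpy as np

def grouped_conv1x1(x, w, groups):
    """Grouped 1x1 conv. x: (C_in, N_pixels); w: (groups, C_out/g, C_in/g)."""
    cin_g = x.shape[0] // groups
    # Each group's output channels only see that group's input channels.
    outs = [w[g] @ x[g * cin_g:(g + 1) * cin_g] for g in range(groups)]
    return np.concatenate(outs, axis=0)

def as_block_diagonal(w):
    """Embed grouped weights into a dense block-diagonal matrix (mostly zeros)."""
    groups, cout_g, cin_g = w.shape
    dense = np.zeros((groups * cout_g, groups * cin_g))
    for g in range(groups):
        dense[g * cout_g:(g + 1) * cout_g, g * cin_g:(g + 1) * cin_g] = w[g]
    return dense

rng = np.random.default_rng(0)
groups, cin, cout, npix = 4, 8, 8, 16
w = rng.standard_normal((groups, cout // groups, cin // groups))
x = rng.standard_normal((cin, npix))

# The grouped path and the dense block-diagonal path agree exactly.
assert np.allclose(grouped_conv1x1(x, w, groups), as_block_diagonal(w) @ x)
```

The grouped path does cout*cin*npix/groups multiply-adds versus cout*cin*npix for the dense one, which is why a linear-in-groups slowdown in the profile points to fixed per-group launch/dispatch overhead rather than arithmetic cost.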