Commit 91b4c75
perf[gpu]: reduce register pressure in dyn dispatch (#7489)
We reduce the number of values per tile that each GPU thread processes in
the output stage, and limit the register count to 32 via the launch bounds.
For now, this brings the dynamic dispatch kernel reasonably close to the
standalone kernel, although the gap remains largest for the narrowest type
(2.17× for u8 versus 1.10× for u64):
| Type | Dynamic dispatch | Standalone | Ratio (dynamic / standalone) |
|---|---|---|---|
| u8 bw6 | 172 µs | 79 µs | 2.17× |
| u16 bw6 | 140 µs | 88 µs | 1.59× |
| u32 bw6 | 184 µs | 148 µs | 1.24× |
| u64 bw8 | 303 µs | 276 µs | 1.10× |
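The commit does not show which launch-bounds values produce the 32-register cap, so the following is only a rough sketch of the occupancy arithmetic behind such a cap. It assumes a 64K-register file per SM, 32-thread warps, per-warp register allocation in 256-register units, and a 255-register per-thread ceiling (typical for recent NVIDIA GPUs). The helper `max_regs_per_thread` is hypothetical and models only the register file; the real compiler also weighs other limits.

```cpp
#include <algorithm>
#include <cassert>

// Assumed hardware limits (typical values, not taken from the commit):
constexpr int kRegistersPerSM = 65536;  // 64K 32-bit registers per SM
constexpr int kWarpSize = 32;           // threads per warp
constexpr int kWarpAllocUnit = 256;     // registers are allocated per warp in 256-register chunks
constexpr int kMaxRegsPerThread = 255;  // hard per-thread ceiling

// Hypothetical helper: the largest per-thread register count that still lets
// `min_blocks_per_sm` blocks of `max_threads_per_block` threads be resident on
// one SM at the same time, i.e. the cap implied by
// __launch_bounds__(max_threads_per_block, min_blocks_per_sm).
// Returns 0 if the configuration cannot fit in the register file at all.
int max_regs_per_thread(int max_threads_per_block, int min_blocks_per_sm) {
    assert(max_threads_per_block > 0 && min_blocks_per_sm > 0);
    // A partial warp still occupies a full warp's worth of registers.
    const int warps_per_block = (max_threads_per_block + kWarpSize - 1) / kWarpSize;
    const int resident_warps = warps_per_block * min_blocks_per_sm;
    // Split the register file evenly across resident warps, rounded down
    // to the per-warp allocation unit.
    const int regs_per_warp =
        (kRegistersPerSM / resident_warps) / kWarpAllocUnit * kWarpAllocUnit;
    return std::min(regs_per_warp / kWarpSize, kMaxRegsPerThread);
}
```

For example, `max_regs_per_thread(256, 8)` and `max_regs_per_thread(1024, 2)` both return 32: 64 resident warps share 65,536 registers, leaving 1,024 per warp. If a kernel needs more registers than the cap allows, the compiler spills to local memory, which is one reason shrinking the per-thread tile and capping registers tend to go together.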
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Parent: 1169d84
1 file changed: 6 lines added, 4 removed (diff contents not preserved; the changes are in two hunks, one replacing original line 282 with new lines 282-283 and one replacing original lines 475-477 with new lines 476-479).