Oh awesome! Yeah the current prefix sum is super naive, a fast one would be nice. If it helps, Genna recently added a prefix sum to Burn: https://github.com/tracel-ai/burn/blob/1916c68fe8dc37a589c599d1e4b68722e8cfb2c6/crates/burn-vision/src/backends/cube/connected_components/prefix_sum.rs#L20 It uses some tricks not applicable here but is otherwise almost generic. See some discussion on the Discord: https://discord.com/channels/1038839012602941528/1038839013735399547/1336012001016807491 A faster CubeCL sort is also somewhere on the horizon: tracel-ai/cubecl#498
b417c8e to ad60b21
Ah ok, thanks for showing me. Still getting familiar with CubeCL/Burn/Brush, so keep pointing these out as I go!
Yeah I agree they really should be! I think Genna at the time didn't feel like making it a generic version in Cube as she just needed it for the vision library. I'm sure if you have a good version for Cube they'd take it (and Genna prolly would help you out), but it'll definitely be extra work to upstream it, up to you!
05a84fc to 9b4e37b
Draft for prefix sum implementation in CubeCL.
Currently, performance is 20% worse (comparing WgpuRuntime to WgpuRuntime).
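For anyone following along, here is a rough CPU-only sketch of what "prefix sum" means in this thread, plus the blocked two-pass structure that GPU scans commonly use. This is not the code in this PR and not the Burn kernel linked above; it uses no CubeCL API, and the names `prefix_sum_naive`, `prefix_sum_blocked` and `block_size` are made up for illustration. It may still be handy as a reference to check a kernel's output against.

```rust
/// Sequential inclusive prefix sum: out[i] = input[0] + ... + input[i].
/// Hypothetical reference helper, not part of CubeCL, Burn or Brush.
pub fn prefix_sum_naive(input: &[u32]) -> Vec<u32> {
    let mut acc = 0u32;
    input
        .iter()
        .map(|&x| {
            // Wrapping add mirrors typical GPU integer overflow behaviour.
            acc = acc.wrapping_add(x);
            acc
        })
        .collect()
}

/// Blocked inclusive prefix sum, a CPU stand-in for a GPU-style scan.
/// `block_size` plays the role of a workgroup size (an assumption made
/// for this sketch, not a value taken from the PR).
pub fn prefix_sum_blocked(input: &[u32], block_size: usize) -> Vec<u32> {
    assert!(block_size > 0, "block_size must be non-zero");
    let mut out = input.to_vec();

    // Pass 1: scan every block independently and record its total.
    // On a GPU each block would be handled by one workgroup.
    let mut block_totals = Vec::new();
    for block in out.chunks_mut(block_size) {
        let mut acc = 0u32;
        for v in block.iter_mut() {
            acc = acc.wrapping_add(*v);
            *v = acc;
        }
        block_totals.push(acc);
    }

    // Pass 2: the running sum of the preceding block totals is the
    // offset that has to be added to every element of a block.
    let mut offset = 0u32;
    for (block, total) in out.chunks_mut(block_size).zip(block_totals) {
        for v in block.iter_mut() {
            *v = v.wrapping_add(offset);
        }
        offset = offset.wrapping_add(total);
    }
    out
}
```

Both functions should agree for any input and any non-zero `block_size`, so a quick sanity check is to compare `prefix_sum_blocked(&data, 256)` against `prefix_sum_naive(&data)` and then compare the GPU result against either one.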