Could you explain why the SPD GLSL version only uses one barrier when cutting off all workgroups except the last?
    // Only last active workgroup should proceed
    bool SpdExitWorkgroup(AU1 numWorkGroups, AU1 localInvocationIndex, AU1 slice)
    {
        // global atomic counter
        if (localInvocationIndex == 0)
        {
            SpdIncreaseAtomicCounter(slice);
        }
        SpdWorkgroupShuffleBarrier();
        return (SpdGetAtomicCounter() != (numWorkGroups - 1));
    }

    void SpdWorkgroupShuffleBarrier() {
    #ifdef A_GLSL
        barrier();
    #endif
    #ifdef A_HLSL
        GroupMemoryBarrierWithGroupSync();
    #endif
    }

According to the GLSL specification, there should be a pair of calls: `barrier()` + `memoryBarrierImage()`.
It can't be that AMD and all users of the algorithm allow UB.
The direct link to official SPD snippet
Important notes:
- The mip 5 data lives in VRAM and should be synchronized before the last single workgroup runs. I think the `barrier()` call alone is not enough; an additional `memoryBarrierImage()` is needed.
- The mip 5 image has the `coherent` qualifier, but the spec description is very hazy to me.
- I can't imagine any example that uses `coherent` without `memoryBarrierImage()`.
Side note: HLSL code would need additional sync (DeviceMemoryBarrier?) as well.
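
A minimal sketch of the `barrier()` + `memoryBarrierImage()` pairing suggested above, written as a standalone GLSL 4.50 compute shader. Everything in it is an assumption made for illustration: the resource names (`uMip5`, `uMip6`, `uCounter`), the bindings, the image format, the workgroup size and the helper `ExitWorkgroup` are hypothetical and are not taken from the SPD source. Whether the extra memory barrier is strictly required, or sufficient on every driver, is exactly the open question of this issue.

```glsl
#version 450
layout(local_size_x = 16, local_size_y = 16) in;

// Hypothetical resources for illustration; names and bindings are not SPD's.
layout(binding = 0, rgba16f) coherent uniform image2D uMip5;
layout(binding = 1, rgba16f) writeonly uniform image2D uMip6;
layout(binding = 2, std430) coherent buffer CounterBuffer { uint count; } uCounter;

shared uint sTicket; // atomic result, broadcast to the whole workgroup

// Returns true for every workgroup except the last one to arrive.
bool ExitWorkgroup(uint numWorkGroups)
{
    // Order this invocation's earlier image stores before its later memory accesses.
    memoryBarrierImage();
    // Wait until all invocations of the workgroup have reached this point.
    barrier();
    if (gl_LocalInvocationIndex == 0u)
    {
        sTicket = atomicAdd(uCounter.count, 1u);
    }
    // Let every invocation of the workgroup read the ticket.
    barrier();
    return sTicket != (numWorkGroups - 1u);
}

void main()
{
    // Each invocation writes one texel of the intermediate mip.
    imageStore(uMip5, ivec2(gl_GlobalInvocationID.xy), vec4(1.0));

    // Assumes a 2D dispatch (gl_NumWorkGroups.z == 1).
    uint numWorkGroups = gl_NumWorkGroups.x * gl_NumWorkGroups.y;
    if (ExitWorkgroup(numWorkGroups))
    {
        return;
    }

    // Only the last workgroup gets here and reads texels written by the others.
    ivec2 p = ivec2(gl_LocalInvocationID.xy);
    imageStore(uMip6, p, imageLoad(uMip5, p));
}
```

The counter is assumed to be zero before the dispatch. The difference from the quoted SPD helper is the `memoryBarrierImage()` plus the extra `barrier()` placed before the atomic increment, so that no invocation's image store can be ordered after the counter update that signals completion.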