Question about L2 cache swizzling in FP8 GEMM

Thank you for sharing your amazing solutions. Could you give more information on the logic of the L2 cache swizzling used in your kernel?
Is it similar to the "Scheduling and L2 cache" section in Kernel 6 of this blog post, where the author improved the L2 cache hit rate?

https://cudaforfun.substack.com/p/outperforming-cublas-on-h100-a-worklog

Thanks,
Cong