torch.mps.synchronize doesn't seem to work

I get an insanely high FLOP throughput (around 17 PFLOP/s) on my Mac M2 Ultra.

Using MLX and synchronizing the GPU with mx.eval(output_matrix) seems to produce reasonable numbers.
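For reference, here is a minimal sketch of the kind of timing loop involved. It assumes a square float32 matmul on the "mps" device and counts 2*m*n*k FLOPs per call. The helper names `matmul_flops` and `measure_tflops` are made up for illustration and are not part of PyTorch or MLX. If the synchronize call returns before the queued GPU work has finished, the measured time only covers kernel launches and the reported throughput becomes far too high.

```python
import time


def matmul_flops(m, n, k):
    """FLOPs for an (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * n * k


def measure_tflops(run, sync, flops_per_call, iters=10, warmup=3):
    """Time `run` for `iters` calls, calling `sync` before and after the timed region.

    `run` and `sync` are plain callables (hypothetical interface), so the same
    loop can be reused with different backends.
    """
    for _ in range(warmup):
        run()
    sync()  # make sure warmup work is finished before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        run()
    sync()  # wait for all queued work before stopping the clock
    elapsed = time.perf_counter() - start
    return flops_per_call * iters / elapsed / 1e12


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.backends.mps.is_available():
        n = 4096  # assumed problem size, not taken from the original report
        a = torch.randn(n, n, device="mps")
        b = torch.randn(n, n, device="mps")
        tflops = measure_tflops(
            lambda: a @ b,
            torch.mps.synchronize,
            matmul_flops(n, n, n),
        )
        print(f"{tflops:.2f} TFLOP/s")
    else:
        print("MPS backend not available; skipping benchmark")
```

A cross-check that does not rely on torch.mps.synchronize is to force a device-to-host copy inside `sync` (for example by reading one element of the result with `.item()`) and compare the two numbers.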