Consider this valid Rust statement:
some_mut_slice[0] = dispatch_packet().global_id_x();
where the right-hand side is basically the equivalent of get_global_id(0) in OpenCL.
On AMDGPU at least, the value of some_mut_slice[0] after the kernel returns is undefined, since every work item writes its own id to the same element. I'll bet Nvidia is similar in this. I suspect the same holds for IBM's Power9 (as used in Summit, currently the fastest supercomputer in the world), which features SMT: for example, SMT1 on SMT4 hardware would be 4 slices (threads, basically) running with a single instruction pointer, and I'd expect the same result in SMT1 or SMT2 (though I don't have documentation to back this up).
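Essentially, I think the issue with the borrow model is that it assumes a single thread really is a single thread of execution, not a wavefront/warp/superslice. On that assumption it treats a mut borrow as unique, even though every lane of the wavefront holds that same borrow at once.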
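To make that concrete, here is a rough CPU-side sketch of what I mean. It is not real GPU code: kernel_body and run_wavefront are names I made up for illustration, the global_id_x parameter stands in for dispatch_packet().global_id_x(), and running the lanes one after another in a loop is only a stand-in for hardware that gives no ordering at all.

```rust
/// Hypothetical stand-in for the kernel statement, as seen by one lane.
/// `global_id_x` plays the role of `dispatch_packet().global_id_x()`.
fn kernel_body(some_mut_slice: &mut [u32], global_id_x: u32) {
    some_mut_slice[0] = global_id_x;
}

/// Runs the same kernel body once per lane, in the order given.
/// Assumption: the caller-chosen order models whichever lane's store
/// happens to land last; real hardware promises no such order.
fn run_wavefront(lane_order: &[u32]) -> u32 {
    let mut buffer = [u32::MAX; 1];
    for &lane in lane_order {
        // Each iteration takes a fresh `&mut` to the same buffer, so the
        // borrow checker sees a unique borrow every time. On a GPU all
        // lanes would effectively hold this borrow simultaneously.
        kernel_body(&mut buffer, lane);
    }
    buffer[0]
}
```

Calling run_wavefront(&[0, 1, 2, 3]) and run_wavefront(&[3, 2, 1, 0]) gives different final values for the same "kernel", and that order dependence is all the sketch is meant to show. Both versions compile without complaint, since nothing in the types says the borrow is shared across lanes.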
I have no idea how to go about solving this.