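In the original code, me->tail is updated in two steps: first the addition, then the modulo. Between those two writes, another thread reading tail at the same time (for instance in the usedspace function) can observe the intermediate value, which may be greater than or equal to the buffer size. Computing the new value first and storing it into tail with a single assignment avoids exposing that intermediate value, so it should be safer. It is still not guaranteed, though: a plain C assignment is not required to compile to one atomic store, and the compiler or CPU may reorder it relative to the buffer writes. Using atomic primitives would be the robust fix.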
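As a rough sketch of the atomic approach, assuming C11 `<stdatomic.h>` is available: the new tail is computed in a local variable and then published with one `atomic_store_explicit` using release ordering, while `usedspace` reads it with acquire ordering. The struct layout, the field names (`size`, `head`, `tail`) and the function names below are hypothetical and not taken from the original code; it also assumes a single producer (the only writer of `tail`) and a single consumer (the only writer of `head`).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical ring buffer layout; the real struct may differ. */
struct ringbuf {
    size_t size;         /* capacity of the index space */
    atomic_size_t head;  /* read index, written only by the consumer */
    atomic_size_t tail;  /* write index, written only by the producer */
};

static void ringbuf_init(struct ringbuf *me, size_t size)
{
    me->size = size;
    atomic_init(&me->head, 0);
    atomic_init(&me->tail, 0);
}

/* Producer: compute the wrapped index in a local, then publish it with a
 * single atomic store, so no reader can ever see tail >= size.
 * Release ordering makes the bytes written before this call visible to a
 * thread that later loads tail with acquire ordering. */
static void ringbuf_advance_tail(struct ringbuf *me, size_t n)
{
    assert(n < me->size);
    size_t tail = atomic_load_explicit(&me->tail, memory_order_relaxed);
    size_t next = (tail + n) % me->size;
    atomic_store_explicit(&me->tail, next, memory_order_release);
}

/* Consumer: same pattern for the read index. */
static void ringbuf_advance_head(struct ringbuf *me, size_t n)
{
    assert(n < me->size);
    size_t head = atomic_load_explicit(&me->head, memory_order_relaxed);
    size_t next = (head + n) % me->size;
    atomic_store_explicit(&me->head, next, memory_order_release);
}

/* Any thread: number of used slots. Both indices are always < size, so
 * tail + size - head lies in (0, 2 * size) and the modulo gives the count. */
static size_t ringbuf_usedspace(struct ringbuf *me)
{
    size_t head = atomic_load_explicit(&me->head, memory_order_acquire);
    size_t tail = atomic_load_explicit(&me->tail, memory_order_acquire);
    return (tail + me->size - head) % me->size;
}
```

The relaxed load in the advance functions is enough because each index has only one writer; the ordering that matters is the release store paired with the acquire load in the reader. Note that `usedspace` reads the two indices separately, so its result is a snapshot that may already be stale when it returns, which is usually acceptable for a single-producer/single-consumer buffer.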
I opened pull request #4 with this change.