--parallel slows processing down drastically: with --parallel 8, throughput drops from 10k tokens/s to about 1.5k tokens/s. It isn't clear whether this is a problem with the multiprocessing functionality itself or with something the SoMeWeTa engine does, but there appears to be massive synchronisation overhead.
macOS 12.5.1 arm64 (M1)
Anaconda Python v3.9.12 with current SoMeWeTa from PyPI
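
To help tell generic multiprocessing overhead apart from something SoMeWeTa-specific, here is a minimal sketch that times a trivial per-sentence function through Python's `multiprocessing.Pool` with different chunk sizes. Everything in it is an assumption for illustration: `fake_tag` is a hypothetical stand-in and not the SoMeWeTa API, and the sentence data is synthetic. I haven't checked how SoMeWeTa actually distributes work.

```python
import multiprocessing as mp
import time


def fake_tag(sentence):
    # Hypothetical stand-in for a tagger; NOT the SoMeWeTa API.
    return [(token, "X") for token in sentence]


def throughput(sentences, processes, chunksize):
    """Return (tokens per second, tagged sentences) for one configuration."""
    n_tokens = sum(len(s) for s in sentences)
    start = time.perf_counter()
    if processes <= 1:
        # Serial baseline: no pickling, no inter-process transfer.
        tagged = [fake_tag(s) for s in sentences]
    else:
        # Timing includes pool start-up as well as per-chunk transfer cost.
        with mp.Pool(processes) as pool:
            tagged = pool.map(fake_tag, sentences, chunksize=chunksize)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_tokens / elapsed, tagged


if __name__ == "__main__":
    # Synthetic input: 2000 sentences of 10 tokens each.
    sentences = [["token"] * 10 for _ in range(2000)]
    for processes, chunksize in [(1, 1), (2, 1), (2, 500)]:
        rate, _ = throughput(sentences, processes, chunksize)
        print(f"processes={processes} chunksize={chunksize}: {rate:,.0f} tokens/s")
```

If the small-chunk run is much slower than the serial one while the large-chunk run recovers, that would point at per-item transfer cost between processes. It would not by itself show what SoMeWeTa does internally.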