At the moment, element-wise operations in NMatrix are performed in Ruby rather than in C/C++ like the rest of the library. Finding a way to speed up element-wise operations, either by patching the NMatrix gem or by some other means, should result in a substantial performance gain.
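
To make the cost concrete, here is a minimal sketch of what an element-wise operation looks like when it is driven from Ruby. It is not NMatrix's actual implementation: the matrix is a plain nested Array standing in for an NMatrix, and `elementwise` is a hypothetical helper written for illustration only. The point is that every element goes through a Ruby-level block call, which is the kind of per-element overhead a C/C++ loop would avoid.

```ruby
# Hypothetical helper, not part of the NMatrix API.
# Applies the given block to each pair of corresponding elements
# of two equally shaped nested Arrays and returns a new nested Array.
def elementwise(a, b)
  raise ArgumentError, "shape mismatch" unless a.size == b.size

  a.each_with_index.map do |row, i|
    raise ArgumentError, "shape mismatch" unless row.size == b[i].size

    # One Ruby block invocation per element: this is the interpreter-level
    # overhead that a native loop would not pay.
    row.each_with_index.map { |x, j| yield(x, b[i][j]) }
  end
end

a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]

sum  = elementwise(a, b) { |x, y| x + y }
prod = elementwise(a, b) { |x, y| x * y }
```

For a matrix with n elements this performs n block calls in the interpreter, so the overhead grows with the matrix size regardless of how cheap the arithmetic itself is.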