Add assembly implementations of the core primitives (ChaCha12, BLAKE2s, SipHash-128) to improve performance.