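https://github.com/nerai/SmithWaterman/blob/4321d8fc9b85664b68ef55985b6137d5a02693c1/src/compare_avx.cpp#L139

`_mm256_slli_si256` does not shift the full 256-bit register. It shifts each 128-bit lane independently, so bytes shifted out of the low lane are dropped instead of being carried into the high lane. If this line is meant to shift across the whole vector, the bytes that should cross the lane boundary are lost.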
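One common way to get a true 256-bit byte shift is to combine `_mm256_permute2x128_si256` with `_mm256_alignr_epi8`. The sketch below is not taken from the repository: the names `shift_left_bytes_256`, `shift_demo` and `avx2_available` are hypothetical, it only handles shift counts from 1 to 15 bytes, and it assumes GCC or Clang on x86-64 (the `target("avx2")` attribute stands in for compiling with `-mavx2`).

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper: shift the whole 256-bit vector left by N bytes
// (0 < N < 16), carrying bytes from the low lane into the high lane.
template <int N>
__attribute__((target("avx2")))
inline __m256i shift_left_bytes_256(__m256i v) {
    static_assert(N > 0 && N < 16, "this sketch only covers 1..15 bytes");
    // t = [high lane: v.low, low lane: zero]
    __m256i t = _mm256_permute2x128_si256(v, v, 0x08);
    // Per lane, concatenate (v : t) and take 16 bytes starting at offset 16 - N.
    // Low lane: zeros shift in. High lane: the top N bytes of v.low shift in.
    return _mm256_alignr_epi8(v, t, 16 - N);
}

// Hypothetical demo: write the per-lane result and the full-width result
// for a 1-byte left shift of the 32 input bytes.
__attribute__((target("avx2")))
void shift_demo(const std::uint8_t* in, std::uint8_t* per_lane, std::uint8_t* full) {
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in));
    // Per-lane shift: byte 15 is dropped, byte 16 of the result becomes zero.
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(per_lane),
                        _mm256_slli_si256(v, 1));
    // Full-width shift: byte 15 moves into position 16.
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(full),
                        shift_left_bytes_256<1>(v));
}

// Runtime check (GCC/Clang builtin) so callers can skip the demo on CPUs without AVX2.
bool avx2_available() {
    return __builtin_cpu_supports("avx2") != 0;
}
```

With input bytes 1..32, the two results differ only at the lane boundary: the per-lane version has a zero at index 16, while the full-width version has the value that was at index 15. The same idea works for right shifts by swapping the operand order and the permute mask, but that variant is not shown here.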