Skip to content

Conversation

@anthony-zy
Copy link
Contributor

This PR add RVV to optimize Memcopy64 function , improving compression speed by ~46%.

Optimize Snappy 1.2.2 performance

Added RVV support for RISC-V in Snappy, optimizing Memcopy64 by leveraging RVV vector load/store instructions (e.g., vle8_v_u8m2, vse8_v_u8m2) to reduce memory copy overhead and improve decompression performance. lzbench 2.1 tests onsilesia.tar (GCC 13.2.1, 64-bit Linux) show:
Compressor Compress Decompress Size Ratio
Snappy 1.2.2 (Before) 64.3 MB/s 186 MB/s 101403263 47.84%
Snappy 1.2.2 (After) 64.4 MB/s 272 MB/s 101403263 47.84%

Snappy 1.2.2 unittest ([ PASSED ] 21 tests.)

Running main() from gmock_main.cc
[==========] Running 21 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CorruptedTest
[ RUN ] CorruptedTest.VerifyCorrupted
Crazy decompression lengths not checked on 64-bit build
[ OK ] CorruptedTest.VerifyCorrupted (5 ms)
[----------] 1 test from CorruptedTest (5 ms total)

[----------] 17 tests from Snappy
[ RUN ] Snappy.SimpleTests
[ OK ] Snappy.SimpleTests (22 ms)
[ RUN ] Snappy.AppendSelfPatternExtensionEdgeCases
[ OK ] Snappy.AppendSelfPatternExtensionEdgeCases (3 ms)
[ RUN ] Snappy.AppendSelfPatternExtensionEdgeCasesExhaustive
[ OK ] Snappy.AppendSelfPatternExtensionEdgeCasesExhaustive (4830 ms)
[ RUN ] Snappy.MaxBlowup
[ OK ] Snappy.MaxBlowup (11 ms)
[ DISABLED ] Snappy.DISABLED_MoreThan4GB
[ RUN ] Snappy.RandomData
[ OK ] Snappy.RandomData (20311 ms)
[ RUN ] Snappy.FourByteOffset
[ OK ] Snappy.FourByteOffset (1 ms)
[ RUN ] Snappy.IOVecSourceEdgeCases
[ OK ] Snappy.IOVecSourceEdgeCases (0 ms)
[ RUN ] Snappy.IOVecSinkEdgeCases
[ OK ] Snappy.IOVecSinkEdgeCases (0 ms)
[ RUN ] Snappy.IOVecLiteralOverflow
[ OK ] Snappy.IOVecLiteralOverflow (0 ms)
[ RUN ] Snappy.IOVecCopyOverflow
[ OK ] Snappy.IOVecCopyOverflow (0 ms)
[ RUN ] Snappy.ReadPastEndOfBuffer
[ OK ] Snappy.ReadPastEndOfBuffer (0 ms)
[ RUN ] Snappy.ZeroOffsetCopy
[ OK ] Snappy.ZeroOffsetCopy (0 ms)
[ RUN ] Snappy.ZeroOffsetCopyValidation
[ OK ] Snappy.ZeroOffsetCopyValidation (0 ms)
[ RUN ] Snappy.FindMatchLength
[ OK ] Snappy.FindMatchLength (0 ms)
[ RUN ] Snappy.FindMatchLengthRandom
[ OK ] Snappy.FindMatchLengthRandom (1145 ms)
[ RUN ] Snappy.VerifyCharTable
[ OK ] Snappy.VerifyCharTable (0 ms)
[ RUN ] Snappy.TestBenchmarkFiles
[ OK ] Snappy.TestBenchmarkFiles (406 ms)
[----------] 17 tests from Snappy (26734 ms total)

[----------] 3 tests from SnappyCorruption
[ RUN ] SnappyCorruption.TruncatedVarint
[ OK ] SnappyCorruption.TruncatedVarint (0 ms)
[ RUN ] SnappyCorruption.UnterminatedVarint
[ OK ] SnappyCorruption.UnterminatedVarint (0 ms)
[ RUN ] SnappyCorruption.OverflowingVarint
[ OK ] SnappyCorruption.OverflowingVarint (0 ms)
[----------] 3 tests from SnappyCorruption (0 ms total)

[----------] Global test environment tear-down
[==========] 21 tests from 3 test suites ran. (26740 ms total)
[ PASSED ] 21 tests.

YOU HAVE 1 DISABLED TEST
[ PASSED ] 21 tests.

@anthony-zy
Copy link
Contributor Author

@danilak-G
I've revised the code, reducing branches and optimizing preprocessor directives for easier review. Could you check the current version when possible?
Also, in Snappy's maintenance mode, what's the review timeline and what changes are typically accepted? Thanks!

@anthony-zy
Copy link
Contributor Author

@danilak-G I've revised the code, reducing branches and optimizing preprocessor directives for easier review. Could you check the current version when possible? Also, in Snappy's maintenance mode, what's the review timeline and what changes are typically accepted? Thanks!

Hi @danilak-G,
Just a quick follow-up on this PR. I've made some revisions for easier review. Any chance you could take a look when you have a moment?
Thanks!"

@danilak-G
Copy link
Collaborator

Overall LGTM, thanks for fixing many things and now it looks greate, some small comments

snappy.cc Outdated
size_t remaining_bytes = size;
//Loop as long as there are bytes remaining to be copied.
while (remaining_bytes > 0) {
//Set vector configuration: e8 (8-bit elements), m2 (LMUL=2).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix the indentation to 4 spaces?

snappy.cc Outdated
// RVV acceleration available on RISC-V when compiled with -march=rv64gcv
#elif defined(__riscv) && SNAPPY_HAVE_RVV
// Cast pointers to the type we will operate on.
unsigned char* dst_ptr = (unsigned char*)dst;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use reinterpret_cast, please

snappy.cc Outdated
vuint8m2_t vec = VLE8_V_U8M2(src_ptr, vl);
//Store data to the current destination pointer.
VSE8_V_U8M2(dst_ptr, vec, vl);
//Update pointers and the remaining count.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set the space after // for all comments

snappy.cc Outdated
static_cast<const uint8_t*>(src) + kShortMemCopy,
64 - kShortMemCopy);
}
static_cast<const uint8_t*>(src) + kShortMemCopy,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra not-needed space added

@anthony-zy
Copy link
Contributor Author

Overall LGTM, thanks for fixing many things and now it looks greate, some small comments

Fixed, thanks for the review!

@anthony-zy anthony-zy requested a review from danilak-G October 17, 2025 09:02
std::memmove(dst + kShortMemCopy,
static_cast<const uint8_t*>(src) + kShortMemCopy,
64 - kShortMemCopy);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This closing bracket should not be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure—bracket restored. Thanks for catching that!

snappy.cc Outdated
#else
std::memmove(dst, src, kShortMemCopy);
// Profiling shows that nearly all copies are short.
//Profiling shows that nearly all copies are short.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A space here as well

snappy.cc Outdated
size_t vl = VSETVL_E8M2(remaining_bytes);
// Load data from the current source pointer.
vuint8m2_t vec = VLE8_V_U8M2(src_ptr, vl);
// Store data to the current destination pointer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align comments with code

snappy.cc Outdated
// Suppress -Wignored-attributes warning for __m128i in x86 SSE2 environment
// warning: ignoring attributes on template argument 'snappy::internal::V128' {aka '__vector(2) long long int'} [-Wignored-attributes]
// This occurs because __m128i has vector attributes (e.g., __attribute__((vector_size(16)))) that are ignored in template parameters.
#ifdef __SSE2__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deleting LoadPatternAndReshuffleMask completely for non SSE2. SNAPPY_HAVE_VECTOR_BYTE_SHUFFLE should be correct, I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a warning when compiling here on x86.I reverted to the previous code.
image

@danilak-G danilak-G merged commit 633cb9c into google:main Oct 20, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants