-
Notifications
You must be signed in to change notification settings - Fork 1k
feat(RISC-V): Add RVV-optimized implementation for memcopy64 #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(RISC-V): Add RVV-optimized implementation for memcopy64 #213
Conversation
…peed For RISC-V[skip ci]
…into add_rvv_support
…into add_rvv_support
…into add_rvv_support
…into add_rvv_support
|
@danilak-G |
Hi @danilak-G, |
|
Overall LGTM, thanks for fixing many things and now it looks greate, some small comments |
snappy.cc
Outdated
| size_t remaining_bytes = size; | ||
| //Loop as long as there are bytes remaining to be copied. | ||
| while (remaining_bytes > 0) { | ||
| //Set vector configuration: e8 (8-bit elements), m2 (LMUL=2). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix the indentation to 4 spaces?
snappy.cc
Outdated
| // RVV acceleration available on RISC-V when compiled with -march=rv64gcv | ||
| #elif defined(__riscv) && SNAPPY_HAVE_RVV | ||
| // Cast pointers to the type we will operate on. | ||
| unsigned char* dst_ptr = (unsigned char*)dst; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use reinterpret_cast, please
snappy.cc
Outdated
| vuint8m2_t vec = VLE8_V_U8M2(src_ptr, vl); | ||
| //Store data to the current destination pointer. | ||
| VSE8_V_U8M2(dst_ptr, vec, vl); | ||
| //Update pointers and the remaining count. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set the space after // for all comments
snappy.cc
Outdated
| static_cast<const uint8_t*>(src) + kShortMemCopy, | ||
| 64 - kShortMemCopy); | ||
| } | ||
| static_cast<const uint8_t*>(src) + kShortMemCopy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra not-needed space added
…into add_rvv_support
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64
Fixed, thanks for the review! |
| std::memmove(dst + kShortMemCopy, | ||
| static_cast<const uint8_t*>(src) + kShortMemCopy, | ||
| 64 - kShortMemCopy); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This closing bracket should not be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure—bracket restored. Thanks for catching that!
snappy.cc
Outdated
| #else | ||
| std::memmove(dst, src, kShortMemCopy); | ||
| // Profiling shows that nearly all copies are short. | ||
| //Profiling shows that nearly all copies are short. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A space here as well
snappy.cc
Outdated
| size_t vl = VSETVL_E8M2(remaining_bytes); | ||
| // Load data from the current source pointer. | ||
| vuint8m2_t vec = VLE8_V_U8M2(src_ptr, vl); | ||
| // Store data to the current destination pointer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align comments with code
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64
snappy.cc
Outdated
| // Suppress -Wignored-attributes warning for __m128i in x86 SSE2 environment | ||
| // warning: ignoring attributes on template argument 'snappy::internal::V128' {aka '__vector(2) long long int'} [-Wignored-attributes] | ||
| // This occurs because __m128i has vector attributes (e.g., __attribute__((vector_size(16)))) that are ignored in template parameters. | ||
| #ifdef __SSE2__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is deleting LoadPatternAndReshuffleMask completely for non SSE2. SNAPPY_HAVE_VECTOR_BYTE_SHUFFLE should be correct, I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64
…ny-zy/snappy into add_rvv_optmized_memcopy64

This PR add RVV to optimize Memcopy64 function , improving compression speed by ~46%.
Optimize Snappy 1.2.2 performance
Added RVV support for RISC-V in Snappy, optimizing Memcopy64 by leveraging RVV vector load/store instructions (e.g., vle8_v_u8m2, vse8_v_u8m2) to reduce memory copy overhead and improve decompression performance. lzbench 2.1 tests onsilesia.tar (GCC 13.2.1, 64-bit Linux) show:Snappy 1.2.2 unittest ([ PASSED ] 21 tests.)
Running main() from gmock_main.cc
[==========] Running 21 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CorruptedTest
[ RUN ] CorruptedTest.VerifyCorrupted
Crazy decompression lengths not checked on 64-bit build
[ OK ] CorruptedTest.VerifyCorrupted (5 ms)
[----------] 1 test from CorruptedTest (5 ms total)
[----------] 17 tests from Snappy
[ RUN ] Snappy.SimpleTests
[ OK ] Snappy.SimpleTests (22 ms)
[ RUN ] Snappy.AppendSelfPatternExtensionEdgeCases
[ OK ] Snappy.AppendSelfPatternExtensionEdgeCases (3 ms)
[ RUN ] Snappy.AppendSelfPatternExtensionEdgeCasesExhaustive
[ OK ] Snappy.AppendSelfPatternExtensionEdgeCasesExhaustive (4830 ms)
[ RUN ] Snappy.MaxBlowup
[ OK ] Snappy.MaxBlowup (11 ms)
[ DISABLED ] Snappy.DISABLED_MoreThan4GB
[ RUN ] Snappy.RandomData
[ OK ] Snappy.RandomData (20311 ms)
[ RUN ] Snappy.FourByteOffset
[ OK ] Snappy.FourByteOffset (1 ms)
[ RUN ] Snappy.IOVecSourceEdgeCases
[ OK ] Snappy.IOVecSourceEdgeCases (0 ms)
[ RUN ] Snappy.IOVecSinkEdgeCases
[ OK ] Snappy.IOVecSinkEdgeCases (0 ms)
[ RUN ] Snappy.IOVecLiteralOverflow
[ OK ] Snappy.IOVecLiteralOverflow (0 ms)
[ RUN ] Snappy.IOVecCopyOverflow
[ OK ] Snappy.IOVecCopyOverflow (0 ms)
[ RUN ] Snappy.ReadPastEndOfBuffer
[ OK ] Snappy.ReadPastEndOfBuffer (0 ms)
[ RUN ] Snappy.ZeroOffsetCopy
[ OK ] Snappy.ZeroOffsetCopy (0 ms)
[ RUN ] Snappy.ZeroOffsetCopyValidation
[ OK ] Snappy.ZeroOffsetCopyValidation (0 ms)
[ RUN ] Snappy.FindMatchLength
[ OK ] Snappy.FindMatchLength (0 ms)
[ RUN ] Snappy.FindMatchLengthRandom
[ OK ] Snappy.FindMatchLengthRandom (1145 ms)
[ RUN ] Snappy.VerifyCharTable
[ OK ] Snappy.VerifyCharTable (0 ms)
[ RUN ] Snappy.TestBenchmarkFiles
[ OK ] Snappy.TestBenchmarkFiles (406 ms)
[----------] 17 tests from Snappy (26734 ms total)
[----------] 3 tests from SnappyCorruption
[ RUN ] SnappyCorruption.TruncatedVarint
[ OK ] SnappyCorruption.TruncatedVarint (0 ms)
[ RUN ] SnappyCorruption.UnterminatedVarint
[ OK ] SnappyCorruption.UnterminatedVarint (0 ms)
[ RUN ] SnappyCorruption.OverflowingVarint
[ OK ] SnappyCorruption.OverflowingVarint (0 ms)
[----------] 3 tests from SnappyCorruption (0 ms total)
[----------] Global test environment tear-down
[==========] 21 tests from 3 test suites ran. (26740 ms total)
[ PASSED ] 21 tests.
YOU HAVE 1 DISABLED TEST
[ PASSED ] 21 tests.