Skip to content

Improve decode performance#53

Open
basiliscos wants to merge 1 commit intoPragmaTwice:masterfrom
basiliscos:improve-decode-performance
Open

Improve decode performance#53
basiliscos wants to merge 1 commit intoPragmaTwice:masterfrom
basiliscos:improve-decode-performance

Conversation

@basiliscos
Copy link
Contributor

Hi,

I've noted terrible performance in debug mode. So, I started to dig what's the problem... and I found algorithmic no issues in your code. So, the performance degradation is probably related to templated code.

Any way, I did minor changed, which led to some performance boost.

Here is a benchmarks (in Release mode):

before:

------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_protopuf_encode            17.2 ns         17.2 ns     40928792
BM_protopuf_safe_encode       29.7 ns         29.7 ns     22518719
BM_protobuf_encode            68.0 ns         68.0 ns     10340438
BM_protopuf_decode             188 ns          188 ns      3728154
BM_protopuf_safe_decode        217 ns          217 ns      3232510
BM_protobuf_decode             244 ns          244 ns      2906093

After:

------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_protopuf_encode            17.3 ns         17.3 ns     40937909
BM_protopuf_safe_encode       30.5 ns         30.5 ns     22250119
BM_protobuf_encode            69.3 ns         69.3 ns     10183935
BM_protopuf_decode             145 ns          145 ns      4840484
BM_protopuf_safe_decode        162 ns          162 ns      4311272
BM_protobuf_decode             243 ns          243 ns      2895603

@basiliscos
Copy link
Contributor Author

tests fail due to begin_diff removal. Should I remove it from tests too?

Copy link
Owner

@PragmaTwice PragmaTwice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thank you for your contribution.

However IMHO, we cannot prefer debug-mode performance than maintainability. Debug-mode performance can be intuitively bad since it may involves some debug-only runtime checks and NO compiler optimizations.

In CMake, Debug mode usually means -O0 -g. If you want debug information but also want some performance, you can use RelWithDebInfo mode instead of Release which usually means -O2 -g and enables lots of compiler optimizations (also you can try -O1 -g if you want more precise debug information).

Comment on lines +125 to 129
if constexpr (Mode::need_checks) {
if (!Mode::check_bytes_span(b, n)) {
return {};
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need_checks seems useless. check_bytes_span will be an empty function in unsafe mode.

And it should be quite easy for compilers to eliminate.

Copy link
Contributor Author

@basiliscos basiliscos Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it should be quite easy for compilers to eliminate.

It can eliminate or can not. It depends on compiler, version, mode etc.

The code above performs that removal independently from compiler, especially it removes the empty call in debug mode. There can be a lot of them.

Copy link
Owner

@PragmaTwice PragmaTwice Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think another way is to put [[gnu::always_inline]]/[[msvc::forceinline]] into its declaration (check_bytes_span), so that it can always be inlined and eliminated, even in debug mode.

Since I'm a compiler engineer I can guarantee that every C++ compiler can do this (MSVC/Clang/GCC), because it's quite trivial.

Comment on lines +85 to +88
while(len) {
if (Mode::get_value_from_result(C::template decode<Mode>(b), decode_v)) {
std::tie(*std::inserter(con, con.end()), b) = std::move(decode_v);
--len;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm in every iteration, we get a new b, and the bytes it forwards here may NOT be 1, so --len is not correct IIRC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, indeed. It seems number of bytes not number of items :-/

@basiliscos
Copy link
Contributor Author

In CMake, Debug mode usually means -O0 -g. If you want debug information but also want some performance, you can use RelWithDebInfo mode instead of Release which usually means -O2 -g and enables lots of compiler optimizations (also you can try -O1 -g if you want more precise debug information).

That's true, however I want my code to be compiled in full debug mode without optimizations, and as the library is header-only it cannot be easily achieved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants