tests fail due to |
PragmaTwice left a comment:
Hi, thank you for your contribution.
However, IMHO we cannot prefer debug-mode performance over maintainability. Debug-mode performance can be expected to be bad, since it may involve some debug-only runtime checks and NO compiler optimizations.
In CMake, Debug mode usually means -O0 -g. If you want debug information but also want some performance, you can use RelWithDebInfo mode instead of Release; it usually means -O2 -g and enables lots of compiler optimizations (you can also try -O1 -g if you want more precise debug information).
if constexpr (Mode::need_checks) {
    if (!Mode::check_bytes_span(b, n)) {
        return {};
    }
}
need_checks seems useless. check_bytes_span will be an empty function in unsafe mode.
And it should be quite easy for compilers to eliminate.
> And it should be quite easy for compilers to eliminate.
The compiler may or may not eliminate it. That depends on the compiler, its version, the build mode, etc.
The code above performs that removal independently of the compiler; in particular, it removes the empty call in debug mode. There can be a lot of them.
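To illustrate the point being made here, the following is a minimal sketch of a mode policy guarded by `if constexpr`. Only `need_checks` and `check_bytes_span` come from the snippet under review; `bytes_view`, `safe_mode`, `unsafe_mode`, `read_byte`, and `example_usage` are hypothetical names invented for the example, and the bodies are assumptions, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical non-owning view over a byte buffer (an assumption for this sketch).
struct bytes_view {
    const std::uint8_t* data;
    std::size_t size;
};

// Hypothetical "safe" policy: bounds checks are requested and actually performed.
struct safe_mode {
    static constexpr bool need_checks = true;
    static bool check_bytes_span(bytes_view b, std::size_t n) { return b.size >= n; }
};

// Hypothetical "unsafe" policy: the check is a no-op that always succeeds.
struct unsafe_mode {
    static constexpr bool need_checks = false;
    static bool check_bytes_span(bytes_view, std::size_t) { return true; }
};

// Reads the first byte. When Mode::need_checks is false, the discarded
// `if constexpr` branch is never instantiated, so no call to
// check_bytes_span is emitted at any optimization level.
template <typename Mode>
std::optional<std::uint8_t> read_byte(bytes_view b) {
    if constexpr (Mode::need_checks) {
        if (!Mode::check_bytes_span(b, 1)) {
            return {};
        }
    }
    return b.data[0];
}

// Small usage example; not part of the technique itself.
inline void example_usage() {
    const std::uint8_t buf[] = {7};
    const bytes_view one{buf, 1};
    const bytes_view none{buf, 0};
    assert(read_byte<safe_mode>(one).has_value());
    assert(!read_byte<safe_mode>(none).has_value());
}
```

The trade-off discussed in this thread is visible here: the guard adds a second knob (`need_checks`) that has to stay consistent with what `check_bytes_span` does, in exchange for not depending on the optimizer.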
Hmm I think another way is to put [[gnu::always_inline]]/[[msvc::forceinline]] into its declaration (check_bytes_span), so that it can always be inlined and eliminated, even in debug mode.
Since I'm a compiler engineer I can guarantee that every C++ compiler can do this (MSVC/Clang/GCC), because it's quite trivial.
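A rough sketch of the forced-inlining alternative suggested above. The macro `EXAMPLE_FORCE_INLINE`, the `bytes_view` type, the policy structs, and `can_read` are hypothetical names for this example only. The sketch uses `[[gnu::always_inline]]` on GCC/Clang and the `__forceinline` keyword on MSVC; whether a particular compiler honours the request in a fully unoptimized build is worth verifying for your toolchain and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical portability macro (an assumption, not part of the library).
#if defined(_MSC_VER) && !defined(__clang__)
#  define EXAMPLE_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_FORCE_INLINE [[gnu::always_inline]] inline
#else
#  define EXAMPLE_FORCE_INLINE inline
#endif

// Hypothetical non-owning view over a byte buffer.
struct bytes_view {
    const std::uint8_t* data;
    std::size_t size;
};

// Safe policy: performs the real bounds check.
struct safe_mode {
    EXAMPLE_FORCE_INLINE static bool check_bytes_span(bytes_view b, std::size_t n) {
        return b.size >= n;
    }
};

// Unsafe policy: an empty check. Once inlined, the call reduces to the
// constant `true`, and no separate `need_checks` flag is required.
struct unsafe_mode {
    EXAMPLE_FORCE_INLINE static bool check_bytes_span(bytes_view, std::size_t) {
        return true;
    }
};

// The call site stays unconditional; the policy alone decides the cost.
template <typename Mode>
bool can_read(bytes_view b, std::size_t n) {
    return Mode::check_bytes_span(b, n);
}

// Small usage example; not part of the technique itself.
inline void example_usage() {
    const std::uint8_t buf[] = {1, 2, 3};
    const bytes_view v{buf, 3};
    assert(can_read<safe_mode>(v, 3));
    assert(!can_read<safe_mode>(v, 4));
}
```

Compared with the `if constexpr` guard, this keeps a single source of truth (the body of `check_bytes_span`) at the cost of relying on a compiler-specific attribute.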
while (len) {
    if (Mode::get_value_from_result(C::template decode<Mode>(b), decode_v)) {
        std::tie(*std::inserter(con, con.end()), b) = std::move(decode_v);
        --len;
Hmm, in every iteration we get a new b, and the number of bytes it advances by here may NOT be 1, so --len is not correct IIRC.
Oh, yeah, indeed. It seems len is the number of bytes, not the number of items :-/
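The issue raised above can be sketched with a toy decoder in which `len` counts bytes, so the loop has to subtract however many bytes each item consumed and not just 1. Everything here is hypothetical: `decode_item`, `decode_items`, and the length-prefixed wire format are assumptions made for illustration and are not the library's actual API or encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Toy decoder (hypothetical format): one length byte followed by that many
// payload bytes. Returns the decoded item and the unread remainder.
inline std::optional<std::pair<std::string, std::string_view>>
decode_item(std::string_view b) {
    if (b.empty()) return std::nullopt;
    const auto n = static_cast<std::size_t>(static_cast<unsigned char>(b[0]));
    if (b.size() < 1 + n) return std::nullopt;
    return std::make_pair(std::string(b.substr(1, n)), b.substr(1 + n));
}

// Decodes the items that occupy exactly `len` BYTES at the front of `b`.
inline std::optional<std::vector<std::string>>
decode_items(std::string_view b, std::size_t len) {
    std::vector<std::string> out;
    while (len > 0) {
        auto r = decode_item(b);
        if (!r) return std::nullopt;
        // Bytes consumed by this item: the difference between the old and new views.
        const std::size_t consumed = b.size() - r->second.size();
        if (consumed > len) return std::nullopt;  // item crosses the declared boundary
        len -= consumed;                          // not --len: len counts bytes
        out.push_back(std::move(r->first));
        b = r->second;
    }
    return out;
}

// Small usage example; not part of the technique itself.
inline void example_usage() {
    const std::string_view data("\x02hi\x03you", 7);
    const auto items = decode_items(data, data.size());
    assert(items.has_value());
    assert(items->size() == 2);
}
```

With `--len` in place of `len -= consumed`, the same 7-byte input would make the loop attempt seven decodes and fail once the buffer ran out after the second item.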
That's true; however, I want my code to be compiled in full debug mode without optimizations, and since the library is header-only, that cannot be easily achieved.
Hi,
I've noticed terrible performance in debug mode. So I started digging into what the problem is... and I found no algorithmic issues in your code. So the performance degradation is probably related to the templated code.
Anyway, I made some minor changes, which led to a performance boost.
Here are the benchmarks (in Release mode):
before:
After: