Add optimized path for power-of-two-sized buffers #3
Some operations on the circular buffer make use of modulo arithmetic,
which was implemented (not surprisingly!) with the modulo operator `%`.
It turns out that the modulo operator is quite slow, so in tight loops
a substantial portion of the run time may be spent on it.
This commit adds a separate code path for buffers with power-of-two
sizes. For these, we can implement modulo arithmetic substantially more
efficiently using some bit manipulation.
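
As a rough illustration of the kind of bit manipulation meant here (a sketch only, not the code from this commit): when the size is a power of two, wrapping a non-negative index with `%` gives the same result as a bitwise AND with `size - 1`. The function names below are made up for the example.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so clearing the lowest
    # set bit with n & (n - 1) leaves zero.
    return n > 0 and (n & (n - 1)) == 0


def wrap_modulo(index: int, size: int) -> int:
    """General path: wrap a non-negative index with the % operator."""
    return index % size


def wrap_mask(index: int, size: int) -> int:
    """Fast path: wrap a non-negative index; size must be a power of two."""
    # size - 1 has all the low bits set, so the AND keeps only the
    # bits that select a slot in the range 0 .. size - 1.
    return index & (size - 1)
```

The two wrap functions agree only for power-of-two sizes, which is why the general `%` path still has to exist for every other size.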
This is not a breaking change: the public interface is exactly the same
as before. An optimized buffer is transparently created under the hood
whenever the requested size is a power of two.
The commit also adds benchmarks that perform various operations on
buffers with and without power-of-two sizes. Here's the result of
running them on my laptop:
I also tried this on some ARM devices (Pi 3B+, Pi Zero). While the exact numbers
vary, the overall picture is about the same.