Add optimized path for power-of-two-sized buffers #3

lmbarros · 2021-05-17T18:13:12Z

Some operations on the circular buffer make use of modulo arithmetic,
which was implemented (not surprisingly!) with the modulo operator `%`.
It turns out that the modulo operator is quite slow (when the divisor is
not known at compile time, it requires an integer division) and
therefore, in tight loops, a substantial portion of the run time may be
spent on it.

This commit adds a separate code path for buffers with power-of-two
sizes. For these, we can implement modulo arithmetic substantially more
efficiently using some bit manipulation.
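To make the trick concrete, here is a minimal sketch (not the code from this PR): for a size that is a power of two, `pos % size` equals `pos & (size-1)` as long as `pos` is non-negative. The helper names `isPowerOfTwo`, `wrapMod`, and `wrapMask` are hypothetical.

```go
package main

import "fmt"

// isPowerOfTwo reports whether n is a positive power of two. Such a number
// has exactly one bit set, so clearing its lowest set bit (n & (n-1))
// leaves zero.
func isPowerOfTwo(n int64) bool {
	return n > 0 && n&(n-1) == 0
}

// wrapMod is the generic path and works for any positive size.
func wrapMod(pos, size int64) int64 {
	return pos % size
}

// wrapMask is the power-of-two path. When size == 2^k, the remainder of
// pos / size is just the low k bits of pos, i.e. pos & (size-1).
// Assumes pos >= 0: for negative pos, Go's % truncates toward zero and the
// two results differ.
func wrapMask(pos, size int64) int64 {
	return pos & (size - 1)
}

func main() {
	const size = 1024
	for _, pos := range []int64{0, 5, 1023, 1024, 1030, 5000} {
		fmt.Printf("pos=%d mod=%d mask=%d\n", pos, wrapMod(pos, size), wrapMask(pos, size))
	}
}
```

Both columns agree for every non-negative position; the mask version just avoids the division.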

This is not a breaking change: the public interface is exactly the same
as before. An optimized buffer is transparently created under the hood
whenever the requested size is a power of two.
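One way to pick the fast path without touching the public interface is to decide once, in the constructor, and keep that choice inside the buffer. The sketch below assumes a hypothetical, heavily simplified `ring` type; `newRing`, `index`, and `WriteByte` are illustrative names, not this library's code, and the actual change may well use a different structure (for example, separate types).

```go
package main

import (
	"errors"
	"fmt"
)

// ring is a hypothetical, heavily simplified circular buffer that only
// illustrates choosing the index-wrapping strategy at construction time.
type ring struct {
	data    []byte
	size    int64
	written int64 // total bytes ever written; never wraps
	pow2    bool  // true when size is a power of two
	mask    int64 // size-1, meaningful only when pow2 is true
}

// newRing has the same signature for every size; callers never learn
// which path was selected.
func newRing(size int64) (*ring, error) {
	if size <= 0 {
		return nil, errors.New("size must be positive")
	}
	r := &ring{data: make([]byte, size), size: size}
	if size&(size-1) == 0 {
		r.pow2 = true
		r.mask = size - 1
	}
	return r, nil
}

// index maps an absolute, non-negative position to a slot in data.
func (r *ring) index(pos int64) int64 {
	if r.pow2 {
		return pos & r.mask // bit-mask fast path
	}
	return pos % r.size // generic modulo path
}

// WriteByte stores c in the next slot, overwriting the oldest byte once
// the buffer is full.
func (r *ring) WriteByte(c byte) error {
	r.data[r.index(r.written)] = c
	r.written++
	return nil
}

func main() {
	for _, n := range []int64{1024, 1025} {
		r, err := newRing(n)
		if err != nil {
			panic(err)
		}
		for i := 0; i < 3000; i++ {
			_ = r.WriteByte(byte(i))
		}
		fmt.Printf("size=%d fastPath=%v nextSlot=%d\n", n, r.pow2, r.index(r.written))
	}
}
```

Because `pow2` never changes after construction, the branch in `index` always goes the same way for a given buffer, which typically makes it cheap; two concrete types behind an interface would trade that branch for dynamic dispatch.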

The commit also adds benchmarks that perform various operations on
buffers that are and are not power-of-two-sized. Here are the results
from my laptop:

```
$ go test -bench .

goos: linux
goarch: amd64
pkg: github.com/balena-os/circbuf
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
Benchmark_Write_1024_500-12        	   73714	     15954 ns/op	31340.06 MB/s
Benchmark_Write_1025_500-12        	   54277	     22072 ns/op	22653.20 MB/s
Benchmark_Write_1024_5000-12       	   67614	     17685 ns/op	282722.48 MB/s
Benchmark_Write_1025_5000-12       	   50191	     23693 ns/op	211029.91 MB/s
Benchmark_Write_65536_5000-12      	   14248	     83924 ns/op	59577.56 MB/s
Benchmark_Write_65537_5000-12      	   13818	     86788 ns/op	57611.47 MB/s
Benchmark_Write_1024_5-12          	  170706	      6333 ns/op	 789.51 MB/s
Benchmark_Write_1025_5-12          	   80103	     14949 ns/op	 334.48 MB/s
Benchmark_WriteByte_1024-12        	  410422	      2806 ns/op	 356.44 MB/s
Benchmark_WriteByte_1025-12        	  110097	     10783 ns/op	  92.74 MB/s
Benchmark_WriteByte_65536-12       	  432852	      2831 ns/op	 353.25 MB/s
Benchmark_WriteByte_65537-12       	  108091	     10526 ns/op	  95.01 MB/s
Benchmark_Get_HalfFull_1024-12     	     920	   1291666 ns/op	   0.77 MB/s
Benchmark_Get_HalfFull_1025-12     	     908	   1310977 ns/op	   0.76 MB/s
Benchmark_Get_Full_1024-12         	     460	   2574138 ns/op	   0.39 MB/s
Benchmark_Get_Full_1025-12         	     458	   2629744 ns/op	   0.38 MB/s
Benchmark_Get_TwiceFull_1024-12    	     444	   2696897 ns/op	   0.37 MB/s
Benchmark_Get_TwiceFull_1025-12    	     132	   8926386 ns/op	   0.11 MB/s
```

I also tried this on some ARM devices (Pi 3B+, Pi Zero). While the exact numbers
vary, the overall picture is about the same.
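The benchmark code itself is not reproduced here. As a hedged illustration of how the cost of the wrap step alone could be isolated, the sketch below uses the standard library's `testing.Benchmark`, which runs a benchmark function outside `go test`; `benchMod`, `benchMask`, and `sink` are hypothetical names. Such a micro-benchmark measures only the index arithmetic, not the buffer operations reported above.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable that keeps the compiler from
// discarding the benchmark loops as dead code.
var sink int64

// benchMod wraps positions with %. size reaches the closure as a captured
// value rather than a compile-time constant, so the compiler should emit a
// real division instead of a cheaper substitute sequence.
func benchMod(size int64) func(*testing.B) {
	return func(b *testing.B) {
		var acc int64
		for i := int64(0); i < int64(b.N); i++ {
			acc += i % size
		}
		sink = acc
	}
}

// benchMask wraps positions with a bit mask; valid only when size is a
// power of two.
func benchMask(size int64) func(*testing.B) {
	mask := size - 1
	return func(b *testing.B) {
		var acc int64
		for i := int64(0); i < int64(b.N); i++ {
			acc += i & mask
		}
		sink = acc
	}
}

func main() {
	mod := testing.Benchmark(benchMod(1024))
	mask := testing.Benchmark(benchMask(1024))
	fmt.Println("modulo:", mod.String())
	fmt.Println("mask:  ", mask.String())
}
```

Timings from a sketch like this depend heavily on the CPU, so they are not directly comparable with the numbers in the table.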

Change-type: minor
Signed-off-by: Leandro Motta Barros <leandro@balena.io>

lmbarros · 2021-05-17T18:57:38Z

FWIW, I also opened a PR to try to get all our optimizations upstream: armon#5

robertgzr

Very nice! You will need to add a repo.yml file like this one: https://github.com/balena-os/balenaos-in-container/blob/master/repo.yml to make versionbot happy.

Signed-off-by: Leandro Motta Barros <leandro@balena.io>

lmbarros requested a review from robertgzr May 17, 2021 18:20

lmbarros self-assigned this May 17, 2021

robertgzr approved these changes May 27, 2021


Fix CI/versionbot integration

fa24cd3

Signed-off-by: Leandro Motta Barros <leandro@balena.io>

lmbarros merged commit 5dbd4ee into master May 28, 2021

Page- deleted the lmbarros/power-of-two-optimization branch November 29, 2022 14:26

3 participants