@lmbarros lmbarros commented May 17, 2021

Some operations on the circular buffer use modulo arithmetic, which was
implemented (not surprisingly!) with the modulo operator %. It turns out
that the modulo operator is quite slow, so in tight loops a substantial
portion of the run time may be spent on it.

This commit adds a separate code path for buffers with power-of-two
sizes. For these, we can implement modulo arithmetic substantially more
efficiently using some bit manipulation.
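
As a rough illustration of the bit manipulation meant here (a minimal sketch, not the code from this commit; `wrapMod` and `wrapMask` are hypothetical names): when the size is a power of two, `size-1` has all the low bits set, so a bitwise AND keeps exactly the bits that the modulo would keep. This only holds for non-negative indices.

```go
package main

import "fmt"

// wrapMod wraps index i into [0, size) using the modulo operator.
// It works for any positive size.
func wrapMod(i, size int64) int64 {
	return i % size
}

// wrapMask wraps index i into [0, size) using a bitwise AND.
// It is only valid when size is a power of two and i is non-negative.
func wrapMask(i, size int64) int64 {
	return i & (size - 1)
}

func main() {
	const size = 1024 // a power of two, so size-1 has its ten low bits set

	// Both columns should agree for every non-negative index.
	for _, i := range []int64{0, 5, 1023, 1024, 1030, 5000} {
		fmt.Println(i, wrapMod(i, size), wrapMask(i, size))
	}
}
```

For a size such as 1025 the mask trick gives wrong results, which is why a separate code path is needed rather than a blanket replacement of `%`.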

This is not a breaking change: the public interface is exactly the same
as before. An optimized buffer is transparently created under the hood
whenever the requested size is a power of two.
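
One way the transparent selection could look is sketched below. This is an assumption for illustration only: the `ring` type, `newRing`, `writeByte`, and `isPowerOfTwo` are hypothetical names and do not reflect the package's real API or the structure this commit uses. The constructor checks the size once and records whether the mask-based path may be used; callers never see the difference.

```go
package main

import (
	"errors"
	"fmt"
)

// isPowerOfTwo reports whether n is a positive power of two.
// A power of two has exactly one bit set, so n&(n-1) clears it to zero.
func isPowerOfTwo(n int64) bool {
	return n > 0 && n&(n-1) == 0
}

// ring is a hypothetical, stripped-down circular buffer used only to
// illustrate the idea; it is not the type from the actual package.
type ring struct {
	data []byte
	size int64
	mask int64 // size-1 when pot is true, otherwise unused
	pot  bool  // true when size is a power of two
	pos  int64 // next write position
}

// newRing picks the fast path once, at construction time.
func newRing(size int64) (*ring, error) {
	if size <= 0 {
		return nil, errors.New("size must be positive")
	}
	r := &ring{data: make([]byte, size), size: size}
	if isPowerOfTwo(size) {
		r.pot = true
		r.mask = size - 1
	}
	return r, nil
}

// writeByte stores c and advances the write position, wrapping around
// with a mask when possible and with the modulo operator otherwise.
func (r *ring) writeByte(c byte) {
	r.data[r.pos] = c
	if r.pot {
		r.pos = (r.pos + 1) & r.mask
	} else {
		r.pos = (r.pos + 1) % r.size
	}
}

func main() {
	for _, size := range []int64{4, 5} {
		r, err := newRing(size)
		if err != nil {
			panic(err)
		}
		for i := 0; i < 6; i++ {
			r.writeByte(byte('a' + i))
		}
		fmt.Println("size:", size, "fast path:", r.pot, "next position:", r.pos)
	}
}
```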

The commit also adds benchmarks that perform various operations on
buffers with and without power-of-two sizes. Here are the results from
my laptop:

$ go test -bench .

goos: linux
goarch: amd64
pkg: github.com/balena-os/circbuf
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
Benchmark_Write_1024_500-12        	   73714	     15954 ns/op	31340.06 MB/s
Benchmark_Write_1025_500-12        	   54277	     22072 ns/op	22653.20 MB/s
Benchmark_Write_1024_5000-12       	   67614	     17685 ns/op	282722.48 MB/s
Benchmark_Write_1025_5000-12       	   50191	     23693 ns/op	211029.91 MB/s
Benchmark_Write_65536_5000-12      	   14248	     83924 ns/op	59577.56 MB/s
Benchmark_Write_65537_5000-12      	   13818	     86788 ns/op	57611.47 MB/s
Benchmark_Write_1024_5-12          	  170706	      6333 ns/op	 789.51 MB/s
Benchmark_Write_1025_5-12          	   80103	     14949 ns/op	 334.48 MB/s
Benchmark_WriteByte_1024-12        	  410422	      2806 ns/op	 356.44 MB/s
Benchmark_WriteByte_1025-12        	  110097	     10783 ns/op	  92.74 MB/s
Benchmark_WriteByte_65536-12       	  432852	      2831 ns/op	 353.25 MB/s
Benchmark_WriteByte_65537-12       	  108091	     10526 ns/op	  95.01 MB/s
Benchmark_Get_HalfFull_1024-12     	     920	   1291666 ns/op	   0.77 MB/s
Benchmark_Get_HalfFull_1025-12     	     908	   1310977 ns/op	   0.76 MB/s
Benchmark_Get_Full_1024-12         	     460	   2574138 ns/op	   0.39 MB/s
Benchmark_Get_Full_1025-12         	     458	   2629744 ns/op	   0.38 MB/s
Benchmark_Get_TwiceFull_1024-12    	     444	   2696897 ns/op	   0.37 MB/s
Benchmark_Get_TwiceFull_1025-12    	     132	   8926386 ns/op	   0.11 MB/s

I also tried this on some ARM devices (Pi 3B+, Pi Zero). While the exact numbers
vary, the overall picture is about the same.
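
For anyone who wants to get a feel for the difference outside this repository, the comparison can be approximated with a standalone program. This is only a sketch under assumptions: `benchMask` and `benchMod` are hypothetical helpers that time a bare index-wrapping loop, not the benchmarks added by this commit, and they use `testing.Benchmark` so the program runs without `go test`. Results will vary with CPU and compiler version.

```go
package main

import (
	"fmt"
	"testing"
)

// benchMask times single-byte writes that wrap the position with a
// bitwise AND. size must be a power of two.
func benchMask(size int64) testing.BenchmarkResult {
	buf := make([]byte, size)
	mask := size - 1
	return testing.Benchmark(func(b *testing.B) {
		var pos int64
		for i := 0; i < b.N; i++ {
			buf[pos] = byte(i)
			pos = (pos + 1) & mask
		}
	})
}

// benchMod times the same loop, wrapping the position with the modulo
// operator. size may be any positive value.
func benchMod(size int64) testing.BenchmarkResult {
	buf := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		var pos int64
		for i := 0; i < b.N; i++ {
			buf[pos] = byte(i)
			pos = (pos + 1) % size
		}
	})
}

func main() {
	// 1024 takes the mask path, 1025 the modulo path, mirroring the
	// sizes used in the results above.
	fmt.Println("mask, size 1024:", benchMask(1024))
	fmt.Println("mod,  size 1025:", benchMod(1025))
}
```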

Change-type: minor
Signed-off-by: Leandro Motta Barros <leandro@balena.io>
@lmbarros lmbarros requested a review from robertgzr May 17, 2021 18:20
@lmbarros lmbarros self-assigned this May 17, 2021
@lmbarros
Author

FWIW, I also opened a PR to try to get all our optimizations upstream: armon#5

@robertgzr robertgzr left a comment

very nice! you will need to add a repo.yml file like this: https://github.com/balena-os/balenaos-in-container/blob/master/repo.yml
to make versionbot happy

Signed-off-by: Leandro Motta Barros <leandro@balena.io>
@lmbarros lmbarros merged commit 5dbd4ee into master May 28, 2021
@Page- Page- deleted the lmbarros/power-of-two-optimization branch November 29, 2022 14:26