Speedup C encoder up to 100x #256
|
Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
Apple M1 Pro
* The result for M1 Pro was corrected, since the previous results were affected by the bug. |
|
@DagAgren Are you interested in these improvements? |
|
I also improved decoder performance about 14 times using the same techniques: caching cos values and linearTosRGB values, and unrolling loops. Decoding throughput on M1 goes from 6 Mpx/s to 86 Mpx/s. This also introduces a very minor change in the output: nothing the human eye could notice, just different binary output. I measure performance with the following patch:
diff --git forkSrcPrefix/C/encode_stb.c forkDstPrefix/C/encode_stb.c
index 811ca00006b45eaa829bfd267904ac0d0c647884..a95c6a2ff96ee7cdaa9d1b35ef28b063161cf01d 100644
--- forkSrcPrefix/C/encode_stb.c
+++ forkDstPrefix/C/encode_stb.c
@@ -4,6 +4,7 @@
#include "stb_image.h"
#include <stdio.h>
+#include <time.h>
const char *blurHashForFile(int xComponents, int yComponents,const char *filename);
@@ -38,6 +39,14 @@ const char *blurHashForFile(int xComponents, int yComponents,const char *filenam
const char *hash = blurHashForPixels(xComponents, yComponents, width, height, data, width * 3);
+ #define TIMES 30
+ clock_t start = clock();
+ for (int i = 0; i < TIMES; i++) {
+ hash = blurHashForPixels(xComponents, yComponents, width, height, data, width * 3);
+ }
+ double time_ms = (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC / TIMES;
+ printf("Average time over %d executions: %.3f ms\n", TIMES, time_ms);
+
stbi_image_free(data);
return hash;
diff --git forkSrcPrefix/C/decode_stb.c forkDstPrefix/C/decode_stb.c
index dab164e1eaf1a7199a751a5e13f6da7099027bd2..3514f53e6f91dc41253429ea07e594893d536598 100644
--- forkSrcPrefix/C/decode_stb.c
+++ forkDstPrefix/C/decode_stb.c
@@ -3,6 +3,8 @@
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_writer.h"
+#include <time.h>
+
int main(int argc, char **argv) {
if(argc < 5) {
fprintf(stderr, "Usage: %s hash width height output_file [punch]\n", argv[0]);
@@ -34,6 +36,15 @@ int main(int argc, char **argv) {
freePixelArray(bytes);
+ #define TIMES 30
+ clock_t start = clock();
+ for (int i = 0; i < TIMES; i++) {
+ uint8_t * tmpbytes = decode(hash, width, height, punch, nChannels);
+ freePixelArray(tmpbytes);
+ }
+ double time_ms = (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC / TIMES;
+ printf("Average time over %d executions: %.3f ms\n", TIMES, time_ms);
+
fprintf(stdout, "Decoded blurhash successfully, wrote PNG file %s\n", output_file);
return 0;
} |
|
@DagAgren How can I earn your attention? |
|
@DagAgren please note that |
|
This is a breakthrough for this library. Why can't we merge it? @DagAgren ? |
|
Sorry I did not see this earlier. However, this code is intentionally written to be simple rather than performant, because it is meant as a reference implementation that can be ported to other platforms as easily as possible. Also, it should not need high performance. You should not run it on a full-sized image, but instead first scale the image down to a much smaller size, such as 32x32, and run it on that. This is mentioned in the documentation. Running it on a full-scale image is not useful, as it throws away all that detail anyway. |
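The downscale-first approach described above could look roughly like the sketch below. The helper `downscale_box_rgb` is hypothetical and is not part of the library. It assumes interleaved 8-bit RGB input and a source at least as large as the target, and it averages sRGB values directly for simplicity rather than converting to linear light first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: box-average an interleaved 8-bit RGB image
   down to dst_w x dst_h. Requires src_w >= dst_w and src_h >= dst_h.
   Averages raw sRGB values (no linear-light conversion). */
static void downscale_box_rgb(const uint8_t *src, int src_w, int src_h, size_t src_stride,
                              uint8_t *dst, int dst_w, int dst_h) {
    assert(src != NULL && dst != NULL);
    assert(dst_w > 0 && dst_h > 0 && src_w >= dst_w && src_h >= dst_h);
    for (int dy = 0; dy < dst_h; dy++) {
        /* Source rows [y0, y1) that map onto destination row dy. */
        int y0 = (int)((int64_t)dy * src_h / dst_h);
        int y1 = (int)((int64_t)(dy + 1) * src_h / dst_h);
        for (int dx = 0; dx < dst_w; dx++) {
            /* Source columns [x0, x1) that map onto destination column dx. */
            int x0 = (int)((int64_t)dx * src_w / dst_w);
            int x1 = (int)((int64_t)(dx + 1) * src_w / dst_w);
            uint64_t sum[3] = {0, 0, 0};
            for (int y = y0; y < y1; y++) {
                const uint8_t *row = src + (size_t)y * src_stride;
                for (int x = x0; x < x1; x++) {
                    for (int c = 0; c < 3; c++) sum[c] += row[(size_t)x * 3 + c];
                }
            }
            uint64_t count = (uint64_t)(y1 - y0) * (uint64_t)(x1 - x0);
            uint8_t *out = dst + ((size_t)dy * dst_w + dx) * 3;
            /* Rounded integer average of the box. */
            for (int c = 0; c < 3; c++) out[c] = (uint8_t)((sum[c] + count / 2) / count);
        }
    }
}
```

A 32x32 buffer filled this way could then be passed on in the same call shape the patch above uses, for example `blurHashForPixels(xComponents, yComponents, 32, 32, small, 32 * 3)`, where `small` is the hypothetical destination buffer.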
Does this mean you’re rejecting any performance improvements entirely, or only the more radical ones (like 4× loop unrolling)? Regarding the suggestion to scale the image down to 32×32 — that almost eliminates any benefit from sRGB → linear conversion. Performance improvements are still measurable even at that size. I used large images only to better demonstrate the effect; the same applies to small ones. |
All changes are split into independent commits; some of them are optional.
In addition to improving performance, there are other changes:
- M_PI in sources, ensure it is defined in math.h.
- blurhash_encoder executable (in line with blurHashForPixels function)
- Makefile to avoid heavy encode_stb recompilation on each change.
Benchmarks are in the comment.