Hey,
I saw there was a .NET port of QOI and got curious about the newer performance features available in .NET Standard. Here is what I came up with:
Encode:
| Method | CurrentPath | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| OriginalEncode | Large.jpg | 201,046.6 us | 722.04 us | 675.39 us | 1.00 | - | - | - | 74,109 KB |
| OptimisedEncode | Large.jpg | 139,210.1 us | 332.80 us | 295.02 us | 0.69 | - | - | - | 18,885 KB |
| OriginalEncode | Medium.jpg | 11,649.1 us | 47.91 us | 42.47 us | 1.00 | 31.2500 | 31.2500 | 31.2500 | 4,385 KB |
| OptimisedEncode | Medium.jpg | 7,930.6 us | 73.48 us | 65.14 us | 0.68 | - | - | - | 1,749 KB |
| OriginalEncode | Small.jpg | 1,054.2 us | 1.55 us | 1.29 us | 1.00 | 1.9531 | 1.9531 | 1.9531 | 422 KB |
| OptimisedEncode | Small.jpg | 724.7 us | 3.00 us | 2.81 us | 0.69 | - | - | - | 129 KB |
Decode:
| Method | CurrentPath | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| OriginalDecode | Large.qoi | 127,637.5 us | 576.69 us | 539.43 us | 1.00 | 250.0000 | 250.0000 | 250.0000 | 55,285 KB |
| OptimisedDecode | Large.qoi | 123,073.7 us | 222.35 us | 197.11 us | 0.96 | 250.0000 | 250.0000 | 250.0000 | 55,284 KB |
| OriginalDecode | Medium.qoi | 6,381.4 us | 26.74 us | 22.33 us | 1.00 | 46.8750 | 46.8750 | 46.8750 | 2,637 KB |
| OptimisedDecode | Medium.qoi | 5,763.2 us | 41.81 us | 39.11 us | 0.90 | 70.3125 | 70.3125 | 70.3125 | 2,637 KB |
| OriginalDecode | Small.qoi | 636.7 us | 1.35 us | 1.19 us | 1.00 | 5.8594 | 5.8594 | 5.8594 | 293 KB |
| OptimisedDecode | Small.qoi | 630.1 us | 9.91 us | 9.27 us | 0.99 | 7.8125 | 7.8125 | 7.8125 | 293 KB |
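I think potential users would quite like these changes, as it's a nice speedup overall: encoding takes roughly 31% less time and allocates 60-75% less memory, while decoding is 1-10% faster with unchanged allocation.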
Main changes

- Minimize GC pressure by renting a temporary array from ArrayPool when encoding/decoding, so that no scratch buffer needs to be allocated and thrown away.
- Use Span for read-only array access, since it has lower overhead (funnily enough, not every place that used an array benefited from Span).
- Aggressively inline small methods so that calls to them don't add a new stack frame. This improves performance at the expense of assembly size (but with this few call sites, the size difference is small).
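The three main changes can be sketched together as below. This is a minimal illustration, not the code from the PR: `PooledBufferSketch` and `HashPixels` are hypothetical names, and the "encoder" here only writes one hash byte per pixel. The hash itself is the color-index function from the QOI format.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

public static class PooledBufferSketch
{
    // QOI color-index hash; the attribute asks the JIT to inline it at call sites.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int ColorHash(byte r, byte g, byte b, byte a) =>
        (r * 3 + g * 5 + b * 7 + a * 11) % 64;

    // Hypothetical stand-in for an encoder: one hash byte per RGBA pixel.
    public static byte[] HashPixels(ReadOnlySpan<byte> rgba)
    {
        int pixelCount = rgba.Length / 4;

        // Rent a scratch buffer instead of allocating one per call.
        // The rented array may be longer than requested.
        byte[] rented = ArrayPool<byte>.Shared.Rent(Math.Max(pixelCount, 1));
        try
        {
            Span<byte> output = rented.AsSpan(0, pixelCount);
            for (int i = 0; i < pixelCount; i++)
            {
                ReadOnlySpan<byte> px = rgba.Slice(i * 4, 4);
                output[i] = (byte)ColorHash(px[0], px[1], px[2], px[3]);
            }
            // Only the exact-size result is allocated.
            return output.ToArray();
        }
        finally
        {
            // Always hand the buffer back, even if encoding throws.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

The `try`/`finally` matters: a rented buffer that is never returned is not a leak in the strict sense, but it defeats the pooling and brings the allocations back.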
Minor changes

- Split the RGBA and RGB code paths so that there are no redundant channel checks inside the loop, which reduces branching (and cuts down on some CPU work). This gives about a 5-7% improvement, but it massively increases the size of the code and introduces duplication, so I can understand not wanting this part of the change.
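To illustrate the RGB/RGBA split, here is a hedged sketch with hypothetical names (`ChannelSplitSketch`, `ExpandCombined`, `ExpandSplit`) that expands pixel data to four bytes per pixel. The combined version tests the channel count for every pixel; the split version tests it once and then runs a loop with no per-pixel check. Whether the JIT already hoists such a check depends on the runtime, so any gain should be measured.

```csharp
using System;

public static class ChannelSplitSketch
{
    // Single loop: the channel check runs once per pixel.
    public static byte[] ExpandCombined(ReadOnlySpan<byte> src, int channels)
    {
        byte[] dst = new byte[src.Length / channels * 4];
        for (int s = 0, d = 0; s + channels <= src.Length; s += channels, d += 4)
        {
            dst[d] = src[s];
            dst[d + 1] = src[s + 1];
            dst[d + 2] = src[s + 2];
            dst[d + 3] = channels == 4 ? src[s + 3] : (byte)255;
        }
        return dst;
    }

    // Split: the channel check runs once, outside the loops.
    public static byte[] ExpandSplit(ReadOnlySpan<byte> src, int channels) =>
        channels == 4 ? ExpandRgba(src) : ExpandRgb(src);

    private static byte[] ExpandRgb(ReadOnlySpan<byte> src)
    {
        byte[] dst = new byte[src.Length / 3 * 4];
        for (int s = 0, d = 0; s + 3 <= src.Length; s += 3, d += 4)
        {
            dst[d] = src[s];
            dst[d + 1] = src[s + 1];
            dst[d + 2] = src[s + 2];
            dst[d + 3] = 255; // RGB input is treated as fully opaque
        }
        return dst;
    }

    private static byte[] ExpandRgba(ReadOnlySpan<byte> src)
    {
        byte[] dst = new byte[src.Length / 4 * 4];
        for (int i = 0; i + 4 <= src.Length; i += 4)
        {
            dst[i] = src[i];
            dst[i + 1] = src[i + 1];
            dst[i + 2] = src[i + 2];
            dst[i + 3] = src[i + 3];
        }
        return dst;
    }
}
```

The duplication between `ExpandRgb` and `ExpandRgba` is the maintenance cost mentioned above.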
Using the two benchmarks in the PR I submitted (#8):
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| QoiEncoding | 5.887 ms | 0.0168 ms | 0.0149 ms | 101.5625 | 101.5625 | 101.5625 | 729 KB |
and
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| QoiDecoding | 4.049 ms | 0.0175 ms | 0.0155 ms | 203.1250 | 203.1250 | 203.1250 | 2 MB |
The WIP perf improvements in my various PRs were combined into this: #7 (comment) and offer faster encoding and decoding as compared to this PR (for the specific image in the benchmark).
There's definitely some stuff worth integrating from both your PR and my PR. I was hesitant about splitting into a Decode3 and a Decode4 method. However, if the maintainer is comfortable with that, we could investigate iterating over the byte[] using MemoryMarshal.Cast<byte, RGB> or MemoryMarshal.Cast<byte, RGBA> (where the structs are: struct RGB { public byte r, g, b; } and struct RGBA { public byte r, g, b, a; }). There are some other tricks that could be investigated too :)
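A rough sketch of that cast idea, assuming `System.Runtime.InteropServices.MemoryMarshal.Cast` is the API meant. The struct definitions follow the comment above; `PixelCastSketch` and the `SumRed` helpers are hypothetical and only demonstrate iterating typed pixels without copying.

```csharp
using System;
using System.Runtime.InteropServices;

// Pack = 1 makes the byte-for-byte layout explicit.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct RGB { public byte r, g, b; }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct RGBA { public byte r, g, b, a; }

public static class PixelCastSketch
{
    // Reinterprets the bytes as 3-byte pixels; no data is copied.
    public static int SumRed3(ReadOnlySpan<byte> data)
    {
        ReadOnlySpan<RGB> pixels = MemoryMarshal.Cast<byte, RGB>(data);
        int sum = 0;
        foreach (RGB p in pixels) sum += p.r;
        return sum;
    }

    // Same idea for 4-byte pixels.
    public static int SumRed4(ReadOnlySpan<byte> data)
    {
        ReadOnlySpan<RGBA> pixels = MemoryMarshal.Cast<byte, RGBA>(data);
        int sum = 0;
        foreach (RGBA p in pixels) sum += p.r;
        return sum;
    }
}
```

Trailing bytes that do not fill a whole pixel are dropped by the cast, so the input length should be validated beforehand.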
Yeah, the RGB split is a little extreme, but when I did it I got a good 10% improvement in decode, which made me do it in encode as well, and that gave 5%.