Conversation
…mance tracking; add comprehensive String benchmark
…to remove a bunch of stuff
…ing deserialization a bit faster
Try to enhance string deserialization
A full list of enhancements can be found here - #3
…ome micro-optimizations added that were found along the way
* Full comprehensive comparison tests with a lot of other libraries + some micro-optimizations added that were found along the way
* Replace deprecated gradle methods with latest

---------

Co-authored-by: expand3d <>
# Conflicts:
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintRecord.java
#	src/main/java/com/imprint/types/TypeHandler.java
#	src/main/java/com/imprint/types/Value.java
Adds Apache Thrift to the benchmark suite, including self-contained compiler download. Corrects Protobuf and FlatBuffers schemas and fixes bugs in the competitor classes to ensure a stable and robust benchmark environment. Includes refactored DataGenerator.
# Conflicts:
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintBuffers.java
#	src/main/java/com/imprint/core/ImprintRecord.java
#	src/test/java/com/imprint/profile/ProfilerTest.java
New Comparison Benchmark - the old one was getting excessive, so I tried to make things more reasonable this way. I still need to go through with a fine-tooth comb and make sure the comparisons are fairly testing each library, since we're getting down into micro-scale times.
Added Thrift as well. Tried to add Cap'n Proto, since apparently it's an absurdly fast and optimized take on protobuf, but gave up in favor of making optimization changes for now. I don't think it would handle merge any differently than protobuf, from what I understand.
private final TypeHandler handler;
…
private static final TypeCode[] LOOKUP = new TypeCode[11];
micro-optimization
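The idea behind a fixed-size lookup table like `new TypeCode[11]` is to turn byte-to-enum decoding into a single array index instead of a switch or a `values()` scan per field. A minimal sketch (the constant names and the byte-equals-ordinal mapping are assumptions, not the actual Imprint definitions):

```java
// Sketch: decode a wire-format type byte via a static lookup array.
// Assumes 11 type codes whose wire byte matches the enum ordinal.
enum TypeCode {
    NULL, BOOL, INT32, INT64, FLOAT32, FLOAT64, BYTES, STRING, ARRAY, MAP, ROW;

    private static final TypeCode[] LOOKUP = new TypeCode[11];
    static {
        for (TypeCode tc : values()) {
            LOOKUP[tc.ordinal()] = tc;
        }
    }

    static TypeCode fromByte(byte b) {
        int i = b & 0xFF;
        if (i >= LOOKUP.length) {
            throw new IllegalArgumentException("Unknown type code: " + i);
        }
        return LOOKUP[i]; // O(1), no branching over constants, no allocation
    }
}
```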
These are all zero-copy now and just operate against the incoming ByteBuffers. The result in the benchmarks is undeniable: merge becomes ~20% better than FlatBuffers because of this.
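Zero-copy reads boil down to creating views over the incoming buffer rather than copying bytes out. A hedged sketch of the pattern (not the actual Imprint API):

```java
import java.nio.ByteBuffer;

// Sketch: return a zero-copy view of [offset, offset+length) in the source
// buffer. duplicate() shares the backing storage but has independent
// position/limit, so the caller's buffer state is untouched.
final class ZeroCopySketch {
    static ByteBuffer sliceValue(ByteBuffer source, int offset, int length) {
        ByteBuffer view = source.duplicate(); // no byte copy
        view.position(offset);
        view.limit(offset + length);
        return view.slice(); // still backed by the original storage
    }
}
```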
} else {
    Value.StringValue stringValue = (Value.StringValue) value;
    byte[] utf8Bytes = stringValue.getUtf8Bytes();
    return VarInt.encodedLength(utf8Bytes.length) + utf8Bytes.length;
micro-optimization. Netty and Jackson use an elaborate SWAR process just to nearly zero out the cost of UTF-8 conversions. Not a huge concern right now, though.
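The `getUtf8Bytes()` call above pays the String-to-UTF-8 conversion once and reuses it for both the length computation and the eventual write. A hedged sketch of that caching (illustrative names; the VarInt width function here is a generic unsigned LEB128 length, not necessarily Imprint's exact encoding):

```java
import java.nio.charset.StandardCharsets;

// Sketch: cache the UTF-8 encoding so sizing and writing share one conversion.
final class StringValueSketch {
    private final String value;
    private byte[] utf8; // encoded lazily, at most once

    StringValueSketch(String value) { this.value = value; }

    byte[] getUtf8Bytes() {
        if (utf8 == null) {
            utf8 = value.getBytes(StandardCharsets.UTF_8);
        }
        return utf8;
    }

    int encodedLength() {
        byte[] bytes = getUtf8Bytes();
        return varIntLength(bytes.length) + bytes.length;
    }

    // Unsigned LEB128 width: 7 payload bits per byte.
    static int varIntLength(int v) {
        int n = 1;
        while ((v & ~0x7F) != 0) { v >>>= 7; n++; }
        return n;
    }
}
```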
Added a bunch of integration tests along the way to make sure the buffer slicing (so much flipping, so much slicing) and the new building pattern I created actually work.
 * merging, and field projections
 */
@Value
class Entry implements Directory {
This probably isn't needed. After I clean up the GitHub repo I want to do one last pass on all of this. I created some interfaces to manage common data points I needed, but I ended up separating concerns between reader and writer much better than I originally thought I would, so some of the interfaces here, like the Directory one, might not be needed yet.
As the description here says, I borrowed the int-to-object primitive map idea from Eclipse Collections while trying to hyper-optimize the serialize path. It doesn't make a huge difference on small single-shot passes like in the comparison tests, but it actually amortizes really well. This reduced quite a few hotspots revealed by the ProfilerTests.
    }
}
…
/**
This allows us to do left-side compaction of the map values via an in-place insertion sort and then return the array. It destroys the underlying map, but it eliminates all the Arrays.copy() and new ArrayList<>() allocations, and once we sort we don't need the map anyway.
 * - Sort values in place and return without allocation (subsequently poisons the map)
 */
final class ImprintFieldObjectMap<T> {
    private static final int DEFAULT_CAPACITY = 512;
Meant to drop this to 256 or maybe even 128 (depends on how many columns we expect in real-world use cases). Resizing is expensive, and the ProfilerTests use 200-column-wide objects, so I pushed this to 512 to keep resizing from showing up in the flame graph.
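The destructive sort-and-return trick can be sketched as follows (an illustrative linear-probing layout, not the actual ImprintFieldObjectMap internals): live entries are compacted leftward in the backing arrays, insertion-sorted by key, and the backing array itself is returned, so no copy or list allocation occurs. Presizing the capacity is what keeps resizing out of the hot path.

```java
// Sketch: int-keyed open-addressing map whose value array is compacted and
// sorted in place, then handed back directly. The map is unusable
// ("poisoned") afterwards.
final class FieldMapSketch {
    private final int[] keys;
    private final Object[] values;
    private final boolean[] used;
    private int size;

    FieldMapSketch(int capacity) { // presize to dodge expensive resizing
        keys = new int[capacity];
        values = new Object[capacity];
        used = new boolean[capacity];
    }

    void put(int key, Object value) {
        int i = (key & 0x7FFFFFFF) % keys.length;
        while (used[i] && keys[i] != key) i = (i + 1) % keys.length; // linear probe
        if (!used[i]) { used[i] = true; size++; }
        keys[i] = key;
        values[i] = value;
    }

    int size() { return size; }

    // Compact live slots to the front, insertion-sort by key, return the
    // backing array. Only indices [0, size) are meaningful afterwards.
    Object[] sortedValuesDestructive() {
        int n = 0;
        for (int i = 0; i < keys.length; i++) {
            if (used[i]) {              // left-side compaction (n <= i, so safe)
                keys[n] = keys[i];
                values[n] = values[i];
                n++;
            }
        }
        for (int i = 1; i < n; i++) {   // insertion sort by key
            int k = keys[i];
            Object v = values[i];
            int j = i - 1;
            while (j >= 0 && keys[j] > k) {
                keys[j + 1] = keys[j];
                values[j + 1] = values[j];
                j--;
            }
            keys[j + 1] = k;
            values[j + 1] = v;
        }
        return values; // no Arrays.copyOf, no ArrayList
    }
}
```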
deleted and moved to ops folder
Redundant, and it caused confusion about the separation of boundaries between read and write. We can rename the Builder class to Writer for consistency with other libraries, though.
    }
}
…
// Task to download the Thrift compiler
Thrift was surprisingly fast, but like the other binary formats it required quite a bit of Gradle task work and its own compiler.
java {
    srcDir 'build/generated/source/flatbuffers/jmh/java'
    srcDir 'build/generated-src/thrift/jmh/java'
    srcDir 'build/generated/sbe/java'
Tried to add SBE but gave up; I can revisit it in a later issue (and the XML-based schema makes me regret ever disliking Avro's JSON-based one). I'm actually curious how it would do on merge and project, especially compared to FlatBuffers. I can't imagine it's that much better than FlatBuffers, but it could be interesting.
This is really deep stuff @expanded-for-real - thank you! I've got a lot going on at work, so I may not be able to get to this PR (it's pretty big) for a while. Feel free to merge it though if you have changes backed up behind this, and I can always look at it after the fact.
@agavra I'll merge it in. I feel pretty confident everything is working functionally, and the fact that merge and project are really fast is very promising. I'm currently working on breaking down the TypeCode interface and the Value abstract class, and I'll probably make everything static and final. That should help a little bit, along with some better memory management tricks I can use. I think I can get serialization times down a bit more before going full exotic.
Had to partially rebase and it was messy - will clean up the repo and prune branches after this PR.
Addresses these issues:
#17
#14
#12
#10
The zero-copy implementation led to a change in the underlying abstraction (removal of the ImprintBuffers class), and I tried to create a stronger boundary between the consumer side (deserializing/merging/projecting) and the producer side (serializing). This also involved removing ImprintWriter and just moving everything into ImprintRecordBuilder as far as serialization is concerned. Current benchmark results are below. Merge is indeed faster than anything else, but further optimizations will probably require `Unsafe` and some elaborate SWAR setup like Jackson and Netty have for UTF-8 conversions to make serializing faster.

Serializing hotspots (`Unsafe`)

Finally - not shown are deserialize and field access. Imprint doesn't exactly have a "deserialize" so much as it has "access every field." I can add those benchmarks in, but we'll be slower (by design, I believe). Field access is something only FlatBuffers really has, and we're nearly on par with it (roughly 40 ns/op for us vs 10 for FlatBuffers). Everyone else pays the full deserialization cost.
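Single-field access without a full deserialize comes down to looking the field up in the record's directory and slicing its byte range out of the payload. A hedged sketch of the general technique (the layout and names are illustrative, not Imprint's actual directory format):

```java
import java.nio.ByteBuffer;

// Sketch: binary-search a sorted (fieldId -> offset) directory and return a
// zero-copy view of just that field's bytes. offsets has one extra entry so
// offsets[i]..offsets[i+1] bounds field i's payload range.
final class FieldAccessSketch {
    static ByteBuffer getFieldBytes(ByteBuffer payload, int[] fieldIds, int[] offsets, int fieldId) {
        int lo = 0, hi = fieldIds.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (fieldIds[mid] < fieldId) {
                lo = mid + 1;
            } else if (fieldIds[mid] > fieldId) {
                hi = mid - 1;
            } else {
                ByteBuffer view = payload.duplicate(); // no byte copy
                view.position(offsets[mid]);
                view.limit(offsets[mid + 1]);
                return view.slice();
            }
        }
        return null; // field absent from this record
    }
}
```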
Benchmark Results
mergeAndSerialize
projectAndSerialize
serialize