This repository was archived by the owner on Jan 26, 2026. It is now read-only.

Zero copy to main#19

Merged
expanded-for-real merged 57 commits into main from zero-copy-to-main
Jun 14, 2025

@expanded-for-real
Collaborator

@expanded-for-real expanded-for-real commented Jun 12, 2025

Had to partially rebase and it got messy. I'll clean up the repo and prune branches after this PR.

Addresses these issues:
#17
#14
#12
#10

The zero-copy implementation changed the underlying abstraction (the ImprintBuffers class is removed) and tries to draw a stronger boundary between the consuming side (deserializing, merging, projecting) and the producing side (serializing). As part of this, ImprintWriter is removed and everything related to serialization now lives in ImprintRecordBuilder. Current benchmark results are below. Merge is indeed faster than every other framework benchmarked, but making serialization faster will probably require Unsafe and an elaborate SWAR setup like the ones Jackson and Netty use for UTF-8 conversion.

Serializing hotspots:

  • sorting, but not nearly as much as you'd expect
  • UTF-8 encoding of Strings - could borrow Jackson's, Fury's, or Netty's SWAR UTF-8 writer
  • puts to ByteBuffer, probably because of bounds checking. I could write directly to the underlying array or just go for Unsafe
  • VarInt encoding (not really a hotspot, but it shows up in the flame graph)
  • a double ByteBuffer allocation I just realized I had and can eliminate (lol)
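
As a rough illustration of the SWAR idea mentioned in the list above, here is a minimal sketch that checks eight bytes at a time for a pure-ASCII prefix by testing the high bit of every byte lane in one `long`. This is not Jackson's or Netty's actual code; the class `SwarAscii` and the method `asciiPrefixLength` are hypothetical names, and a real writer would combine this with a bulk copy for the ASCII run.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Imprint, Jackson, or Netty.
final class SwarAscii {
    // One high bit per byte lane; any set bit means a non-ASCII byte in the word.
    private static final long HIGH_BITS = 0x8080808080808080L;

    /** Returns the length of the leading all-ASCII run in src[off, off + len). */
    static int asciiPrefixLength(byte[] src, int off, int len) {
        ByteBuffer view = ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        // Word-at-a-time pass: skip 8 bytes whenever no lane has its high bit set.
        while (i + 8 <= len) {
            long word = view.getLong(off + i);
            if ((word & HIGH_BITS) != 0) {
                break; // let the byte loop find the exact position
            }
            i += 8;
        }
        // Byte-at-a-time tail: ASCII bytes are non-negative when read as signed bytes.
        while (i < len && src[off + i] >= 0) {
            i++;
        }
        return i;
    }
}
```

When the whole string is ASCII, the encoded length equals the character count and the bytes can be copied directly, which is where most of the savings would come from.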

Finally, deserialize and field access are not shown. Imprint doesn't exactly have a "deserialize" so much as an "access every field." I can add those in, but we'll be slower (by design, I believe). Field access is something only FlatBuffers really has, and we're nearly on par with it (roughly 40 ns/op for us vs. 10 ns/op for FlatBuffers). Everyone else pays the full deserialization cost.

Benchmark Results

mergeAndSerialize

| Framework | Mode | Cnt | Score (ns/op) | Error (ns/op) |
|---|---|---:|---:|---:|
| Imprint | avgt | 7 | 1237.536 | ±188.873 |
| Jackson-JSON | avgt | 7 | 11744.677 | ±2023.211 |
| Protobuf | avgt | 7 | 7177.688 | ±460.645 |
| FlatBuffers | avgt | 7 | 1661.175 | ±178.084 |
| Avro-Generic | avgt | 7 | 11469.972 | ±3506.027 |
| Thrift | avgt | 7 | 5738.422 | ±472.948 |
| Kryo | avgt | 7 | 5118.342 | ±841.488 |
| MessagePack | avgt | 7 | 12403.641 | ±995.149 |

projectAndSerialize

| Framework | Mode | Cnt | Score (ns/op) | Error (ns/op) |
|---|---|---:|---:|---:|
| Imprint | avgt | 7 | 547.601 | ±38.770 |
| Jackson-JSON | avgt | 7 | 5321.537 | ±695.760 |
| Protobuf | avgt | 7 | 2120.023 | ±280.050 |
| FlatBuffers | avgt | 7 | 316.252 | ±33.684 |
| Avro-Generic | avgt | 7 | 4680.088 | ±859.520 |
| Thrift | avgt | 7 | 1647.197 | ±172.624 |
| Kryo | avgt | 7 | 2486.407 | ±214.266 |
| MessagePack | avgt | 7 | 5541.148 | ±758.440 |

serialize

| Framework | Mode | Cnt | Score (ns/op) | Error (ns/op) |
|---|---|---:|---:|---:|
| Imprint | avgt | 7 | 5435.692 | ±447.959 |
| Jackson-JSON | avgt | 7 | 2154.194 | ±80.240 |
| Protobuf | avgt | 7 | 2201.375 | ±107.610 |
| FlatBuffers | avgt | 7 | 1202.734 | ±129.512 |
| Avro-Generic | avgt | 7 | 2137.613 | ±392.575 |
| Thrift | avgt | 7 | 2921.587 | ±535.176 |
| Kryo | avgt | 7 | 2002.497 | ±171.343 |
| MessagePack | avgt | 7 | 3311.102 | ±280.393 |

expanded-for-real and others added 30 commits June 1, 2025 13:23
…mance tracking; add comprehensive String benchmark
Try to enhance string deserialization
A full list of enhancements can be found here - #3
…ome micro-optimizations added that were found along the way
* Full comprehensive comparison tests with a lot of other libraries + some micro-optimizations added that were found along the way

* replace deprecated gradle methods with latest

---------

Co-authored-by: expand3d <>
# Conflicts:
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintRecord.java
#	src/main/java/com/imprint/types/TypeHandler.java
#	src/main/java/com/imprint/types/Value.java
expand3d added 9 commits June 9, 2025 00:51
Adds Apache Thrift to the benchmark suite, including self-contained compiler download. Corrects Protobuf and FlatBuffers schemas and fixes bugs in the competitor classes to ensure a stable and robust benchmark environment. Includes refactored DataGenerator.
# Conflicts:
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintBuffers.java
#	src/main/java/com/imprint/core/ImprintRecord.java
#	src/test/java/com/imprint/profile/ProfilerTest.java
Collaborator Author

@expanded-for-real expanded-for-real Jun 12, 2025

New comparison benchmark: the old one was getting excessive, so I tried to make things more reasonable this way. I still need to go through it with a fine-tooth comb and make sure each framework is being tested fairly, since we're getting down to very small timings.

Added Thrift as well. I tried to add Cap'n Proto, since it's apparently an absurdly fast, optimized take on Protobuf or something, but gave up in favor of making optimization changes for now. From what I understand, I don't think it would handle merge any differently than Protobuf?

private final TypeHandler handler;


private static final TypeCode[] LOOKUP = new TypeCode[11];
Collaborator Author

micro-optimization
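
For readers unfamiliar with this kind of micro-optimization, the sketch below shows the general pattern of resolving an enum from its byte code through a precomputed array instead of a loop over `values()` or a `switch`. The enum `SketchTypeCode`, its constants, and their numeric codes are hypothetical; the real `TypeCode` has 11 lookup slots and may be laid out differently.

```java
// Hypothetical enum for illustration; the real TypeCode constants and codes may differ.
enum SketchTypeCode {
    NULL(0x0), BOOL(0x1), INT32(0x2), STRING(0x3);

    // Index == wire code, so decoding is a single array load.
    private static final SketchTypeCode[] LOOKUP = new SketchTypeCode[4];

    static {
        for (SketchTypeCode tc : values()) {
            LOOKUP[tc.code] = tc;
        }
    }

    private final byte code;

    SketchTypeCode(int code) {
        this.code = (byte) code;
    }

    byte code() {
        return code;
    }

    /** Resolves a wire byte to its constant, rejecting unknown codes. */
    static SketchTypeCode fromByte(byte b) {
        int idx = b & 0xFF; // treat the byte as unsigned
        if (idx >= LOOKUP.length || LOOKUP[idx] == null) {
            throw new IllegalArgumentException("Unknown type code: " + idx);
        }
        return LOOKUP[idx];
    }
}
```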

Collaborator Author

These are all zero-copy now and operate directly on the incoming ByteBuffers. The benchmark results are clear: merge comes out ~20% faster than FlatBuffers because of this.
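
A minimal sketch of what "operating against incoming ByteBuffers" can look like with the standard `java.nio.ByteBuffer` API: a field's bytes are exposed as a view over the source buffer instead of being copied into a new array. The class `ZeroCopySlice` and the method `view` are hypothetical and only illustrate the technique; they are not Imprint's API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Imprint.
final class ZeroCopySlice {
    /**
     * Returns a read-only view of source[offset, offset + length) without copying bytes.
     * Offsets are absolute; the source's own position and limit are left untouched.
     */
    static ByteBuffer view(ByteBuffer source, int offset, int length) {
        ByteBuffer dup = source.duplicate(); // shares content, independent position/limit
        dup.limit(offset + length);          // set limit first so position(offset) is always legal
        dup.position(offset);
        return dup.slice().asReadOnlyBuffer();
    }
}
```

Because the view shares storage with the source, a later write to the source is visible through the view, so callers that need a stable snapshot would still have to copy.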

} else {
Value.StringValue stringValue = (Value.StringValue) value;
byte[] utf8Bytes = stringValue.getUtf8Bytes();
return VarInt.encodedLength(utf8Bytes.length) + utf8Bytes.length;
Collaborator Author

Micro-optimization. Netty and Jackson use an elaborate SWAR process just to bring the cost of UTF-8 conversion close to zero. Not a huge concern right now, though.
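
To make the size computation in the snippet above concrete, here is a sketch of a length-prefixed string size calculation. It assumes a LEB128-style unsigned varint (7 payload bits per byte, high bit as continuation flag); whether Imprint's `VarInt` uses exactly this layout is an assumption, and `VarIntSketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; assumes LEB128-style varints, which may differ from Imprint's VarInt.
final class VarIntSketch {
    /** Number of bytes needed to encode value as an unsigned 32-bit varint. */
    static int encodedLength(int value) {
        int bits = 32 - Integer.numberOfLeadingZeros(value); // significant bits, 0 for value == 0
        return bits == 0 ? 1 : (bits + 6) / 7;               // ceil(bits / 7), at least one byte
    }

    /** Writes value as a varint, least-significant 7-bit group first. */
    static void encode(int value, ByteBuffer out) {
        while ((value & ~0x7F) != 0) {
            out.put((byte) ((value & 0x7F) | 0x80)); // continuation bit set
            value >>>= 7;
        }
        out.put((byte) value);
    }

    /** Serialized size of a string field: varint length prefix plus UTF-8 payload. */
    static int stringFieldSize(String s) {
        int payload = s.getBytes(StandardCharsets.UTF_8).length;
        return encodedLength(payload) + payload;
    }
}
```

Note that `stringFieldSize` still allocates the UTF-8 bytes just to measure them; caching the encoded bytes on the value, as the snippet above appears to do with `getUtf8Bytes()`, avoids encoding twice.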

Collaborator Author

Added a bunch of integration tests along the way to make sure the buffer slicing (so much flipping, so much slicing) and the new builder pattern I created actually work.

* merging, and field projections
*/
@Value
class Entry implements Directory {
Collaborator Author

This probably isn't needed. After I clean up the GitHub repo I want to do one last pass on all this. I created some interfaces to manage common data points I needed, but ended up separating concerns between reader and writer a lot better than I originally thought I would, so some of the interfaces here, like Directory, might not be needed yet.

Collaborator Author

As the description here says, I borrowed the IntToObject primitive map from Eclipse Collections while trying to hyper-optimize the serialize path. It doesn't make a huge difference on small single-shot passes like the ones in the comparison tests, but it amortizes really well. This removed quite a few hotspots revealed by the ProfilerTests.

}
}

/**
Collaborator Author

@expanded-for-real expanded-for-real Jun 12, 2025

This lets us do left-side compaction of the map values and an in-place insertion sort, and then return the array. It destroys the underlying map, but it eliminates all Array.copy() and new ArrayList<>() allocations, and once we've sorted we don't need the map anyway.
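
The compaction described in this comment can be sketched roughly as follows: an open-addressing map keeps parallel key and value arrays, and a terminal call packs the occupied slots to the left, insertion-sorts them by key, and hands back the backing array. This is not the actual `ImprintFieldObjectMap`; `CompactingIntMap`, its fixed capacity, its hash mix, and the non-negative-key assumption are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; not the actual ImprintFieldObjectMap.
final class CompactingIntMap<T> {
    private static final int EMPTY = -1; // assumes keys (field ids) are non-negative

    private final int[] keys;
    private final Object[] values;
    private int size;
    private boolean poisoned;

    CompactingIntMap(int capacity) {
        // Power-of-two capacity so (hash & mask) selects a slot; this sketch never resizes.
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new int[capacity];
        values = new Object[capacity];
        Arrays.fill(keys, EMPTY);
    }

    void put(int key, T value) {
        if (poisoned) throw new IllegalStateException("map was consumed by sortedValuesInPlace()");
        if (key < 0) throw new IllegalArgumentException("negative key");
        if (size == keys.length) throw new IllegalStateException("full; this sketch does not resize");
        int mask = keys.length - 1;
        int i = ((key * 0x9E3779B9) >>> 16) & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int size() {
        return size;
    }

    /**
     * Packs occupied slots to the left, sorts them by key, and returns the backing
     * array. Only the first size() elements are meaningful. The map is unusable afterwards.
     */
    Object[] sortedValuesInPlace() {
        poisoned = true;
        int n = 0;
        for (int i = 0; i < keys.length; i++) { // left-side compaction; n never passes i
            if (keys[i] != EMPTY) {
                keys[n] = keys[i];
                values[n] = values[i];
                n++;
            }
        }
        for (int i = 1; i < n; i++) {           // insertion sort on the packed prefix
            int k = keys[i];
            Object v = values[i];
            int j = i - 1;
            while (j >= 0 && keys[j] > k) {
                keys[j + 1] = keys[j];
                values[j + 1] = values[j];
                j--;
            }
            keys[j + 1] = k;
            values[j + 1] = v;
        }
        return values;
    }
}
```

Insertion sort is a reasonable fit here only under the assumption that the number of fields is small or nearly ordered; for wide records a different sort might be preferable.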

* - Sort values in place and return without allocation (subsequently poisons the map)
*/
final class ImprintFieldObjectMap<T> {
private static final int DEFAULT_CAPACITY = 512;
Collaborator Author

Meant to drop this to 256 or maybe even 128 (depending on how many columns we expect in real-world use cases). Resizing is expensive and the ProfilerTests use 200-column-wide objects, so I pushed this to 512 to keep resizing from showing up in the flame graph.

Collaborator Author

deleted and moved to ops folder

Collaborator Author

Redundant and caused confusion about separation of boundaries between read and write. Can rename Builder class to be Writer for consistency between libraries though

}
}

// Task to download the Thrift compiler
Collaborator Author

Thrift was surprisingly fast, but like the other binary formats it required quite a bit of Gradle task work and its own compiler.

java {
srcDir 'build/generated/source/flatbuffers/jmh/java'
srcDir 'build/generated-src/thrift/jmh/java'
srcDir 'build/generated/sbe/java'
Collaborator Author

@expanded-for-real expanded-for-real Jun 12, 2025

Tried to add SBE but gave up. I can revisit it in a later issue, though (and the XML-based schema makes me regret ever disliking Avro's JSON-based one). I'm actually curious how it would do on merge and project, especially compared to FlatBuffers. I can't imagine it's that much better than FlatBuffers, but it could be interesting.

@agavra
Contributor

agavra commented Jun 13, 2025

This is really deep stuff @expanded-for-real - thank you! I've got a lot going on at work so I may not be able to get to this PR (it's pretty big) for a while. Feel free to merge it though if you have changes backed up behind this and I can always look at it after-the-fact.

@expanded-for-real
Collaborator Author

@agavra I'll merge it in, I feel pretty confident everything is working functionally and the fact that merge and project are really fast is very promising.

I'm currently working on breaking down the TypeCode interface and the Value abstract class, and I'll probably make everything static and final. That should help a little bit, along with some better memory management tricks I can use. I think I can get serialization times down by a bit more before going full exotic

@expanded-for-real expanded-for-real merged commit 688f603 into main Jun 14, 2025
3 checks passed
@expanded-for-real expanded-for-real deleted the zero-copy-to-main branch June 14, 2025 17:19