Zero copy to main by expanded-for-real · Pull Request #19 · imprint-serde/imprint-java

expanded-for-real · 2025-06-12T20:48:43Z

I had to partially rebase and it got messy. I'll clean up the repo and prune branches after this PR.

Addresses these issues:
#17
#14
#12
#10

The zero-copy implementation changed the underlying abstraction (the ImprintBuffers class is removed) and draws a stronger boundary between the consuming side (deserializing, merging, projecting) and the producing side (serializing). As part of that, ImprintWriter is removed and everything related to serialization now lives in ImprintRecordBuilder. Current benchmark results are below. Merge is indeed faster than every other framework measured, but making serialization faster will probably require Unsafe and an elaborate SWAR setup like the ones Jackson and Netty use for UTF-8 conversion.

Serializing hotspots:

sorting, but not nearly as much as you'd expect
UTF-8 encoding of Strings - could borrow the SWAR UTF-8 writer from Jackson, Fury, or Netty
puts to ByteBuffer, probably because of bounds checking. I could write directly to the underlying array or just go for Unsafe
VarInt encoding (not really, but it shows up in the flame graph)
a double ByteBuffer allocation I just realized I had and can eliminate (lol)
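For the ByteBuffer put and VarInt items, here is a minimal sketch of what writing straight into a backing `byte[]` could look like. It assumes an unsigned LEB128-style VarInt (7 payload bits per byte, high bit as continuation flag); the class `VarIntSketch` and its methods are hypothetical and are not the VarInt class in this repo.

```java
final class VarIntSketch {
    /** Number of bytes needed to encode value as an unsigned LEB128 VarInt (1 to 5). */
    static int encodedLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    /** Writes value into dst starting at offset; returns the offset just past the last byte. */
    static int write(int value, byte[] dst, int offset) {
        while ((value & ~0x7F) != 0) {
            dst[offset++] = (byte) ((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        dst[offset++] = (byte) value; // final byte, continuation flag clear
        return offset;
    }

    /** Reads a VarInt starting at offset. */
    static int read(byte[] src, int offset) {
        int result = 0;
        int shift = 0;
        while (true) {
            byte b = src[offset++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("VarInt longer than 5 bytes");
            }
        }
    }

    public static void main(String[] args) {
        byte[] out = new byte[5];
        int end = write(300, out, 0);
        System.out.println(end + " bytes, decoded " + read(out, 0));
    }
}
```

For a heap buffer, `ByteBuffer.array()` exposes the backing array, so a writer like this can track its own offset and set the buffer position once at the end. Whether that beats the JIT's bounds-check elimination on plain puts would need to be measured.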

Finally, deserialize and field access are not shown. Imprint doesn't exactly have a "deserialize" so much as an "access every field." I can add those in, but we'll be slower (by design, I believe). Field access is something only FlatBuffers really has, and we're nearly on par with it (about 40 ns/op for us vs. 10 ns/op for FlatBuffers). Everyone else pays the full deserialization cost.

Benchmark Results

`mergeAndSerialize`

Framework	Mode	Cnt	Score (ns/op)	Error (ns/op)
Imprint	avgt	7	1237.536	±188.873
Jackson-JSON	avgt	7	11744.677	±2023.211
Protobuf	avgt	7	7177.688	±460.645
FlatBuffers	avgt	7	1661.175	±178.084
Avro-Generic	avgt	7	11469.972	±3506.027
Thrift	avgt	7	5738.422	±472.948
Kryo	avgt	7	5118.342	±841.488
MessagePack	avgt	7	12403.641	±995.149

`projectAndSerialize`

Framework	Mode	Cnt	Score (ns/op)	Error (ns/op)
Imprint	avgt	7	547.601	±38.770
Jackson-JSON	avgt	7	5321.537	±695.760
Protobuf	avgt	7	2120.023	±280.050
FlatBuffers	avgt	7	316.252	±33.684
Avro-Generic	avgt	7	4680.088	±859.520
Thrift	avgt	7	1647.197	±172.624
Kryo	avgt	7	2486.407	±214.266
MessagePack	avgt	7	5541.148	±758.440

`serialize`

Framework	Mode	Cnt	Score (ns/op)	Error (ns/op)
Imprint	avgt	7	5435.692	±447.959
Jackson-JSON	avgt	7	2154.194	±80.240
Protobuf	avgt	7	2201.375	±107.610
FlatBuffers	avgt	7	1202.734	±129.512
Avro-Generic	avgt	7	2137.613	±392.575
Thrift	avgt	7	2921.587	±535.176
Kryo	avgt	7	2002.497	±171.343
MessagePack	avgt	7	3311.102	±280.393

Adds Apache Thrift to the benchmark suite, including self-contained compiler download. Corrects Protobuf and FlatBuffers schemas and fixes bugs in the competitor classes to ensure a stable and robust benchmark environment. Includes refactored DataGenerator.

expanded-for-real · 2025-06-12T20:50:12Z

src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java

New Comparison Benchmark - old one was getting excessive so I tried to make things more reasonable this way. I still need to really go through with a comb and make sure the comparisons are fairly testing each one since we're getting down in some micro times.

Added Thrift as well. Tried to add Cap'n Proto, since apparently it's an absurdly fast, optimized take on Protobuf or something, but gave up in favor of making optimization changes for now. From what I understand, I don't think it would handle merge any differently than Protobuf?

expanded-for-real · 2025-06-12T20:50:29Z

src/main/java/com/imprint/types/TypeCode.java

    private final TypeHandler handler;
-
+
+    private static final TypeCode[] LOOKUP = new TypeCode[11];


micro-optimization
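A sketch of the pattern the `LOOKUP` array suggests: resolve a type code byte with one array index instead of a switch or a scan over `values()`. The enum name, constants, and codes below are hypothetical and are not the real TypeCode definitions.

```java
enum TypeCodeSketch {
    NULL(0), BOOL(1), INT32(2), STRING(3); // hypothetical codes

    // Dense table indexed by code; filled once when the enum class initializes.
    private static final TypeCodeSketch[] LOOKUP = new TypeCodeSketch[4];

    static {
        for (TypeCodeSketch type : values()) {
            LOOKUP[type.code] = type;
        }
    }

    private final byte code;

    TypeCodeSketch(int code) {
        this.code = (byte) code;
    }

    byte code() {
        return code;
    }

    /** Resolves a code byte with a single bounds check and array read. */
    static TypeCodeSketch fromByte(byte code) {
        int index = code & 0xFF; // treat the byte as unsigned
        if (index >= LOOKUP.length || LOOKUP[index] == null) {
            throw new IllegalArgumentException("Unknown type code: " + index);
        }
        return LOOKUP[index];
    }

    public static void main(String[] args) {
        System.out.println(fromByte((byte) 2)); // INT32
    }
}
```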

expanded-for-real · 2025-06-12T20:51:28Z

src/main/java/com/imprint/ops/ImprintOperations.java

These are all zero-copy now and operate directly on the incoming ByteBuffers. The benchmark result is undeniable: merge becomes ~20% faster than FlatBuffers because of this.
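To illustrate what "zero-copy" means at the ByteBuffer level, here is a minimal sketch that hands out a window onto a region of an incoming buffer without copying bytes. `ZeroCopySliceSketch`, `view`, and `ascii` are hypothetical names; the actual ImprintOperations code is not reproduced here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ZeroCopySliceSketch {
    /**
     * Returns a read-only view of [offset, offset + length) that shares memory with source.
     * The source's own position and limit are left untouched.
     */
    static ByteBuffer view(ByteBuffer source, int offset, int length) {
        ByteBuffer dup = source.duplicate(); // same bytes, independent position and limit
        dup.clear();                         // position = 0, limit = capacity; no data is erased
        dup.position(offset);
        dup.limit(offset + length);
        return dup.slice().asReadOnlyBuffer();
    }

    /** Decodes the remaining bytes as ASCII without disturbing the buffer's position. */
    static String ascii(ByteBuffer buffer) {
        return StandardCharsets.US_ASCII.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer record = ByteBuffer.wrap("headerPAYLOADtail".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer payload = view(record, 6, 7);
        System.out.println(ascii(payload)); // PAYLOAD
        record.put(6, (byte) 'p');          // a write to the source is visible through the view
        System.out.println(ascii(payload)); // pAYLOAD
    }
}
```

The second print shows the trade-off: since nothing is copied, a view is only valid while the underlying bytes stay unchanged.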

expanded-for-real · 2025-06-12T20:52:40Z

src/main/java/com/imprint/types/TypeHandler.java

            } else {
                Value.StringValue stringValue = (Value.StringValue) value;
-                byte[] utf8Bytes = stringValue.getUtf8Bytes();
-                return VarInt.encodedLength(utf8Bytes.length) + utf8Bytes.length;


Micro-optimization. Netty and Jackson use an elaborate SWAR process just to nearly zero out the cost of UTF-8 conversions. Not a huge concern right now, though.
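The removed lines encoded the whole string into a `byte[]` just to measure it. One way to get the same number without that allocation is to count UTF-8 bytes directly from the chars. This is a plain scalar loop, not SWAR, and not necessarily what this PR replaced the lines with; `Utf8LengthSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthSketch {
    /** Counts the bytes String.getBytes(UTF_8) would produce, without allocating them. */
    static int utf8Length(CharSequence s) {
        int bytes = 0;
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                 // ASCII
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isSurrogate(c)) {
                if (Character.isHighSurrogate(c) && i + 1 < n
                        && Character.isLowSurrogate(s.charAt(i + 1))) {
                    bytes += 4;             // valid pair encodes one supplementary code point
                    i++;
                } else {
                    bytes += 1;             // lone surrogate: getBytes substitutes a single '?'
                }
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String sample = "h\u00e9llo \u65e5\u672c \uD83D\uDE00";
        System.out.println(utf8Length(sample) == sample.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The size estimate then becomes the VarInt length of that count plus the count itself, matching the shape of the removed return statement.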

expanded-for-real · 2025-06-12T20:53:37Z

src/test/java/com/imprint/IntegrationTest.java

Added a bunch of integration tests along the way to make sure the buffer slicing (so much flipping, so much slicing) and the new builder pattern I created actually work.

expanded-for-real · 2025-06-12T20:55:17Z

src/main/java/com/imprint/core/Directory.java

+     * merging, and field projections
+     */
+    @Value
+    class Entry implements Directory {


This probably isn't needed. After I clean up the GitHub repo I want to do one last pass on all this. I created some interfaces to manage common data points I needed, but ended up separating concerns between reader and writer a lot better than I originally thought I would, so some of the interfaces here, like the Directory one, might not be needed yet.

expanded-for-real · 2025-06-12T20:57:37Z

src/main/java/com/imprint/core/ImprintFieldObjectMap.java

As the description here says, I borrowed the IntToObject primitive map from Eclipse Collections while trying to hyper-optimize the serialize path. It doesn't make a huge difference on small single-shot passes like the ones in the comparison tests, but it amortizes really well. This removed quite a few hotspots revealed by the ProfilerTests.

expanded-for-real · 2025-06-12T20:59:15Z

src/main/java/com/imprint/core/ImprintFieldObjectMap.java

+        }
+    }
+
+    /**


This allows us to do left-side compaction of the map values through an in-place insertion sort and then return the array. It destroys the underlying map but eliminates all array copies and new ArrayList<>() allocations, and once we sort we don't need the map anyway.
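A minimal sketch of that idea, assuming an open-addressed table stored as parallel key and value arrays with a sentinel key for empty slots. The layout, the `EMPTY` sentinel, and `compactAndSort` are assumptions for illustration; the real ImprintFieldObjectMap internals may differ.

```java
import java.util.Arrays;

final class CompactSortSketch {
    static final int EMPTY = -1; // hypothetical sentinel marking an unused slot

    /**
     * Moves every occupied slot into a sorted prefix of the same arrays and returns its length.
     * No new arrays are allocated, and the hash layout is destroyed in the process.
     */
    static int compactAndSort(int[] keys, Object[] values) {
        int count = 0; // length of the sorted, compacted prefix
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == EMPTY) {
                continue;
            }
            int key = keys[i];
            Object value = values[i];
            // Insertion step: shift larger entries right. Writes never pass index count,
            // and count <= i, so unread slots are never clobbered.
            int j = count - 1;
            while (j >= 0 && keys[j] > key) {
                keys[j + 1] = keys[j];
                values[j + 1] = values[j];
                j--;
            }
            keys[j + 1] = key;
            values[j + 1] = value;
            count++;
        }
        // Clear the stale tail so leftover references do not linger.
        Arrays.fill(keys, count, keys.length, EMPTY);
        Arrays.fill(values, count, values.length, null);
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {EMPTY, 7, EMPTY, 3, 9, EMPTY, 1, EMPTY};
        Object[] values = {null, "g", null, "c", "i", null, "a", null};
        int count = compactAndSort(keys, values);
        System.out.println(count + " " + Arrays.toString(Arrays.copyOf(keys, count)));
    }
}
```

Insertion sort is quadratic in the worst case, so this trades asymptotic cost for zero allocation; the map must not be used for lookups afterwards.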

expanded-for-real · 2025-06-12T21:00:45Z

src/main/java/com/imprint/core/ImprintFieldObjectMap.java

+ * - Sort values in place and return without allocation (subsequently poisons the map)
+ */
+final class ImprintFieldObjectMap<T> {
+    private static final int DEFAULT_CAPACITY = 512;


Meant to drop this to 256 or maybe even 128 (depending on how many columns we expect in real-world use cases). Resizing is expensive and the ProfilerTests use 200-column-wide objects, so I pushed this to 512 to keep resizing out of the flame graph.

expanded-for-real · 2025-06-12T21:01:10Z

src/main/java/com/imprint/core/ImprintOperations.java

deleted and moved to ops folder

expanded-for-real · 2025-06-12T21:13:48Z

src/main/java/com/imprint/core/ImprintWriter.java

Redundant, and it caused confusion about the boundary between read and write. The Builder class can be renamed to Writer for consistency between libraries, though.

expanded-for-real · 2025-06-12T21:15:19Z

build.gradle

    }
 }

+// Task to download the Thrift compiler


Thrift was surprisingly fast, but like the other binary formats it required quite a bit of Gradle task work and its own compiler.

expanded-for-real · 2025-06-12T21:17:16Z

build.gradle

        java {
            srcDir 'build/generated/source/flatbuffers/jmh/java'
+            srcDir 'build/generated-src/thrift/jmh/java'
+            srcDir 'build/generated/sbe/java'


Tried to add SBE but gave up. I can revisit it in a later issue, though (and the XML-based schema makes me regret ever disliking Avro's JSON-based one). I'm actually curious how it would do on merge and project, especially compared to FlatBuffers. I can't imagine it's that much better than FlatBuffers, but it could be interesting.

agavra · 2025-06-13T15:55:02Z

This is really deep stuff @expanded-for-real - thank you! I've got a lot going on at work so I may not be able to get to this PR (it's pretty big) for a while. Feel free to merge it though if you have changes backed up behind this and I can always look at it after-the-fact.

expanded-for-real · 2025-06-13T15:59:22Z

@agavra I'll merge it in, I feel pretty confident everything is working functionally and the fact that merge and project are really fast is very promising.

I'm currently working on breaking down the TypeCode interface and the Value abstract class, and I'll probably make everything static and final. That should help a little bit, along with some better memory management tricks I can use. I think I can get serialization times down by a bit more before going full exotic

expanded-for-real and others added 30 commits June 1, 2025 13:23

Update .gitignore

5cb33c9

initial commit for imprint-java

63313e4

initial commit for imprint-java

dd4fdbc

Add GitHub Actions CI workflow for automated testing

bce1d13

Merge remote-tracking branch 'origin/dev' into dev

f5d90b5

Update GitHub Actions workflow to use upload-artifact@v4

72c468f

Add Gradle wrapper validation to CI workflow

468d682

Fix gitignore to include gradle-wrapper.jar for CI

cf05b13

Force add gradle-wrapper.jar to repository

d0d7983

Update wrapper validation action to v3

f2cdd1b

Fix Javadoc syntax errors and disable strict Javadoc checking

57c8249

Add JMH benchmark .bat and .sh for full suite benchmarking and perfor…

edb3057

…mance tracking; add comprehensive String benchmark

fix map serialization error in benchmark test and streamline ci file …

2853e3f

…to remove a bunch of stuff

Add execute permissions back for gradlew in CI

3a5a113

Add some more string based performance benchmarks and try to make str…

50a288b

…ing deserialization a bit faster

Merge pull request #2 from imprint-serde/faster-strings

ea1c4c4

Try to enhance string deserialization

second main commit to address initial commits

43cab28

A full list of enhancements can be found here - #3

additional cleanup to address concerns in #3

fdb8a56

minor style fixes

2e56688

minor style fixes again

9353388

minor style fixes on benchmark tests and supress unused

09d0377

minor reordering

6209bb1

Merge branch 'main' into dev

ace7c67

Full comprehensive comparison tests with a lot of other libraries + s…

4632e01

…ome micro-optimizations added that were found along the way

replace deprecated gradle methods with latest

3738861

Merge Comparisons into dev branch (#8)

12d2823

* Full comprehensive comparison tests with a lot of other libraries + some micro-optimizations added that were found along the way * replace deprecated gradle methods with latest --------- Co-authored-by: expand3d <>

Lazy load of directory and header data

f7a6e8e

Merge remote-tracking branch 'origin/main' into dev

2834dbb

# Conflicts: # src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java # src/main/java/com/imprint/core/ImprintRecord.java # src/main/java/com/imprint/types/TypeHandler.java # src/main/java/com/imprint/types/Value.java

minor cleanup

83ed961

minor cleanup

a605b65

expand3d added 9 commits June 9, 2025 00:51

Add single-field access test

a722e45

correct benchmark methodology for fairness

9d0f2c8

micro-optiomize and attempt to make ComparisonBenchmark tests a littl…

4b2664c

…e more fair

final optimization and reorganization into better project structure

b4cf85d

final optimization and reorganization into better project structure

cce8994

Merge branch 'main' into zero-copy

f06ad98

# Conflicts: # src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java # src/main/java/com/imprint/core/ImprintBuffers.java # src/main/java/com/imprint/core/ImprintRecord.java # src/test/java/com/imprint/profile/ProfilerTest.java

track custom map

50c8a4b

delete extra operations file because I moved it

eb40310

expanded-for-real requested a review from agavra June 12, 2025 20:48

expanded-for-real assigned agavra Jun 12, 2025

adding comments and TODOs

b8449c8

agavra approved these changes Jun 13, 2025

expanded-for-real merged commit 688f603 into main Jun 14, 2025
3 checks passed

expanded-for-real deleted the zero-copy-to-main branch June 14, 2025 17:19
