Full Comprehensive Comparison Tests #9
Conversation
…mance tracking; add comprehensive String benchmark
…to remove a bunch of stuff
…ing deserialization a bit faster
Try to enhance string deserialization
A full list of enhancements can be found in #3.
* Full comprehensive comparison tests with a lot of other libraries + some micro-optimizations added that were found along the way
* Replace deprecated Gradle methods with latest

Co-authored-by: expand3d <>

# Conflicts:
#   src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#   src/main/java/com/imprint/core/ImprintRecord.java
#   src/main/java/com/imprint/types/TypeHandler.java
#   src/main/java/com/imprint/types/Value.java
The CI file should trigger a comprehensive benchmark run on merges to main and publish the results as a report.
Lots of changes in the build.gradle to add the other frameworks we're comparing against; this includes Protobuf and FlatBuffers, which require major Gradle work, as shown below.
  }
  }

  // Download and setup FlatBuffers compiler for Linux (CI environment)
We have to download a C-based FlatBuffers compiler to the CI server in order to run the comparisons during the merge process.
FlatBuffers schema file
Protobuf schema file
  public ImprintRecord(Header header, List<DirectoryEntry> directory, ByteBuffer payload) {
      this.header = Objects.requireNonNull(header, "Header cannot be null");
-     this.directory = List.copyOf(Objects.requireNonNull(directory, "Directory cannot be null"));
+     this.directory = Collections.unmodifiableList(Objects.requireNonNull(directory, "Directory cannot be null"));
minor performance gain
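To illustrate the change above (a standalone sketch, not the project's actual code): `List.copyOf` allocates a new backing array and copies every element, while `Collections.unmodifiableList` only wraps the existing list, so when the constructor is the sole owner of the list, wrapping skips the copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableVsCopy {
    // Wraps the list in an unmodifiable view: no element copy.
    static List<String> wrap(List<String> src) {
        return Collections.unmodifiableList(src);
    }

    // Copies the list into a new immutable snapshot.
    static List<String> copy(List<String> src) {
        return List.copyOf(src);
    }

    public static void main(String[] args) {
        var src = new ArrayList<>(List.of("a", "b"));
        List<String> wrapped = wrap(src);
        List<String> copied = copy(src);

        // Mutating the source shows the difference: the wrapper sees it,
        // the copy does not. Wrapping is safe (and cheaper) only when the
        // caller does not retain and mutate the original list.
        src.add("c");
        System.out.println(wrapped.size()); // 3: a view over the same list
        System.out.println(copied.size());  // 2: an independent snapshot
    }
}
```

The trade-off is safety vs speed: `copyOf` protects against a caller mutating the list afterwards, while the wrapper relies on the constructor being the last one to touch it.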
  private final byte[] value;

  public BytesValue(byte[] value) {
      this.value = value.clone(); // defensive copy
Removing debug comment notes to myself
- private volatile String cachedString; // lazy decode
+ private volatile String cachedString;

+ private static final int THREAD_LOCAL_BUFFER_SIZE = 1024;
Tried-and-true thread-local buffer pool for Strings. It doesn't make a huge difference in micro benchmarks of ~5 iterations, but the time saved over larger operations can be significant.
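A minimal sketch of the thread-local buffer idea (the constant name comes from the diff above; `decodeUtf8` is a hypothetical helper, not Imprint's actual API): strings under the threshold are decoded through a reused per-thread scratch array instead of allocating a fresh `byte[]` on every call.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class StringDecode {
    private static final int THREAD_LOCAL_BUFFER_SIZE = 1024;

    // One scratch buffer per thread; safe because it never escapes the call.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[THREAD_LOCAL_BUFFER_SIZE]);

    // Hypothetical helper: decode `length` UTF-8 bytes from the buffer.
    static String decodeUtf8(ByteBuffer buffer, int length) {
        if (length <= THREAD_LOCAL_BUFFER_SIZE) {
            byte[] scratch = SCRATCH.get();
            buffer.get(scratch, 0, length);      // reuse, no per-call allocation
            return new String(scratch, 0, length, StandardCharsets.UTF_8);
        }
        byte[] bytes = new byte[length];         // rare large-string fallback
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeUtf8(buf, 5)); // prints "hello"
    }
}
```

The `String` constructor still has to copy the bytes it decodes, so the win is only the avoided temporary array, which is why it shows up over many operations rather than in a 5-iteration micro benchmark.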
  Value deserialize(ByteBuffer buffer) throws ImprintException;
  void serialize(Value value, ByteBuffer buffer) throws ImprintException;
  int estimateSize(Value value) throws ImprintException;
  ByteBuffer readValueBytes(ByteBuffer buffer) throws ImprintException;
@agavra I looked at the Rust implementation again and you were correct: the previous version of readValueBytes was a bit wonky. I had originally added it because I wanted each type to define its own boundaries, but that's no longer needed since we let the ImprintRecord set these boundaries through the Directory, even for nested values (which had originally been causing me trouble). So pretty much all of this can be removed from the interface and the underlying implementations.
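To illustrate the point about letting the Directory define value boundaries (a simplified sketch; the `DirectoryEntry` shape here is assumed, not the real layout): a field's byte range falls out of its own offset and the next entry's offset, so no per-type readValueBytes logic is needed, even for nested values.

```java
import java.nio.ByteBuffer;
import java.util.List;

public class DirectoryBounds {
    // Assumed shape: each entry records the field id and its payload offset.
    record DirectoryEntry(int fieldId, int offset) {}

    // The value's bounds come from the directory, not from the type handler:
    // [entry.offset, nextEntry.offset), or up to the payload end for the last field.
    static ByteBuffer valueBytes(List<DirectoryEntry> directory, int index, ByteBuffer payload) {
        int start = directory.get(index).offset();
        int end = (index + 1 < directory.size())
                ? directory.get(index + 1).offset()
                : payload.limit();
        ByteBuffer view = payload.duplicate();
        view.position(start).limit(end);
        return view.slice();
    }

    public static void main(String[] args) {
        var payload = ByteBuffer.wrap(new byte[]{1, 2, 3, 4, 5, 6});
        var dir = List.of(new DirectoryEntry(1, 0), new DirectoryEntry(2, 2), new DirectoryEntry(3, 5));
        System.out.println(valueBytes(dir, 1, payload).remaining()); // field 2 spans bytes [2, 5)
    }
}
```

This also means a nested record's bytes are just another contiguous range in the payload, so the outer directory bounds it the same way as any scalar.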
A lot of test setup here across all the frameworks.
      bh.consume(result);
  }

  // ===== DESERIALIZATION BENCHMARKS =====
Imprint and FlatBuffers are somewhat unfairly compared against the rest of the frameworks here, since we're not actually deserializing anything, just setting up the view, whereas Jackson, for instance, deserializes straight to types. I plan on setting up additional, fairer tests in the next MR.
Imprint was designed to have zero-overhead deserialization and intentionally sacrifices performance when deserializing an entire record, which is the larger cost (and happens pretty rarely in data pipelines). For fairness we can (and should) publish the results for full deserialization, but it's important to keep in mind what we care about when it comes to optimizations.
  }

  // ===== FIELD ACCESS BENCHMARKS =====
  // Tests accessing a single field near the end of a large record
These are more accurate, at least in the sense of achieving the same end goal, even though Imprint and FlatBuffers are still doing a lot less work.
  // ===== HELPER METHODS =====

  private void setupAvro() {
Text blocks aren't available until Java 15...
      return messagePackMapper.writeValueAsBytes(data);
  }

  private byte[] serializeWithAvro(TestRecord data) throws Exception {
Avro setup is kind of annoying
      return avroReader.read(null, decoder);
  }

  private byte[] serializeWithProtobuf(TestRecord data) {
Protobuf setup is more annoying
      return builder.build().toByteArray();
  }

  private ByteBuffer serializeWithFlatBuffers(TestRecord data) {
I don't know if there's an easier way to do this, but my God, FlatBuffers is impossible to deal with.
  }

  // Single allocation instead of duplicate + slice
  var fieldBuffer = payload.duplicate();
Realized here while profiling that slice() isn't actually needed.
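The optimization noted above, sketched standalone (`fieldView` is a hypothetical helper, not the actual method): `duplicate()` already gives an independent position/limit over the same bytes, so setting bounds on the duplicate replaces the old duplicate-then-slice pair and saves one ByteBuffer allocation per field access.

```java
import java.nio.ByteBuffer;

public class FieldView {
    // Before: payload.duplicate().slice() created two buffer objects.
    // After: a single duplicate with position/limit set is enough, as long
    // as readers use the buffer's position rather than assuming index 0.
    static ByteBuffer fieldView(ByteBuffer payload, int offset, int length) {
        ByteBuffer fieldBuffer = payload.duplicate(); // single allocation
        fieldBuffer.position(offset).limit(offset + length);
        return fieldBuffer;
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap(new byte[]{10, 20, 30, 40});
        ByteBuffer view = fieldView(payload, 1, 2);
        System.out.println(view.remaining()); // 2 bytes remaining: {20, 30}
        System.out.println(view.get());       // 20
    }
}
```

The subtlety is that without `slice()` the view keeps absolute indices, so relative reads (`get()`, `getInt()`) work unchanged, but absolute reads (`get(0)`) would hit the wrong byte; profiling showing only relative reads is what made the slice droppable.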
…es in gradle file. Also fix permission issue
Benchmark Results: Benchmark execution completed but no results file was found. Check the workflow logs for details.
…Comparison tests iterations manually
📊 Benchmark Results: Benchmark execution completed but no results file was found. Check the workflow logs for details.
agavra left a comment
This is really cool, thanks for running all these benchmarks!
At first I was concerned when seeing the results of the merge benchmark then I remembered we haven't implemented the merge algorithm in Java yet 😅 let's make sure to do that before we publish the benchmarks in docs anywhere.
I think the other important benchmark that isn't covered here is "project + serialize". FlatBuffers does a really good job if you're projecting fields, but if you need to project into a smaller record schema you actually have to reserialize everything you're projecting out. (Basically, imagine you have a record with fields [id, name, company, email, age, ...] and you only care about [id, name, email], so you want to serialize just those three fields and send them to a downstream application.)
Re: serialization, it's expected to be a little slower with Imprint since typically you aren't serializing entire records within a pipeline, though I am surprised that it's so much slower. That would be worth looking into to see if we're missing anything.
Lastly, note that for the comparison benchmarks, the larger the record the better Imprint fares. So it's actually interesting to test merge/project with varying record sizes (both the number of fields as well as the size of the fields, in particular the latter).
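The project-then-serialize case described above could be exercised with something like this sketch (a field-id-to-bytes map stands in for the real record formats; none of these names are Imprint's API): pick a subset of field ids and re-encode only those fields, which is exactly the work a format has to do when the downstream schema is smaller.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ProjectAndSerialize {
    // Stand-in record: field id -> already-encoded field bytes.
    static Map<Integer, byte[]> project(Map<Integer, byte[]> record, Set<Integer> keep) {
        var out = new LinkedHashMap<Integer, byte[]>();
        for (var e : record.entrySet()) {
            if (keep.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue()); // carry over only the kept fields
            }
        }
        return out;
    }

    public static void main(String[] args) {
        var record = new LinkedHashMap<Integer, byte[]>();
        record.put(1, "id".getBytes());
        record.put(2, "name".getBytes());
        record.put(3, "company".getBytes()); // dropped by the projection
        record.put(4, "email".getBytes());

        var projected = project(record, Set.of(1, 2, 4));
        System.out.println(projected.keySet()); // [1, 2, 4]
    }
}
```

A benchmark over this shape, swept across both the number of fields and the per-field byte size, would capture the varying-record-size axis mentioned in the comment.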
  - name: Run size comparison benchmarks
    run: |
      ./gradlew jmhRunSizeComparisonBenchmarks
    continue-on-error: true
I don't think we should run the comparison benchmarks on CI - they can be run on demand since we don't expect the performance of other systems to change when we commit to Imprint and the Imprint-specific benchmarks should catch regressions (which is the point of running things on each PR).
Not only does this slow down the PR builds, but I think there's some limit on the free GH plan for how many minutes you can run workflows (though I need to double check that, it may only apply to private repos).
This makes my life way easier lol. I can reduce it to just a Gradle task that's easy to run locally and remove all the complex custom tasking.
  }

  if (latestFile) {
      console.log(`📊 Found benchmark results: ${latestFile}`);
I recommend having the minimal amount of scripting inside ci.yml and instead delegating it to a script that we can just run locally. See https://github.com/imprint-serde/imprint/blob/main/scripts/ci_bench.sh for an example; I generate the markdown within the script there and just call that script from the GHA workflow.
  var usedFieldIds = new HashSet<Integer>();

  // Copy fields from first record (takes precedence)
If we're publishing any of these benchmarks, we should make sure to actually implement the merge algorithm 😉 if we need to deserialize/reserialize the entire two records, that defeats the purpose of Imprint.
I'm going to remove the merge comparisons for now. I'll create a task and start the merge algo next, and can always come back to them.
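For when the merge work gets picked up: the first-record-wins approach visible in the removed benchmark code (the `usedFieldIds` HashSet) could look like this directory-level sketch, where field ids mapped to raw bytes stand in for real directory entries, so no field value is ever decoded during the merge.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MergeSketch {
    // First record takes precedence: copy all of its fields, then only the
    // fields of the second record whose ids haven't been seen yet.
    static Map<Integer, byte[]> merge(Map<Integer, byte[]> first, Map<Integer, byte[]> second) {
        var usedFieldIds = new HashSet<Integer>();
        var merged = new LinkedHashMap<Integer, byte[]>();
        for (var e : first.entrySet()) {
            merged.put(e.getKey(), e.getValue());
            usedFieldIds.add(e.getKey());
        }
        for (var e : second.entrySet()) {
            if (usedFieldIds.add(e.getKey())) { // add() is false if already present
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        var a = Map.of(1, "A1".getBytes(), 2, "A2".getBytes());
        var b = Map.of(2, "B2".getBytes(), 3, "B3".getBytes());
        var merged = merge(new TreeMap<>(a), new TreeMap<>(b));
        System.out.println(merged.keySet());           // [1, 2, 3]
        System.out.println(new String(merged.get(2))); // A2: first record wins
    }
}
```

Because the values are opaque byte ranges, a real implementation would do the same thing over directory entries and copy payload ranges directly, which is the property the review comment is asking the benchmark to preserve.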
Satisfies most of the requirements in #7. Note that I'm excluding Thrift until a later PR, since we have enough data now to proceed. Also excluded Blackbird from Jackson for now, but we can add it later.