This repository was archived by the owner on Jan 26, 2026. It is now read-only.

Implement Lazy Directory Parsing for Improved Deserialization Performance #13

Merged
expanded-for-real merged 40 commits into main from dev
Jun 6, 2025

expanded-for-real (Collaborator) commented Jun 6, 2025

#11

Implemented lazy directory deserialization. Timings varied between runs, but the results were consistently strong; at one point I saw as low as 250 ns/op for Imprint. Either way, we were consistently at least twice as fast as before, so I think the additional complexity is worth it.

Before

| Library | Score (ns/op) | Error | Relative Performance |
|---|---|---|---|
| FlatBuffers | 56.86 | ±3.97 | 1.0x (baseline) |
| Imprint | 1,109.83 | ±105.04 | 19.5x slower |

After

| Serialization Format | Average Time (ns/op) | Std Deviation |
|---|---|---|
| FlatBuffers | 106.512 | ±40.800 |
| Imprint | 527.664 | ±66.763 |

We should be in a good spot now to implement Merge and Project.
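For readers unfamiliar with the technique, lazy directory parsing can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `LazyDirectoryRecord`, the `parseCount` counter, and the directory layout (an `int` count followed by `short` field ID / `int` offset pairs) are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and byte layout are assumptions, not Imprint's actual format.
final class LazyDirectoryRecord {
    private final ByteBuffer directory; // raw, unparsed directory bytes
    private List<int[]> parsed;         // {fieldId, offset} pairs, built on first use
    int parseCount = 0;                 // exposed only so the example can show laziness

    LazyDirectoryRecord(ByteBuffer directory) {
        // Constructing the record does no parsing; it only keeps a read-only view.
        this.directory = directory.asReadOnlyBuffer();
    }

    /** Returns the payload offset for fieldId, or -1 if the field is absent. */
    int offsetOf(int fieldId) {
        // A linear scan keeps the sketch short; a real implementation could search faster.
        for (int[] entry : ensureParsed()) {
            if (entry[0] == fieldId) {
                return entry[1];
            }
        }
        return -1;
    }

    // Parses the directory the first time it is needed and reuses the result afterwards.
    private List<int[]> ensureParsed() {
        if (parsed == null) {
            ByteBuffer buf = directory.duplicate(); // independent position, shared bytes
            int count = buf.getInt();
            List<int[]> entries = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                entries.add(new int[] { buf.getShort(), buf.getInt() });
            }
            parsed = entries;
            parseCount++;
        }
        return parsed;
    }
}
```

The point of the design is that a record which is deserialized but never queried pays nothing for the directory, while repeated lookups pay the parsing cost only once.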

expanded-for-real and others added 30 commits June 1, 2025 13:23
…mance tracking; add comprehensive String benchmark
Try to enhance string deserialization
A full list of enhancements can be found here - #3
…ome micro-optimizations added that were found along the way
* Full comprehensive comparison tests with a lot of other libraries + some micro-optimizations added that were found along the way

* replace deprecated gradle methods with latest

---------

Co-authored-by: expand3d <>
# Conflicts:
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintRecord.java
#	src/main/java/com/imprint/types/TypeHandler.java
#	src/main/java/com/imprint/types/Value.java
expand3d added 7 commits June 5, 2025 15:23
…es in gradle file. Also fix permission issue
# Conflicts:
#	.github/workflows/ci.yml
#	build.gradle
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintRecord.java
#	src/main/java/com/imprint/types/TypeHandler.java
#	src/main/java/com/imprint/types/Value.java
Collaborator Author

Massively reduced the CI config compared to last time - we probably won't have to change this much for a while

# Conflicts:
#	.github/workflows/ci.yml
#	build.gradle
#	src/jmh/java/com/imprint/benchmark/ComparisonBenchmark.java
#	src/main/java/com/imprint/core/ImprintRecord.java
Collaborator Author

Massively reduced the Gradle file compared to last time

Collaborator Author

Cache VarInt values. A minor performance tweak that I came up with while running comparisons; it was low-hanging fruit.
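The comment does not say which VarInt values are cached or how. One plausible reading, sketched below under that assumption, is to precompute the encodings of small values so that the common case skips the encode loop. The class name `CachedVarInt`, the cache size of 128, and the unsigned LEB128-style encoding are illustrative choices, not taken from the PR.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; not Imprint's actual VarInt implementation.
final class CachedVarInt {
    private static final int CACHE_SIZE = 128; // assumed: values 0..127 fit in one byte
    private static final byte[][] CACHE = new byte[CACHE_SIZE][];

    static {
        for (int i = 0; i < CACHE_SIZE; i++) {
            CACHE[i] = encodeUncached(i);
        }
    }

    /** Writes value as an unsigned varint, using the precomputed bytes when possible. */
    static void encode(int value, ByteBuffer out) {
        if (value >= 0 && value < CACHE_SIZE) {
            out.put(CACHE[value]);
        } else {
            out.put(encodeUncached(value));
        }
    }

    /** Encodes 7 bits per byte, low bits first, with the high bit as a continuation flag. */
    static byte[] encodeUncached(int value) {
        byte[] tmp = new byte[5]; // a 32-bit value needs at most 5 bytes
        int n = 0;
        while ((value & ~0x7F) != 0) {
            tmp[n++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        tmp[n++] = (byte) value;
        return Arrays.copyOf(tmp, n);
    }

    /** Reads one unsigned varint from the buffer's current position. */
    static int decode(ByteBuffer in) {
        int result = 0;
        int shift = 0;
        while (true) {
            byte b = in.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("VarInt too long");
            }
        }
    }
}
```

Whether a cache like this pays off depends on how often small values such as lengths and counts are written, so it is worth measuring with the existing JMH benchmarks rather than assuming a win.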

Collaborator Author

I put all the buffer handling logic in a new class to keep ImprintRecord a bit cleaner (especially since I'm about to put merge and project in there too). I tried to comment it as best I could, since the buffer handling logic can be confusing and took me a few tries before the tests passed

Contributor

agavra left a comment

skimmed the ImprintRecord changes but otherwise lgtm

Comment on lines +157 to +158
if (directoryCount >= 0)
return directoryCount;
Contributor

minor, but it seems like overkill to cache the directory count - reading a single varint is pretty speedy

Collaborator Author

Yeah I over-engineered that for the sake of avoiding directory hits but it ends up just being unnecessary branching here. Will remove in the next PR

private final ByteBuffer payload; // Read-only payload view

// Lazy-loaded directory state
private List<DirectoryEntry> parsedDirectory;
Contributor

why did you prefer a list to a map (idx -> entry) for efficient lookup when parsed? that way we can also lazily fill it in if we wanted to on every lookup (in addition to a single call to eagerly parse)

Collaborator Author

Good question! The reason is that I didn't think of that lol. I think I mostly wanted to preserve insertion order on these... but they're already sorted coming in anyway, so that's a good idea. I'll implement the map in the next PR

return -1;
int low = 0;
int high = parsedDirectory.size() - 1;
while (low <= high) {
Contributor

if we use a map this is just an O(1) lookup into the parsed dir
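The map idea from this thread could look something like the sketch below: lookups hit a `HashMap` first and fall back to the raw directory only on a miss, filling the map as they go. The names `DirectoryCache`, `rawLookup`, and `rawLookups` are hypothetical, and the raw lookup is passed in as a function so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of a lazily filled (fieldId -> offset) map; not the PR's code.
final class DirectoryCache {
    private final Map<Integer, Integer> offsets = new HashMap<>();
    private final IntUnaryOperator rawLookup; // assumed: reads the raw directory, -1 if absent
    int rawLookups = 0;                       // exposed only to show cache hits in the example

    DirectoryCache(IntUnaryOperator rawLookup) {
        this.rawLookup = rawLookup;
    }

    /** Returns the payload offset for fieldId, or -1 if the field is absent. */
    int offsetOf(int fieldId) {
        Integer cached = offsets.get(fieldId);
        if (cached != null) {
            return cached; // expected O(1) once the entry has been seen
        }
        rawLookups++;
        int offset = rawLookup.applyAsInt(fieldId);
        offsets.put(fieldId, offset); // misses (-1) are cached as well
        return offset;
    }
}
```

One trade-off to keep in mind: `Map<Integer, Integer>` boxes its keys and values, so for very small directories a sorted list with binary search may still be competitive.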

Comment on lines +342 to +343
if (fieldId > currentFieldId)
return offset;
Contributor

shouldn't we binsearch here instead of going through field by field?
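A binary search directly over the raw directory bytes is possible when entries are sorted by field ID and have a fixed width, so that entry `i` can be addressed without reading the ones before it. The sketch below assumes exactly that: a `short` field ID followed by an `int` offset per entry. The layout and the names `RawDirectorySearch` and `findOffset` are assumptions for illustration, not the format used in this PR.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; assumes fixed-width entries sorted by ascending field ID.
final class RawDirectorySearch {
    static final int ENTRY_SIZE = Short.BYTES + Integer.BYTES; // assumed entry layout

    /**
     * Searches count entries starting at byte position start.
     * Returns the stored offset for fieldId, or -1 if it is not present.
     */
    static int findOffset(ByteBuffer dir, int start, int count, int fieldId) {
        int low = 0;
        int high = count - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int pos = start + mid * ENTRY_SIZE;
            int midId = dir.getShort(pos); // absolute read, leaves the position untouched
            if (midId < fieldId) {
                low = mid + 1;
            } else if (midId > fieldId) {
                high = mid - 1;
            } else {
                return dir.getInt(pos + Short.BYTES);
            }
        }
        return -1;
    }
}
```

If the directory entries were variable-width (for example, VarInt-encoded), this random access would not work as written and a field-by-field scan or a parsed index would be needed instead.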

@expanded-for-real expanded-for-real merged commit e1b6dcf into main Jun 6, 2025
3 checks passed