This repository was archived by the owner on Jan 26, 2026. It is now read-only.
Merged
Changes from all commits
42 commits
5cb33c9
Update .gitignore
expanded-for-real Jun 1, 2025
63313e4
initial commit for imprint-java
Jun 1, 2025
dd4fdbc
initial commit for imprint-java
expanded-for-real Jun 1, 2025
bce1d13
Add GitHub Actions CI workflow for automated testing
Jun 2, 2025
f5d90b5
Merge remote-tracking branch 'origin/dev' into dev
Jun 2, 2025
72c468f
Update GitHub Actions workflow to use upload-artifact@v4
Jun 2, 2025
468d682
Add Gradle wrapper validation to CI workflow
Jun 2, 2025
cf05b13
Fix gitignore to include gradle-wrapper.jar for CI
Jun 2, 2025
d0d7983
Force add gradle-wrapper.jar to repository
Jun 2, 2025
f2cdd1b
Update wrapper validation action to v3
Jun 2, 2025
57c8249
Fix Javadoc syntax errors and disable strict Javadoc checking
Jun 2, 2025
edb3057
Add JMH benchmark .bat and .sh for full suite benchmarking and perfor…
Jun 2, 2025
2853e3f
fix map serialization error in benchmark test and streamline ci file …
Jun 2, 2025
3a5a113
Add execute permissions back for gradlew in CI
Jun 2, 2025
50a288b
Add some more string based performance benchmarks and try to make str…
Jun 2, 2025
ea1c4c4
Merge pull request #2 from imprint-serde/faster-strings
expanded-for-real Jun 2, 2025
43cab28
second main commit to address initial commits
expanded-for-real Jun 3, 2025
fdb8a56
additional cleanup to address concerns in https://github.com/imprint-…
Jun 3, 2025
2e56688
minor style fixes
Jun 3, 2025
9353388
minor style fixes again
Jun 3, 2025
09d0377
minor style fixes on benchmark tests and supress unused
Jun 3, 2025
6209bb1
minor reordering
Jun 4, 2025
ace7c67
Merge branch 'main' into dev
Jun 4, 2025
4632e01
Full comprehensive comparison tests with a lot of other libraries + s…
Jun 5, 2025
3738861
replace deprecated gradle methods with latest
Jun 5, 2025
12d2823
Merge Comparisons into dev branch (#8)
expanded-for-real Jun 5, 2025
f7a6e8e
Lazy load of directory and header data
Jun 5, 2025
2834dbb
Merge remote-tracking branch 'origin/main' into dev
Jun 5, 2025
83ed961
minor cleanup
Jun 5, 2025
a605b65
minor cleanup
Jun 5, 2025
aacddeb
minor cleanup
Jun 5, 2025
3bf81ad
Actually fixes offsets and read Byte Values for Maps and Arrays even …
Jun 5, 2025
7eaa6e9
change CI file to use JMH plugin to respect iteration and warmup valu…
Jun 5, 2025
32640cd
ok plugin didn't work apparently so reverting that and just reducing …
Jun 5, 2025
2d882c2
Merge branch 'dev' into lazy-directory
Jun 5, 2025
880aeb0
trying to update github ci to make jmh actually work correctly
Jun 5, 2025
8831922
lazy directory deserialization
Jun 6, 2025
e361cf0
Merge branch 'main' into dev
Jun 6, 2025
73eade6
remove extra comments
Jun 6, 2025
02866d5
remove extra comments
Jun 6, 2025
6278665
Merge branch 'refs/heads/main' into dev
Jun 7, 2025
09443eb
Add merge and project APIs; optimize/simplify ImprintBuffers with Tre…
Jun 7, 2025
Collaborator Author

I'll update this later once we fix serialization slowness and add real merge/project comparison testing

@@ -405,7 +405,7 @@ public void mergeFlatBuffers(Blackhole bh) {
// ===== MAIN METHOD TO RUN BENCHMARKS =====

public static void main(String[] args) throws RunnerException {
-runAll();
+runFieldAccessBenchmarks();
// Or, uncomment specific runner methods to execute subsets:
// runSerializationBenchmarks();
// runDeserializationBenchmarks();
270 changes: 62 additions & 208 deletions src/main/java/com/imprint/core/ImprintBuffers.java
Collaborator Author


Removed a bunch of my javadoc comments from before since the private methods changed. Will add back after I fix the serialization hotpath and settle on an optimal abstraction here

Large diffs are not rendered by default.

207 changes: 207 additions & 0 deletions src/main/java/com/imprint/core/ImprintOperations.java
Collaborator Author


Just kinda copied the Rust impl on this one. Note that it isn't zero-copy here either
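For reference, the zero-copy distinction the comment alludes to can be sketched with plain `java.nio`: a `duplicate()` is a view over the same backing array, while the helpers in this file allocate a fresh buffer and copy bytes into it. A minimal sketch; the `CopyVsView` class name is mine, not part of this PR.

```java
import java.nio.ByteBuffer;

public class CopyVsView {
    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6});

        // Zero-copy: a view sharing the same backing array, only the
        // position/limit window is new.
        ByteBuffer view = payload.duplicate();
        view.position(2).limit(5); // window over bytes {3, 4, 5}

        // Copying: allocate a separate buffer and transfer the bytes,
        // which is what buildPayloadFromRanges/buildPayloadFromChunks do.
        ByteBuffer copy = ByteBuffer.allocate(view.remaining());
        copy.put(view.duplicate());
        copy.flip();

        System.out.println(copy.get(0));                  // first byte of the window
        System.out.println(payload.array() == copy.array()); // false: separate storage
    }
}
```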

@@ -0,0 +1,207 @@
package com.imprint.core;

import com.imprint.error.ErrorType;
import com.imprint.error.ImprintException;
import lombok.Value;
import lombok.experimental.UtilityClass;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.*;

@UtilityClass
public class ImprintOperations {

/**
* Project a subset of fields from an Imprint record. Payload copying is proportional to projected data size.
*
* <p><strong>Algorithm:</strong></p>
* <ol>
* <li>Sort and deduplicate requested field IDs for efficient matching</li>
* <li>Scan directory to find matching fields and calculate ranges</li>
* <li>Allocate new payload buffer with exact size needed</li>
* <li>Copy field data ranges directly (zero-copy where possible)</li>
* <li>Build new directory with adjusted offsets</li>
* </ol>
*
* @param record The source record to project from
* @param fieldIds Array of field IDs to include in projection
* @return New ImprintRecord containing only the requested fields
*/
public static ImprintRecord project(ImprintRecord record, int... fieldIds) {
// Sort and deduplicate field IDs for efficient matching with sorted directory
int[] sortedFieldIds = Arrays.stream(fieldIds).distinct().sorted().toArray();
if (sortedFieldIds.length == 0)
return createEmptyRecord(record.getHeader().getSchemaId());

// Eagerly fetch the entire directory (could this be lazy, fetched per field?)
var sourceDirectory = record.getDirectory();
var newDirectory = new ArrayList<DirectoryEntry>(sortedFieldIds.length);
var ranges = new ArrayList<FieldRange>();

// Iterate through directory and compute ranges to copy
int fieldIdsIdx = 0;
int directoryIdx = 0;
int currentOffset = 0;

while (directoryIdx < sourceDirectory.size() && fieldIdsIdx < sortedFieldIds.length) {
var field = sourceDirectory.get(directoryIdx);
if (field.getId() == sortedFieldIds[fieldIdsIdx]) {
// Calculate field length using next field's offset
int nextOffset = (directoryIdx + 1 < sourceDirectory.size()) ?
sourceDirectory.get(directoryIdx + 1).getOffset() :
record.getBuffers().getPayload().limit();
int fieldLength = nextOffset - field.getOffset();

newDirectory.add(new DirectoryEntry(field.getId(), field.getTypeCode(), currentOffset));
ranges.add(new FieldRange(field.getOffset(), nextOffset));

currentOffset += fieldLength;
fieldIdsIdx++;
}
directoryIdx++;
}

// Build new payload from ranges
var newPayload = buildPayloadFromRanges(record.getBuffers().getPayload(), ranges);

// Create new header with updated payload size
// TODO: compute correct schema hash
var newHeader = new Header(record.getHeader().getFlags(),
new SchemaId(record.getHeader().getSchemaId().getFieldSpaceId(), 0xdeadbeef),
newPayload.remaining()
);

return new ImprintRecord(newHeader, newDirectory, newPayload);
}

/**
* Merge two Imprint records, combining their fields. Payload copying is proportional to total data size.
*
* <p><strong>Merge Strategy:</strong></p>
* <ul>
* <li>Fields are merged using sort-merge algorithm on directory entries</li>
* <li>For duplicate field IDs: first record's field takes precedence</li>
* <li>Payloads are concatenated with directory offsets adjusted</li>
* <li>Schema ID from first record is preserved</li>
* </ul>
*
* @param first The first record (takes precedence for duplicate fields)
* @param second The second record to merge
* @return New ImprintRecord containing merged fields
* @throws ImprintException if merge fails due to incompatible records
*/
public static ImprintRecord merge(ImprintRecord first, ImprintRecord second) throws ImprintException {
var firstDir = first.getDirectory();
var secondDir = second.getDirectory();

// Pre-allocate for worst case (no overlapping fields)
var newDirectory = new ArrayList<DirectoryEntry>(firstDir.size() + secondDir.size());
var payloadChunks = new ArrayList<ByteBuffer>();

int firstIdx = 0;
int secondIdx = 0;
int currentOffset = 0;

while (firstIdx < firstDir.size() || secondIdx < secondDir.size()) {
DirectoryEntry currentEntry;
ByteBuffer currentPayload;

if (firstIdx < firstDir.size() &&
(secondIdx >= secondDir.size() || firstDir.get(firstIdx).getId() <= secondDir.get(secondIdx).getId())) {

// Take from first record
currentEntry = firstDir.get(firstIdx);

// Skip duplicate field in second record if present
if (secondIdx < secondDir.size() &&
firstDir.get(firstIdx).getId() == secondDir.get(secondIdx).getId()) {
secondIdx++;
}

currentPayload = first.getRawBytes(currentEntry.getId());
firstIdx++;
} else {
// Take from second record
currentEntry = secondDir.get(secondIdx);
currentPayload = second.getRawBytes(currentEntry.getId());
secondIdx++;
}

if (currentPayload == null)
throw new ImprintException(ErrorType.BUFFER_UNDERFLOW, "Failed to get raw bytes for field " + currentEntry.getId());

// Add adjusted directory entry
var newEntry = new DirectoryEntry(currentEntry.getId(), currentEntry.getTypeCode(), currentOffset);
newDirectory.add(newEntry);

// Collect payload chunk
payloadChunks.add(currentPayload.duplicate());
currentOffset += currentPayload.remaining();
}

// Build merged payload
var mergedPayload = buildPayloadFromChunks(payloadChunks);

// Create header preserving first record's schema ID
var newHeader = new Header(first.getHeader().getFlags(), first.getHeader().getSchemaId(), mergedPayload.remaining());
Comment on lines +147 to +148
Contributor


oops that's actually a bug in the rust implementation - this would generate a new schema ID (given the current spec of schema id). I'm not totally sure how I want to use schema IDs and there's some work still to figure that out (you can watch imprint-serde/imprint#8 for when we start working on that).

For now let's just make this a TODO so we don't lose track and hardcode something like 0xdeadbeef so it's "obviously" wrong.

Collaborator Author


Yeah I saw that and wasn't sure but I am keeping a watch on imprint-serde/imprint#8 as well!


return new ImprintRecord(newHeader, newDirectory, mergedPayload);
}

/**
* Represents a range of bytes to copy from source payload.
*/
@Value
private static class FieldRange {
int start;
int end;

int length() {
return end - start;
}
}

/**
* Build a new payload buffer from field ranges in the source payload.
*/
private static ByteBuffer buildPayloadFromRanges(ByteBuffer sourcePayload, List<FieldRange> ranges) {
int totalSize = ranges.stream().mapToInt(FieldRange::length).sum();
var newPayload = ByteBuffer.allocate(totalSize);
newPayload.order(ByteOrder.LITTLE_ENDIAN);

for (var range : ranges) {
var sourceSlice = sourcePayload.duplicate();
sourceSlice.position(range.start).limit(range.end);
newPayload.put(sourceSlice);
}

newPayload.flip();
return newPayload;
}

/**
* Build a new payload buffer by concatenating chunks.
*/
private static ByteBuffer buildPayloadFromChunks(List<ByteBuffer> chunks) {
int totalSize = chunks.stream().mapToInt(ByteBuffer::remaining).sum();
var mergedPayload = ByteBuffer.allocate(totalSize);
mergedPayload.order(ByteOrder.LITTLE_ENDIAN);

for (var chunk : chunks) {
mergedPayload.put(chunk);
}

mergedPayload.flip();
return mergedPayload;
}

/**
* Create an empty record with the given schema ID.
*/
private static ImprintRecord createEmptyRecord(SchemaId schemaId) {
var header = new Header(new Flags((byte) 0x01), schemaId, 0);
return new ImprintRecord(header, Collections.emptyList(), ByteBuffer.allocate(0));
}
}
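The sort-merge loop in `merge()` can be illustrated on bare ID arrays, under the same "first record takes precedence on duplicates" rule described in its javadoc. A minimal sketch; the `SortMergeUnion` class and `mergeIds` helper are hypothetical names, not part of this codebase.

```java
import java.util.ArrayList;
import java.util.List;

public class SortMergeUnion {
    // Two-pointer union of two sorted ID lists; on a tie the first list
    // wins, mirroring merge()'s duplicate-handling.
    static List<String> mergeIds(int[] first, int[] second) {
        var out = new ArrayList<String>();
        int i = 0, j = 0;
        while (i < first.length || j < second.length) {
            if (i < first.length && (j >= second.length || first[i] <= second[j])) {
                // Skip the duplicate in the second list, if present.
                if (j < second.length && first[i] == second[j]) j++;
                out.add(first[i++] + ":first");
            } else {
                out.add(second[j++] + ":second");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mergeIds(new int[] {1, 3, 5}, new int[] {2, 3, 6}));
        // [1:first, 2:second, 3:first, 5:first, 6:second]
    }
}
```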
22 changes: 22 additions & 0 deletions src/main/java/com/imprint/core/ImprintRecord.java
@@ -69,6 +69,28 @@ public ByteBuffer getRawBytes(int fieldId) {
}
}

/**
* Project a subset of fields from this record.
*
* @param fieldIds Array of field IDs to include in the projection
* @return New ImprintRecord containing only the requested fields
*/
public ImprintRecord project(int... fieldIds) {
return ImprintOperations.project(this, fieldIds);
}

/**
* Merge another record into this one.
* For duplicate fields, this record's values take precedence.
*
* @param other The record to merge with this one
* @return New ImprintRecord containing merged fields
* @throws ImprintException if merge fails
*/
public ImprintRecord merge(ImprintRecord other) throws ImprintException {
return ImprintOperations.merge(this, other);
}

/**
* Get the directory (parsing it if necessary).
*/
20 changes: 12 additions & 8 deletions src/main/java/com/imprint/types/Value.java
@@ -169,6 +169,7 @@ public String toString() {
}

// Float64 Value

@Getter
@EqualsAndHashCode(callSuper = false)
public static class Float64Value extends Value {
@@ -180,25 +181,28 @@ public Float64Value(double value) {

@Override
public TypeCode getTypeCode() { return TypeCode.FLOAT64; }

@Override
public String toString() {
return String.valueOf(value);
}
}

// Bytes Value (array-based)
@Getter
public static class BytesValue extends Value {
/**
* Returns internal array. MUST NOT be modified by caller.
*/
private final byte[] value;


/**
* Takes ownership of the byte array. Caller must not modify after construction.
*/
public BytesValue(byte[] value) {
-this.value = value.clone();
Collaborator Author


very minor performance tweak

+this.value = Objects.requireNonNull(value);
}

public byte[] getValue() {
return value.clone();
}


@Override
public TypeCode getTypeCode() { return TypeCode.BYTES; }

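The `clone()`-to-`requireNonNull()` change trades a defensive copy at construction for trusting the caller not to mutate the array afterwards. A minimal sketch of the aliasing this accepts, using a hypothetical `Holder` class (not from this codebase):

```java
import java.util.Arrays;
import java.util.Objects;

public class DefensiveCopyDemo {
    // Mirrors the reviewed tradeoff: store the caller's array directly
    // (no clone) but still reject null at construction.
    static final class Holder {
        private final byte[] value;
        Holder(byte[] value) { this.value = Objects.requireNonNull(value); }
        byte[] copy() { return value.clone(); } // getter still hands out a copy
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        Holder h = new Holder(data);
        data[0] = 99; // caller mutation is now visible inside the holder
        System.out.println(Arrays.toString(h.copy())); // [99, 2, 3]
    }
}
```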