
Implement Cookie serialization format#3463

Draft
ttnghia wants to merge 20 commits into NVIDIA:main from ttnghia:cookie_serializer


@ttnghia
Collaborator

@ttnghia ttnghia commented Jun 17, 2025

This implements the Cookie serialization format, a fast data serialization format that targets efficiency in both serialization/deserialization and disk I/O. The input is an array of host buffers given as byte arrays. The serializer compresses these byte arrays (using a CPU thread pool) and assembles the compressed data, along with other metadata, into a single output byte array for efficient disk I/O. Deserialization performs the same steps in reverse.
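As a rough illustration of the flow described above (not the code in this PR, which is written in C++), here is a minimal Python sketch. The byte layout, the `MAGIC` tag, the function names, and the use of zlib are all assumptions made for the example; the PR description does not specify the actual layout or compression codec.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

MAGIC = b"CKIE"  # hypothetical 4-byte tag, not taken from the PR


def serialize(buffers, max_workers=4):
    """Compress each buffer on a thread pool and pack everything into one bytes object."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        compressed = list(pool.map(zlib.compress, buffers))
    # Header: magic tag, then the number of buffers (little-endian uint32).
    parts = [MAGIC, struct.pack("<I", len(buffers))]
    # Metadata: (uncompressed size, compressed size) per buffer, as two uint64 values.
    for raw, comp in zip(buffers, compressed):
        parts.append(struct.pack("<QQ", len(raw), len(comp)))
    # Payload: the compressed buffers, back to back.
    parts.extend(compressed)
    return b"".join(parts)


def deserialize(blob, max_workers=4):
    """Reverse of serialize: read the metadata, slice the payload, decompress in parallel."""
    if blob[:4] != MAGIC:
        raise ValueError("unrecognized header")
    (count,) = struct.unpack_from("<I", blob, 4)
    offset = 8
    sizes = []
    for _ in range(count):
        sizes.append(struct.unpack_from("<QQ", blob, offset))
        offset += 16
    chunks = []
    for _, comp_size in sizes:
        chunks.append(blob[offset:offset + comp_size])
        offset += comp_size
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        buffers = list(pool.map(zlib.decompress, chunks))
    # Sanity check against the recorded uncompressed sizes.
    for (raw_size, _), buf in zip(sizes, buffers):
        if len(buf) != raw_size:
            raise ValueError("decompressed size does not match metadata")
    return buffers
```

Because the result is a single contiguous byte array, it can be written to disk with one call and read back with one call, which is the disk I/O benefit the description refers to.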

Contributes to NVIDIA/spark-rapids#12509.

ttnghia added 2 commits June 12, 2025 14:12
The exception handling in cudf JNI was changed in preparation for
capturing native stack traces when an exception is thrown. That is a
breaking change, and this PR fixes the resulting breakage.

No new feature/implementation is added.

Depends on:
 * rapidsai/cudf#18983

This is part of [[Epic] Capture native stacktrace when throwing
exception using cpptrace
NVIDIA#3398](NVIDIA#3398).

---------

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia ttnghia requested a review from revans2 June 17, 2025 20:48
@ttnghia ttnghia self-assigned this Jun 17, 2025
ttnghia added 18 commits June 17, 2025 13:54
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
# Conflicts:
#	thirdparty/cudf-pins/versions.json
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia ttnghia changed the base branch from branch-25.08 to branch-25.10 August 4, 2025 14:29
@ttnghia ttnghia changed the base branch from branch-25.10 to branch-25.12 September 29, 2025 17:41
@binmahone
Collaborator

Hi @ttnghia, quick question: why do we need a new format here?

@ttnghia
Collaborator Author

ttnghia commented Oct 20, 2025

> Hi @ttnghia, quick question: why do we need a new format here?

We just want to dump data from memory to disk and read it back later. The data format only needs to be something we can write and read quickly in C++ with minimal data conversion. I know there are external libraries for this, but we don't need many advanced features, so a simple internal data format is sufficient.

@nvauto
Collaborator

nvauto commented Nov 17, 2025

NOTE: release/25.12 has been created from main. Please retarget your PR to release/25.12 if it should be included in the release.

@nvauto
Collaborator

nvauto commented Jan 19, 2026

NOTE: release/26.02 has been created from main. Please retarget your PR to release/26.02 if it should be included in the release.
