Implement Cookie serialization format #3463
Draft
ttnghia wants to merge 20 commits into NVIDIA:main
The exception handling in cudf JNI was changed to prepare for capturing the native stack trace when an exception is thrown. That is a breaking change, and this PR fixes the resulting breakage. No new feature or implementation is added.

Depends on:
* rapidsai/cudf#18983

This is part of [[Epic] Capture native stacktrace when throwing exception using cpptrace NVIDIA#3398](NVIDIA#3398).

---------

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
# Conflicts: # thirdparty/cudf
# Conflicts: # thirdparty/cudf-pins/versions.json
Collaborator
Hi @ttnghia, quick question: why do we need a new format here?
Collaborator, Author
We just want to dump data from memory to disk so that it can be read back later. The data format only needs to be something that we can write and read quickly in C++ with minimal data conversion. I know that there are external libraries for doing this, but we don't need much advanced functionality, so a simple internal data format is sufficient.
Collaborator
NOTE: release/25.12 has been created from main. Please retarget your PR to release/25.12 if it should be included in the release.
Collaborator
NOTE: release/26.02 has been created from main. Please retarget your PR to release/26.02 if it should be included in the release.
This implements the Cookie serialization format, a fast data serialization format that targets efficiency in both serialization/deserialization and disk I/O. Given an array of host buffers as byte arrays, the serializer compresses these byte arrays (using a CPU thread pool) and assembles the compressed data, along with other metadata, into a single output byte array for efficient disk I/O. Deserialization performs the same steps in reverse.

Contributes to NVIDIA/spark-rapids#12509.
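
To make the description above concrete, here is a rough sketch of the compress-then-assemble flow. It is not the code in this PR: the header layout (`kMagic`, buffer count, per-buffer size pairs), the helper names, and the run-length codec are all hypothetical stand-ins chosen for illustration. `std::async` is used in place of a real CPU thread pool, and the actual format may use a different codec and different metadata.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <stdexcept>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical file marker; the real format's header is not shown in this PR page.
constexpr std::uint64_t kMagic = 0x454B4F4F43ULL;

// Stand-in codec (assumption): run-length encoding as (count, value) byte pairs.
Bytes rle_compress(const Bytes& in) {
  Bytes out;
  for (std::size_t i = 0; i < in.size();) {
    const std::uint8_t value = in[i];
    std::size_t run = 1;
    while (i + run < in.size() && in[i + run] == value && run < 255) ++run;
    out.push_back(static_cast<std::uint8_t>(run));
    out.push_back(value);
    i += run;
  }
  return out;
}

Bytes rle_decompress(const std::uint8_t* data, std::size_t size, std::size_t expected) {
  Bytes out;
  out.reserve(expected);
  for (std::size_t i = 0; i + 1 < size; i += 2)
    out.insert(out.end(), static_cast<std::size_t>(data[i]), data[i + 1]);
  if (out.size() != expected) throw std::runtime_error("corrupt payload");
  return out;
}

// Little-endian helpers for the metadata fields.
void put_u64(Bytes& out, std::uint64_t v) {
  for (int shift = 0; shift < 64; shift += 8)
    out.push_back(static_cast<std::uint8_t>(v >> shift));
}

std::uint64_t get_u64(const Bytes& in, std::size_t& pos) {
  if (pos + 8 > in.size()) throw std::runtime_error("truncated input");
  std::uint64_t v = 0;
  for (int shift = 0; shift < 64; shift += 8)
    v |= static_cast<std::uint64_t>(in[pos++]) << shift;
  return v;
}

// Compress every buffer concurrently, then assemble header + payloads into one array.
Bytes serialize(const std::vector<Bytes>& buffers) {
  std::vector<std::future<Bytes>> tasks;
  for (const Bytes& buf : buffers)
    tasks.push_back(std::async(std::launch::async, [&buf] { return rle_compress(buf); }));

  std::vector<Bytes> compressed;
  for (auto& task : tasks) compressed.push_back(task.get());

  Bytes out;
  put_u64(out, kMagic);
  put_u64(out, buffers.size());
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    put_u64(out, buffers[i].size());     // uncompressed size
    put_u64(out, compressed[i].size());  // compressed size
  }
  for (const Bytes& c : compressed) out.insert(out.end(), c.begin(), c.end());
  return out;
}

// Reverse path: read the metadata, then decompress each payload in order.
std::vector<Bytes> deserialize(const Bytes& in) {
  std::size_t pos = 0;
  if (get_u64(in, pos) != kMagic) throw std::runtime_error("bad magic");
  const std::uint64_t count = get_u64(in, pos);

  std::vector<std::pair<std::uint64_t, std::uint64_t>> sizes;
  for (std::uint64_t i = 0; i < count; ++i) {
    const std::uint64_t raw_size = get_u64(in, pos);
    const std::uint64_t packed_size = get_u64(in, pos);
    sizes.emplace_back(raw_size, packed_size);
  }

  std::vector<Bytes> buffers;
  for (const auto& entry : sizes) {
    if (entry.second > in.size() - pos) throw std::runtime_error("truncated payload");
    buffers.push_back(rle_decompress(in.data() + pos, entry.second, entry.first));
    pos += entry.second;
  }
  return buffers;
}
```

In this sketch, calling `serialize` on a vector of byte buffers yields a single contiguous array that can be written to disk in one call, and `deserialize` on that array returns the original buffers. Storing both the uncompressed and compressed size per buffer lets the reader locate each payload and size its output without scanning the data.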