Add config option to preserve null values for collections #518
The default behaviour of the Java driver is to convert null or empty bytes associated with a collection into an empty Java collection (see here for an example). This behaviour is implemented within the codec layer of the Java driver, so by the time the data reaches dsbulk it has already been converted. As a result, dsbulk has no means to distinguish between an empty collection generated in this way and a legitimately empty collection.
This PR adds a config option which loads a custom codec for collection types. The custom codec returns an actual null value when it observes null (or empty) bytes during decoding. In all other cases the default behaviour of the codec is preserved.
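To illustrate the idea, here is a minimal, self-contained sketch of a null-preserving wrapper around a collection codec. It is not the code in this PR and does not use the real driver API: `SimpleCodec`, `DefaultListCodec` and `NullPreservingCodec` are hypothetical names, and the wire format (a 4-byte element count followed by 4-byte ints) is a toy encoding assumed only for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the driver's codec abstraction (not the real driver API).
interface SimpleCodec<T> {
    T decode(ByteBuffer bytes);
}

// Mimics the default behaviour described above: null or empty bytes
// are decoded to an empty list rather than to null.
final class DefaultListCodec implements SimpleCodec<List<Integer>> {
    @Override
    public List<Integer> decode(ByteBuffer bytes) {
        List<Integer> result = new ArrayList<>();
        if (bytes == null || !bytes.hasRemaining()) {
            return result;
        }
        // Toy encoding (an assumption): element count, then one int per element.
        ByteBuffer in = bytes.duplicate();
        int count = in.getInt();
        for (int i = 0; i < count; i++) {
            result.add(in.getInt());
        }
        return result;
    }
}

// Wrapper that returns an actual null for null or empty bytes and
// otherwise defers to the wrapped codec unchanged.
final class NullPreservingCodec<T> implements SimpleCodec<T> {
    private final SimpleCodec<T> delegate;

    NullPreservingCodec(SimpleCodec<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T decode(ByteBuffer bytes) {
        if (bytes == null || !bytes.hasRemaining()) {
            return null;
        }
        return delegate.decode(bytes);
    }
}
```

With this wrapper, `decode(null)` and `decode` of a zero-length buffer return null, while a buffer that encodes a list with zero elements still decodes to an empty list, so the two cases stay distinguishable downstream.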
I've included a unit test for this functionality as well, but the following manual test should be enough to demonstrate the issue (and the results of this fix):