@absurdfarce (Collaborator) commented Dec 21, 2025

By default, the Java driver converts null or empty bytes for a collection column into an empty Java collection (see here for an example). This behaviour is implemented in the driver's codec layer, so by the time the data reaches dsbulk it has already been converted. As a result, dsbulk has no way to distinguish an empty collection generated this way from a legitimately empty collection.

This PR adds a config option that loads a custom codec for collection types. The custom codec returns an actual null value if it observes null (or empty) bytes during decoding. In all other cases the codec's default behaviour is preserved.
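For anyone who wants a concrete picture of the wrapping idea, here is a rough sketch. It is not the code in this PR: `Decoder`, `DefaultMapDecoder`, `NullAwareDecoder` and the toy wire format (an entry count followed by length-prefixed ASCII strings) are hypothetical stand-ins, not driver or dsbulk APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class NullAwareDecodingSketch {

  /** Hypothetical stand-in for the decode side of a driver codec. */
  interface Decoder<T> {
    T decode(ByteBuffer bytes);
  }

  /** Mimics the default behaviour described above: null or empty bytes become an empty map. */
  static final class DefaultMapDecoder implements Decoder<Map<String, String>> {
    @Override
    public Map<String, String> decode(ByteBuffer bytes) {
      Map<String, String> result = new LinkedHashMap<>();
      if (bytes == null || !bytes.hasRemaining()) {
        return result;
      }
      ByteBuffer in = bytes.duplicate(); // leave the caller's position untouched
      int size = in.getInt();
      for (int i = 0; i < size; i++) {
        String key = readString(in);
        String value = readString(in);
        result.put(key, value);
      }
      return result;
    }

    private static String readString(ByteBuffer in) {
      byte[] raw = new byte[in.getInt()];
      in.get(raw);
      return new String(raw, StandardCharsets.US_ASCII);
    }
  }

  /** Returns null for null or empty bytes; delegates to the wrapped decoder otherwise. */
  static final class NullAwareDecoder<T> implements Decoder<T> {
    private final Decoder<T> delegate;

    NullAwareDecoder(Decoder<T> delegate) {
      this.delegate = delegate;
    }

    @Override
    public T decode(ByteBuffer bytes) {
      if (bytes == null || !bytes.hasRemaining()) {
        return null;
      }
      return delegate.decode(bytes);
    }
  }

  /** Toy encoder for the demo (assumes ASCII-only keys and values). */
  static ByteBuffer encode(Map<String, String> map) {
    int capacity = 4; // entry count
    for (Map.Entry<String, String> e : map.entrySet()) {
      capacity += 8 + e.getKey().length() + e.getValue().length();
    }
    ByteBuffer out = ByteBuffer.allocate(capacity);
    out.putInt(map.size());
    for (Map.Entry<String, String> e : map.entrySet()) {
      writeString(out, e.getKey());
      writeString(out, e.getValue());
    }
    out.flip();
    return out;
  }

  private static void writeString(ByteBuffer out, String s) {
    byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
    out.putInt(raw.length);
    out.put(raw);
  }

  public static void main(String[] args) {
    Decoder<Map<String, String>> byDefault = new DefaultMapDecoder();
    Decoder<Map<String, String>> nullAware = new NullAwareDecoder<>(byDefault);

    System.out.println(byDefault.decode(null)); // {}
    System.out.println(nullAware.decode(null)); // null
    // An encoded empty map still has a 4-byte count, so it is delegated as usual.
    System.out.println(nullAware.decode(encode(Collections.emptyMap()))); // {}
    System.out.println(nullAware.decode(encode(Collections.singletonMap("3", "three")))); // {3=three}
  }
}
```

In this sketch the wrapper only intercepts the null-or-empty case, so an explicitly encoded empty collection still decodes to an empty collection.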

I've included a unit test for this functionality as well but the following manual test should be enough to demonstrate the issue (and the results of this fix):

CREATE TABLE test.baz (
  i int PRIMARY KEY,
  j text,
  k map<ascii,ascii>,
  l frozen<map<ascii,ascii>>
);

insert into test.baz (i,j,k,l) values (1,'one',{},{});
insert into test.baz (i,j,k,l) values (2,'two',null,null);
insert into test.baz (i,j,k,l) values (3,'three',{'3': 'three'},{'3': 'three'});

$ bin/dsbulk unload -k test -t baz -c json 2> /dev/null
{"i":1,"j":"one","k":{},"l":{}}
{"i":2,"j":"two","k":{},"l":{}}
{"i":3,"j":"three","k":{"3":"three"},"l":{"3":"three"}}
$ bin/dsbulk unload --dsbulk.codec.allowNullCollections=true -k test -t baz -c json 2> /dev/null
{"i":1,"j":"one","k":null,"l":{}}
{"i":2,"j":"two","k":null,"l":null}
{"i":3,"j":"three","k":{"3":"three"},"l":{"3":"three"}}

I'm seeing some strange results when adding test values manually via cqlsh. I presume this is a Python driver issue, but it isn't especially relevant to this PR.
@absurdfarce linked an issue on Dec 21, 2025 that may be closed by this pull request
@absurdfarce (Collaborator, Author)

Note that there's an interesting question here about why this CQL:

insert into test.baz (i,j,k,l) values (1,'one',{},{});

results in this return value from dsbulk:

{"i":1,"j":"one","k":null,"l":{}}

That's pretty clearly wrong, but I don't think it's a dsbulk error. I'm adding these test values via cqlsh, and I see the same results when I query the table via cqlsh, so I suspect there's a Python driver problem lurking there somewhere. Either way, it doesn't look like a dsbulk issue.

@absurdfarce (Collaborator, Author)

Ping @adutra for review on this one as well

Development

Successfully merging this pull request may close these issues.

Frozen field is exporting as {}, instead of null