make_decoder accepts borrowed DataType instead of owned #9270
alamb merged 4 commits into apache:main
Conversation
    }
    let (data_type, nullable) = if self.is_field {
        let field = &self.schema.fields[0];
        (Cow::Borrowed(field.data_type()), field.is_nullable())
NOTE: This Cow::Borrowed is necessary to preserve pointer stability of field.data_type() that would otherwise have to be cloned.
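As a rough illustration of the note above, the sketch below shows how Cow::Borrowed keeps the address of the borrowed value, while Cow::Owned yields a value at a different address. The DataType, Field, and pick names here are hypothetical stand-ins written for this example; they are not the arrow types or the code in this PR.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for arrow's DataType (assumption, not the real type).
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

// Hypothetical stand-in for a schema field.
struct Field {
    data_type: DataType,
}

impl Field {
    fn data_type(&self) -> &DataType {
        &self.data_type
    }
}

// Borrow the field's type when `is_field` is true; otherwise build a new owned value.
fn pick<'a>(field: &'a Field, is_field: bool) -> Cow<'a, DataType> {
    if is_field {
        Cow::Borrowed(field.data_type())
    } else {
        Cow::Owned(DataType::Utf8)
    }
}

fn main() {
    let field = Field { data_type: DataType::Int64 };

    // The borrowed variant points at the very same DataType as the field.
    let borrowed = pick(&field, true);
    assert!(std::ptr::eq(&*borrowed, field.data_type()));

    // The owned variant lives inside the Cow, at a different address.
    let owned = pick(&field, false);
    assert!(!std::ptr::eq(&*owned, field.data_type()));
}
```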
(this PR needs to have some conflicts resolved)
@alamb should be good now
    // If this struct nullable, need to permit nullability in child array
    // StructArrayDecoder::decode verifies that if the child is not nullable
    // it doesn't contain any nulls not masked by its parent
#9271 merged too quickly... this comment was supposed to remain inside the .map call. So I'm restoring this code back to follow the original upstream approach, instead of opening a separate PR just for that.
@alamb -- Is it normal for MIRI to take O(hours)? It's been stuck here for a very long time:
It does take a while. I haven't tracked its overall time recently -- maybe some new tests are overly long |
Which issue does this PR close?
Rationale for this change
Today's json decoder helper, make_decoder, takes an owned data type whose components are cloned at every level during the recursive decoder initialization process. This breaks pointer stability of the resulting DataType instances that a custom JSON decoder factory would see, vs. those of the schema it and the reader builder were initialized with.
The lack of pointer stability prevents users from creating "path based" decoder factories, that are able to customize decoder behavior based not only on type, but also on the field's path in the schema. See the PathBasedDecoderFactory in arrow-json/tests/custom_decoder_tests.rs of #9259, for an example.
What changes are included in this PR?
By passing &DataType instead, we change code like this:
to this:
Result: Every call to make_decoder receives a reference to the actual original data type from the builder's input schema. The final decoder Self is unchanged -- it already received a clone and continues to do so.
NOTE: There is one additional clone of the top-level DataType::Struct we create for normal (!is_field) builders. But that's a cheap arc clone of a Fields member.
Are these changes tested?
Yes, existing unit tests validate the change.
Are there any user-facing changes?
No. All functions and data types involved are private -- the array decoders are marked pub but are defined in a private mod with no public re-export that would make them available outside the crate.
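To make the pointer-stability rationale concrete, here is a minimal sketch of how a path lookup keyed by address could behave under a borrowing walk versus a cloning walk. Everything in it is an assumption made for illustration: DataType and Field are toy stand-ins rather than the arrow types, and index_paths, walk_borrowed, walk_owned, and sample_schema are hypothetical helpers that are not taken from this PR or from #9259.

```rust
use std::collections::HashMap;

// Toy stand-ins for arrow's DataType and Field (assumptions, not the real types).
#[derive(Clone, Debug)]
enum DataType {
    Int64,
    Struct(Vec<Field>),
}

#[derive(Clone, Debug)]
struct Field {
    name: String,
    data_type: DataType,
}

// Record the address of every nested DataType together with its path.
fn index_paths(dt: &DataType, path: &str, out: &mut HashMap<*const DataType, String>) {
    out.insert(dt as *const DataType, path.to_string());
    if let DataType::Struct(fields) = dt {
        for f in fields {
            index_paths(&f.data_type, &format!("{}.{}", path, f.name), out);
        }
    }
}

// Borrowing walk: each level passes the original reference down, so lookups hit.
fn walk_borrowed(dt: &DataType, paths: &HashMap<*const DataType, String>, hits: &mut Vec<String>) {
    if let Some(p) = paths.get(&(dt as *const DataType)) {
        hits.push(p.clone());
    }
    if let DataType::Struct(fields) = dt {
        for f in fields {
            walk_borrowed(&f.data_type, paths, hits);
        }
    }
}

// Owned walk: each level receives a clone at a new address, so lookups miss.
fn walk_owned(dt: DataType, paths: &HashMap<*const DataType, String>, hits: &mut Vec<String>) {
    if let Some(p) = paths.get(&(&dt as *const DataType)) {
        hits.push(p.clone());
    }
    if let DataType::Struct(fields) = &dt {
        for f in fields {
            walk_owned(f.data_type.clone(), paths, hits);
        }
    }
}

// Small nested schema: root { a: Int64, b: { c: Int64 } }.
fn sample_schema() -> DataType {
    let c = Field { name: "c".to_string(), data_type: DataType::Int64 };
    let a = Field { name: "a".to_string(), data_type: DataType::Int64 };
    let b = Field { name: "b".to_string(), data_type: DataType::Struct(vec![c]) };
    DataType::Struct(vec![a, b])
}

fn main() {
    let schema = sample_schema();
    let mut paths = HashMap::new();
    index_paths(&schema, "root", &mut paths);

    let mut hits = Vec::new();
    walk_borrowed(&schema, &paths, &mut hits);
    assert_eq!(hits.len(), 4); // every node resolves to its path

    let mut misses = Vec::new();
    walk_owned(schema.clone(), &paths, &mut misses);
    assert!(misses.is_empty()); // clones never match the recorded addresses
}
```

The address-keyed map only stays meaningful while the original schema is alive and unmoved, which is why the sketch keeps schema in scope for both walks.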