GH-48481: [Ruby] Correctly infer types for nested integer arrays #48699
+476
−44
Rationale for this change
When building an `Arrow::Table` from a Ruby Hash passed to `Arrow::Table.new`, nested Integer arrays are incorrectly inferred as `list<uint8>` or `list<int8>` regardless of the actual values contained. Nested integer arrays should be correctly inferred as the appropriate list type (e.g., `list<int64>`, `list<uint64>`) based on their values, similar to how flat arrays are handled, unless they contain values out of range for any integer type.
What changes are included in this PR?
This PR modifies the logic in `detect_builder_info()` to fix the inference issue. Specifically:
- Carry over `sub_builder_info` across sub-array elements: Previously, `sub_builder_info` was recreated for each sub-array element in the Array. The logic has been updated to accumulate and carry over the builder information across elements to ensure correct type inference for the entire list.
- To account for `BigDecimal`, the logic for determining the Integer builder has been moved to `create_builder()`. `detect_builder_info()` now calls this function.
Note:
- Nested `BigDecimal` values (which were previously inferred as `string`) may now have their types inferred. However, comprehensive testing and verification for nested `BigDecimal` support will be addressed in a separate issue to keep this PR focused.
- `IntArrayBuilder` is no longer used for inference logic, to ensure correctness. This results in a performance overhead (array building is approximately 2x slower) as we can no longer rely on the specialized builder's detection.
Are these changes tested?
Yes.
`ruby ruby/red-arrow/test/run-test.rb`
Are there any user-facing changes?
Yes.
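For readers who want a feel for the user-visible effect, here is a minimal pure-Ruby sketch of the idea of accumulating information across all sub-arrays before choosing a list type. It is not red-arrow's implementation: `infer_nested_integer_type`, `SIGNED_RANGES`, and `UNSIGNED_RANGES` are hypothetical names, and the rule used here (unsigned when no value is negative, then the narrowest width that fits) is an assumption made for illustration.

```ruby
# Illustrative sketch only. These names are hypothetical and are not part of
# red-arrow; the narrowing rule is an assumption, not the PR's exact logic.
SIGNED_RANGES = {
  "int8"  => (-(2**7)..(2**7 - 1)),
  "int16" => (-(2**15)..(2**15 - 1)),
  "int32" => (-(2**31)..(2**31 - 1)),
  "int64" => (-(2**63)..(2**63 - 1)),
}.freeze

UNSIGNED_RANGES = {
  "uint8"  => (0..(2**8 - 1)),
  "uint16" => (0..(2**16 - 1)),
  "uint32" => (0..(2**32 - 1)),
  "uint64" => (0..(2**64 - 1)),
}.freeze

# Returns a type name such as "list<int64>", or nil when no integer type
# can hold every value (or when no integers were seen at all).
def infer_nested_integer_type(rows)
  min = nil
  max = nil
  # Accumulate min/max over ALL sub-arrays instead of restarting per sub-array.
  rows.each do |row|
    next if row.nil?
    row.each do |value|
      next if value.nil?
      min = value if min.nil? || value < min
      max = value if max.nil? || value > max
    end
  end
  return nil if min.nil?

  ranges = min.negative? ? SIGNED_RANGES : UNSIGNED_RANGES
  # Hashes keep insertion order, so the first match is the narrowest type.
  name, _range = ranges.find { |_, range| range.cover?(min) && range.cover?(max) }
  name && "list<#{name}>"
end

p infer_nested_integer_type([[1, 2], [300]])        # => "list<uint16>"
p infer_nested_integer_type([[1], [-1], [2**40]])   # => "list<int64>"
p infer_nested_integer_type([[2**64]])              # => nil
```

The last call returns `nil` because the value is out of range for every integer type in the sketch; how the real code treats that case (for example via `BigDecimal`) is outside what this example models.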
GitHub Issue: #48481