
@HyukjinKwon (Member) commented Jan 9, 2026

Rationale for this change

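Fixed-size list scalars that compared equal could hash to different values when they were taken from different offsets of their parent arrays. As a result, `a == b` did not imply `hash(a) == hash(b)`.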

What changes are included in this PR?

  • Added a FIXED_SIZE_LIST case to ArrayHash that scales child offsets by list_size
  • Added a C++ test in scalar_test.cc for nested fixed-size lists
  • Added a Python test in test_scalars.py
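To make the offset arithmetic concrete, here is a small pure-Python model of which leaf values a nested fixed-size list window covers. It is a hypothetical sketch, not Arrow's C++ ArrayHash: the function name `covered_values`, the flat leaf buffer, and the `scale` flag are assumptions for illustration. In a fixed-size list array, parent slot i owns child slots [i * list_size, (i + 1) * list_size), so an offset has to be multiplied by list_size at every nesting level. The `scale=False` branch models an offset that is passed down unscaled.

```python
from typing import Sequence, Tuple


def covered_values(
    flat: Sequence[int],
    list_sizes: Sequence[int],
    offset: int,
    length: int,
    scale: bool = True,
) -> Tuple[int, ...]:
    """Return the leaf values covered by `length` slots starting at `offset`
    of a nested fixed-size list.

    Hypothetical model: `flat` is the innermost (leaf) values buffer and
    `list_sizes` gives the fixed size of each level, outermost first.
    """
    for size in list_sizes:
        # Each parent slot owns `size` consecutive child slots, so the
        # window always grows by a factor of `size` at each level.
        length *= size
        if scale:
            # Scaled: the child window also starts `size` times further in.
            offset *= size
        # With scale=False the parent offset is reused as-is (unscaled case).
    return tuple(flat[offset:offset + length])


# Leaf values of `g` and `h` from the example below (outer size 3, inner size 2).
g_flat = list(range(1, 13))  # 1..12
h_flat = list(range(7, 13))  # 7..12

# g[1] and h[0] both hold [[7, 8], [9, 10], [11, 12]].
print(covered_values(g_flat, [3, 2], 1, 1))               # (7, 8, 9, 10, 11, 12)
print(covered_values(h_flat, [3, 2], 0, 1))               # (7, 8, 9, 10, 11, 12)
print(covered_values(g_flat, [3, 2], 1, 1, scale=False))  # (2, 3, 4, 5, 6, 7)
```

With scaling, both windows cover the same leaf values, so hashing those values agrees with equality. Without it, the window for g[1] starts at the wrong leaf index and covers different values than h[0], even though the two scalars are equal.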

Are these changes tested?

Yes. Unit tests were added, and the change was also tested manually.

Are there any user-facing changes?

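Yes. Equal fixed-size list scalars now hash to the same value regardless of their offset in the parent array. For example: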

import pyarrow as pa

inner = pa.list_(pa.int32(), 2)  # fixed-size list of 2 int32 values
outer = pa.list_(inner, 3)       # fixed-size list of 3 such lists
g = pa.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]], type=outer)
h = pa.array([[[7, 8], [9, 10], [11, 12]]], type=outer)
# g[1] and h[0] both hold [[7, 8], [9, 10], [11, 12]], at different offsets
print(g[1] == h[0])              # equality
print(hash(g[1]) == hash(h[0]))  # hash consistency

Before:

True
False

After:

True
True

github-actions bot commented Jan 9, 2026

⚠️ GitHub issue #35830 has been automatically assigned in GitHub to PR creator.
