Skip to content

[VL] Reduce triple map lookup to single find in dictionary serialization#11628

Merged
marin-ma merged 1 commit intoapache:mainfrom
acvictor:acvictor/optimizeDictSerializeLookup
Mar 2, 2026
Merged

[VL] Reduce triple map lookup to single find in dictionary serialization#11628
marin-ma merged 1 commit intoapache:mainfrom
acvictor:acvictor/optimizeDictSerializeLookup

Conversation

@acvictor
Copy link
Contributor

What changes are proposed in this pull request?

This PR replaces three separate map look ups on the same key with a single find() and iterator reuse in ArrowShuffleDictionaryWriter::serialize()

How was this patch tested?

Existing UTs

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the VELOX label Feb 19, 2026
@acvictor acvictor marked this pull request as ready for review February 19, 2026 16:01
@acvictor
Copy link
Contributor Author

@marin-ma can you please review this PR? Thanks in advance!

auto it = dictionaries_.find(fieldIdx);
GLUTEN_DCHECK(
dictionaries_.find(fieldIdx) != dictionaries_.end(),
it != dictionaries_.end(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GLUTEN_DCHECK means the code only executes on debug compilation mode, so we don't need to add complexity for it. Only if the code logic has something wrong, the check mai fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinchengchenghh This change makes sense as dictionaries_.erase(fieldIdx) also performs another lookup.

Copy link
Contributor

@marin-ma marin-ma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Sorry for the late review!

@marin-ma marin-ma merged commit cfae8dc into apache:main Mar 2, 2026
57 of 58 checks passed
@acvictor acvictor deleted the acvictor/optimizeDictSerializeLookup branch March 3, 2026 05:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants