21586: perf(physical-plan): optimize byte view append #309

Open

martin-augment wants to merge 2 commits into main from pr-21586-2026-04-13-11-51-50

Conversation

@martin-augment (Owner)

21586: To review by AI

@coderabbitai

coderabbitai bot commented Apr 13, 2026

Walkthrough

This pull request introduces three distinct changes. Two documentation updates define new constraints for AI agents conducting code reviews, specifically prohibiting linking to issues or pull requests and prohibiting review of AI configuration files. One implementation change optimizes ByteView handling in DataFusion's aggregation engine: it introduces vectorized append operations with deferred bulk copying for long values, replacing per-row processing loops, and adds comprehensive unit test coverage for the new behavior.


@augmentcode

augmentcode bot commented Apr 13, 2026

🤖 Augment PR Summary

Summary: Optimizes ByteViewGroupValueBuilder’s vectorized append path for byte/utf8 view group values to reduce per-row work when ingesting many rows.

Changes:

  • Introduces a pending-copy representation for long (non-inlined) byte views and a batched copy routine that can copy contiguous source slices in fewer operations.
  • Splits vectorized append into fast paths for “no nulls” vs “has nulls”, minimizing branching in the non-null case.
  • Updates long-view handling to rewrite stored views to point at the builder’s internal buffers after batch copying.
  • Adds unit tests covering subset/repeated-row appends, interaction with take_n, multiple batches, mid-batch flush behavior, and oversized values.

Technical Notes: The new batching logic attempts to coalesce adjacent long views from the same source buffer into a single copy into the builder’s in-progress block, while preserving correct view offsets/buffer indices.
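The coalescing described here can be sketched in isolation, using plain `Vec<u8>` buffers in place of Arrow `Buffer`s. The `PendingByteViewCopy` fields and the `batch_copy` helper below are illustrative stand-ins for the PR's internals, not the actual DataFusion code.

```rust
// Hypothetical pending-copy record for a long (non-inlined) value.
#[derive(Clone, Copy, Debug)]
struct PendingByteViewCopy {
    buffer_idx: usize, // which source data buffer holds the bytes
    offset: usize,     // byte offset of the value in that buffer
    len: usize,        // value length in bytes (> 12, i.e. not inlined)
}

/// Copy all pending long values into `dest`, coalescing entries that are
/// contiguous in the same source buffer into a single extend_from_slice.
/// Returns the new destination offset for each pending entry and the
/// number of copy operations performed.
fn batch_copy(
    source_buffers: &[Vec<u8>],
    pending: &[PendingByteViewCopy],
    dest: &mut Vec<u8>,
) -> (Vec<usize>, usize) {
    let mut new_offsets = Vec::with_capacity(pending.len());
    let mut copies = 0;
    let mut i = 0;
    while i < pending.len() {
        let first = pending[i];
        let mut end = first.offset + first.len;
        let mut j = i + 1;
        // Grow the batch while the next entry starts exactly where the
        // current batch ends, in the same source buffer.
        while j < pending.len()
            && pending[j].buffer_idx == first.buffer_idx
            && pending[j].offset == end
        {
            end += pending[j].len;
            j += 1;
        }
        let dest_start = dest.len();
        dest.extend_from_slice(&source_buffers[first.buffer_idx][first.offset..end]);
        copies += 1;
        // Each view's new offset is the batch's destination start plus its
        // relative position within the batch.
        for p in &pending[i..j] {
            new_offsets.push(dest_start + (p.offset - first.offset));
        }
        i = j;
    }
    (new_offsets, copies)
}
```

With three pending entries where the first two are adjacent in the same source buffer, the first two collapse into one copy and the third gets its own, so two copy operations replace three.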



@augmentcode augmentcode bot left a comment


Review completed. No suggestions at this time.


@claude

claude bot commented Apr 13, 2026

Code review for this optimization PR. The overall approach is sound, but I have a few observations.

First, `vectorized_append_non_null_views` iterates the rows twice: once to extend `self.views` with raw source views, then again to build the pending list. The `vectorized_append_views_with_nulls` method already does this in a single pass; aligning the non-null path to that pattern would avoid the transient invalid state in which long-string views temporarily hold stale `buffer_index` and `offset` values from the source array.

Second, `batch_copy_long_views` only batches entries that are contiguous in source offset (`next_start == batch_source_end`), so in typical GROUP BY workloads with arbitrary row indices it degrades to per-item copies. This is correct, but a doc comment would help.

On test coverage: none of the new tests exercise an input StringViewArray backed by multiple source data buffers, which is the primary scenario this optimization targets; a test constructing such an input would close that gap. Similarly, there is no test for scattered non-contiguous long-string rows from the same source buffer (e.g. rows [3, 0, 2]), which is the common GROUP BY pattern and exercises the single-item batch offset calculation. A test for an empty rows slice would also document that contract.

Minor: `PendingByteViewCopy` deriving `Clone` and `Copy` is fine for a POD type.

Summary: the implementation is correct, and the flush/oversized/null paths are well covered. Main asks before merging: (1) a test with multiple source buffers in the input, (2) a test with scattered non-contiguous rows, and (3) consider the single-pass refactor in `vectorized_append_non_null_views`.
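The single-pass refactor suggested in the first observation might look roughly like this; the `View` enum and helper names below are simplified stand-ins for the arrow `ByteView` layout, not the PR's actual types.

```rust
// Simplified view: short values (len <= 12) are "inlined" and copied as-is;
// long values record (buffer_idx, offset) to be patched after the bulk copy.
#[derive(Clone, Copy, Debug, PartialEq)]
enum View {
    Inline { len: usize },
    Long { len: usize, buffer_idx: usize, offset: usize },
}

struct Pending {
    view_idx: usize,   // position in the builder's `views` to patch later
    buffer_idx: usize,
    offset: usize,
    len: usize,
}

/// Single-pass append: extend `views` and collect pending long copies in the
/// same loop, instead of iterating `rows` twice.
fn append_non_null(
    views: &mut Vec<View>,
    source_views: &[View],
    rows: &[usize],
) -> Vec<Pending> {
    let mut pending = Vec::new();
    for &row in rows {
        let v = source_views[row];
        let view_idx = views.len();
        views.push(v);
        if let View::Long { len, buffer_idx, offset } = v {
            pending.push(Pending { view_idx, buffer_idx, offset, len });
        }
    }
    pending
}
```

The pending entries record exactly which views need their `buffer_idx`/`offset` rewritten once the batched copy has run.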


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces vectorized appending for ByteViewGroupValueBuilder in the DataFusion physical plan, optimizing the handling of binary and UTF-8 views during aggregation. The changes include new methods for batching the copy of long views and several unit tests to verify correctness across various scenarios. Feedback suggests optimizing memory allocation for the pending vector in the vectorized append paths to avoid over-allocation when processing many short strings.

```rust
let start_idx = self.views.len();
self.views.extend(rows.iter().map(|&row| source_views[row]));

let mut pending = Vec::with_capacity(rows.len());
```

medium

The pending vector is allocated and its capacity is reserved to rows.len() in every call to vectorized_append_non_null_views. For large batches where most strings are short (<= 12 bytes), this results in significant over-allocation of memory that is only partially used. Consider using a smaller initial capacity or a reusable buffer to reduce allocation overhead.

Owner Author

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Most likely many of the rows won't be long (longer than 12 bytes), so the pending Vec won't use all of its pre-allocated slots. It would be better to allocate fewer entries, e.g. half of the rows' length, and let the Vec resize on append if needed.
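As an alternative to the half-length heuristic, the pending list could be sized exactly with one extra counting pass. This sketch is hypothetical (`pending_capacity` and `lens` are not from the PR) and assumes the 12-byte inline threshold of the Arrow view layout.

```rust
/// Values at or below this length are inlined in the view and never need a
/// pending copy (matches the Arrow ByteView inline threshold).
const INLINE_LEN: usize = 12;

/// Count the long values among the selected rows, so the pending Vec can be
/// allocated with exactly the capacity it will use. `lens` stands in for the
/// per-row value lengths taken from the source views.
fn pending_capacity(lens: &[usize], rows: &[usize]) -> usize {
    rows.iter().filter(|&&row| lens[row] > INLINE_LEN).count()
}
```

Whether the extra counting scan beats an occasional Vec resize would need benchmarking on representative batches.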

```rust
    rows: &[usize],
) {
    let source_views = array.views();
    let mut pending = Vec::with_capacity(rows.len());
```

medium

Similar to the non-null case, the pending vector here is allocated with rows.len() capacity. This can be wasteful if the number of 'long' views is small relative to the total number of rows. Reusing a buffer or using a more conservative initial capacity would be more efficient.

Owner Author

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Most likely many of the rows won't be long (longer than 12 bytes), so the pending Vec won't use all of its pre-allocated slots. It would be better to allocate fewer entries, e.g. half of the rows' length, and let the Vec resize on append if needed.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs (1)

247-310: Well-designed batching strategy.

The contiguous-range detection and bulk copying is a good optimization. A few notes:

  1. The flush-before-process pattern (lines 257-261) ensures buffer_index calculation is always correct
  2. Line 264 correctly handles oversized values by using max(block_size, first_len)
  3. The offset calculation (line 303-304) correctly computes: dest_start + relative_offset_within_batch

Consider adding a brief comment at the top of the method explaining the batching strategy (identify contiguous ranges from same source buffer, copy in bulk) for future maintainers.

📝 Optional: Add explanatory comment
```diff
     fn batch_copy_long_views(
         &mut self,
         source_buffers: &[Buffer],
         pending: &[PendingByteViewCopy],
     ) {
+        // Batch contiguous ranges from the same source buffer to minimize
+        // copy operations. For each batch, copy the entire range at once
+        // and update each view's buffer_index/offset to point to our buffers.
         let mut batch_start = 0;
         while batch_start < pending.len() {
```
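The offset arithmetic noted in point 3 above can be checked with a small standalone helper; the function name and values here are illustrative, not from the PR.

```rust
/// New offset for a view after its batch is copied: the batch's start in the
/// destination buffer plus the view's position relative to the batch's start
/// in the source buffer.
fn rewritten_offset(
    dest_batch_start: usize,
    batch_source_start: usize,
    source_offset: usize,
) -> usize {
    dest_batch_start + (source_offset - batch_source_start)
}
```

For example, a view 15 bytes into its source batch keeps that relative position after the batch is relocated.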


📥 Commits

Reviewing files that changed from the base of the PR and between 644be38 and 0854982.

📒 Files selected for processing (3)
  • .cursor/rules.md
  • AGENTS.md
  • datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs
