21586: perf(physical-plan): optimize byte view append #309
martin-augment wants to merge 2 commits into main from
Conversation
Walkthrough
This pull request introduces three distinct changes: two documentation updates define new constraints for AI agents conducting code reviews, specifically prohibiting linking to issues or pull requests and reviewing AI configuration files; one significant implementation change optimizes ByteView handling in DataFusion's aggregation engine by introducing vectorized append operations with deferred bulk copying for long values, replacing per-row processing loops, and includes comprehensive unit test coverage for the new behavior.
🤖 Augment PR Summary
Summary: Optimizes ByteView append in DataFusion's aggregation engine.
Changes:
Technical Notes: The new batching logic attempts to coalesce adjacent long views from the same source buffer into a single copy into the builder's in-progress block, while preserving correct view offsets/buffer indices.
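For context on why only long values need a deferred copy: Arrow's byte-view encoding packs each value into a 16-byte view where values up to 12 bytes are stored inline, while longer values store the length, a 4-byte prefix, and a buffer index plus offset into a separate data buffer. The sketch below is an illustrative model of that layout, not the arrow-rs API (which packs the view into a `u128`); the `ByteView` enum and `make_view` helper are invented names:

```rust
// Illustrative model of Arrow's 16-byte "byte view" encoding.
// Values <= 12 bytes are inlined; longer values reference a data buffer.
#[derive(Debug, PartialEq)]
enum ByteView {
    Inline { len: u32, data: [u8; 12] },
    Long { len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> ByteView {
    let len = value.len() as u32;
    if value.len() <= 12 {
        let mut data = [0u8; 12];
        data[..value.len()].copy_from_slice(value);
        ByteView::Inline { len, data }
    } else {
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&value[..4]);
        ByteView::Long { len, prefix, buffer_index, offset }
    }
}

fn main() {
    // Short value: fully inlined, so appending the view copies the bytes too.
    let short = make_view(b"hello", 0, 0);
    // Long value: appending the view alone is not enough; the bytes live in
    // buffer 2 and must be copied (or re-referenced) separately -- hence the
    // deferred "pending" copy list in the PR.
    let long = make_view(b"a string longer than twelve bytes", 2, 128);
    assert!(matches!(short, ByteView::Inline { len: 5, .. }));
    assert!(matches!(long, ByteView::Long { len: 33, buffer_index: 2, offset: 128, .. }));
    println!("{short:?}");
    println!("{long:?}");
}
```

This is why the builder can extend `self.views` cheaply for all rows and defer only the long-value byte copies.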
Code review for this optimization PR. The overall approach is sound but I have a few observations.

First, vectorized_append_non_null_views double-iterates rows: once to extend self.views with raw source views, then again to build the pending list. The vectorized_append_views_with_nulls method already does this in a single pass, and aligning the non-null path to that pattern would avoid the transient invalid state where long-string views temporarily hold stale buffer_index and offset from the source array.

Second, batch_copy_long_views only batches entries that are contiguous in source offset (next_start == batch_source_end), so in typical GROUP BY workloads with arbitrary row indices this degrades to per-item copies - correct, but a doc comment would help.

On test coverage: none of the new tests exercise an input StringViewArray backed by multiple source data buffers, which is the primary scenario this optimization targets; a test constructing such an input would close that gap. Similarly, there is no test for scattered non-contiguous long-string rows from the same source buffer (e.g. rows [3, 0, 2]), which is the common GROUP BY pattern and exercises the single-item batch offset calculation. A test for an empty rows slice would also document that contract.

Minor: PendingByteViewCopy deriving Clone and Copy is fine for a POD type.

Summary: the implementation is correct and the flush/oversized/null paths are well covered. Main asks before merging are (1) a test with multiple source buffers in the input, (2) a test with scattered non-contiguous rows, and (3) consider the single-pass refactor in vectorized_append_non_null_views.
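The contiguity point above can be illustrated with a standalone sketch of the batching criterion. The names (`PendingCopy`, `coalesce`) are illustrative, not the actual DataFusion code: pending long views coalesce only while each entry starts exactly where the previous one ends in the same source buffer, so scattered GROUP BY row orders fall back to one copy per view.

```rust
// Illustrative sketch of the contiguous-range batching criterion.
#[derive(Clone, Copy, Debug)]
struct PendingCopy {
    buffer_index: u32, // source buffer the bytes live in
    offset: u32,       // start of the bytes within that buffer
    len: u32,
}

/// Group `pending` into maximal runs that are byte-contiguous in the
/// same source buffer; each run could be copied with a single memcpy.
fn coalesce(pending: &[PendingCopy]) -> Vec<Vec<PendingCopy>> {
    let mut batches: Vec<Vec<PendingCopy>> = Vec::new();
    for &p in pending {
        // Extend the current run only if `p` starts exactly where the
        // previous entry ends (next_start == batch_source_end).
        let contiguous = batches
            .last()
            .and_then(|b| b.last())
            .map_or(false, |last| {
                p.buffer_index == last.buffer_index
                    && p.offset == last.offset + last.len
            });
        if contiguous {
            batches.last_mut().unwrap().push(p);
        } else {
            batches.push(vec![p]);
        }
    }
    batches
}

fn main() {
    // Rows taken in order from one buffer: a single bulk copy.
    let ordered = [
        PendingCopy { buffer_index: 0, offset: 0, len: 16 },
        PendingCopy { buffer_index: 0, offset: 16, len: 20 },
    ];
    assert_eq!(coalesce(&ordered).len(), 1);

    // Scattered rows (e.g. GROUP BY order [3, 0, 2]): one copy each.
    let scattered = [
        PendingCopy { buffer_index: 0, offset: 48, len: 16 },
        PendingCopy { buffer_index: 0, offset: 0, len: 16 },
        PendingCopy { buffer_index: 0, offset: 32, len: 16 },
    ];
    assert_eq!(coalesce(&scattered).len(), 3);
    println!("ordered batches: 1, scattered batches: 3");
}
```

The suggested test with rows [3, 0, 2] would exercise exactly the second case, where every batch contains a single item.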
Code Review
This pull request introduces vectorized appending for ByteViewGroupValueBuilder in the DataFusion physical plan, optimizing the handling of binary and UTF-8 views during aggregation. The changes include new methods for batching the copy of long views and several unit tests to verify correctness across various scenarios. Feedback suggests optimizing memory allocation for the pending vector in the vectorized append paths to avoid over-allocation when processing many short strings.
```rust
let start_idx = self.views.len();
self.views.extend(rows.iter().map(|&row| source_views[row]));

let mut pending = Vec::with_capacity(rows.len());
```
The pending vector is allocated and its capacity is reserved to rows.len() in every call to vectorized_append_non_null_views. For large batches where most strings are short (<= 12 bytes), this results in significant over-allocation of memory that is only partially used. Consider using a smaller initial capacity or a reusable buffer to reduce allocation overhead.
value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Most probably many of the rows won't be long (longer than 12 bytes), so the pending Vec won't use all of the pre-allocated slots. It would be better to allocate fewer entries, e.g. half of the rows' length, and let it resize on append if needed.
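A minimal sketch of the suggested change, assuming the half-capacity heuristic proposed above (the exact fraction is a judgment call, not something the PR specifies): start the Vec smaller and let amortized doubling absorb the rare batches where most strings are long.

```rust
fn main() {
    let rows_len = 1024usize;

    // Current approach: reserve one pending slot per row up front,
    // even though short strings (<= 12 bytes) never use one.
    let eager: Vec<u32> = Vec::with_capacity(rows_len);
    assert!(eager.capacity() >= rows_len);

    // Suggested: reserve for e.g. half the rows and let the Vec grow
    // (amortized O(1) per push) in the uncommon long-heavy case.
    let mut lazy: Vec<u32> = Vec::with_capacity(rows_len / 2);
    assert!(lazy.capacity() >= rows_len / 2);

    // Pushing past the initial capacity still works; Vec reallocates.
    for i in 0..rows_len as u32 {
        lazy.push(i);
    }
    assert_eq!(lazy.len(), rows_len);
    println!(
        "eager cap: {}, lazy final cap: {}",
        eager.capacity(),
        lazy.capacity()
    );
}
```

Since `PendingByteViewCopy` is a small Copy type, the occasional reallocation is cheap relative to the per-batch savings on short-string-heavy inputs.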
```rust
    rows: &[usize],
) {
    let source_views = array.views();
    let mut pending = Vec::with_capacity(rows.len());
```
value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Most probably many of the rows won't be long (longer than 12 bytes), so the pending Vec won't use all of the pre-allocated slots. It would be better to allocate fewer entries, e.g. half of the rows' length, and let it resize on append if needed.
🧹 Nitpick comments (1)
datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs (1)
247-310: Well-designed batching strategy. The contiguous-range detection and bulk copying is a good optimization. A few notes:

- The flush-before-process pattern (lines 257-261) ensures buffer_index calculation is always correct
- Line 264 correctly handles oversized values by using max(block_size, first_len)
- The offset calculation (lines 303-304) correctly computes: dest_start + relative_offset_within_batch

Consider adding a brief comment at the top of the method explaining the batching strategy (identify contiguous ranges from same source buffer, copy in bulk) for future maintainers.
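The flush and block-sizing rules in those notes can be sketched independently of the PR. This is a toy builder under stated assumptions, not the actual `ByteViewGroupValueBuilder` (the `Builder` struct, `MAX_BLOCK_SIZE` constant, and method names are illustrative): flush the in-progress block before a batch that cannot fit, and size a fresh block as max(block_size, first_len) so a single oversized value still fits.

```rust
// Illustrative sketch of the flush-before-copy and oversized-value rules.
const MAX_BLOCK_SIZE: usize = 32; // tiny block size so the example is visible

struct Builder {
    in_progress: Vec<u8>,
    completed: Vec<Vec<u8>>,
}

impl Builder {
    fn flush_in_progress(&mut self) {
        if !self.in_progress.is_empty() {
            self.completed.push(std::mem::take(&mut self.in_progress));
        }
    }

    /// Copies one batch of bytes; returns (buffer_index, offset) where
    /// the batch landed, mirroring the dest_start bookkeeping above.
    fn append_batch(&mut self, bytes: &[u8]) -> (usize, usize) {
        // Flush first so buffer_index is correct for the new batch.
        if self.in_progress.len() + bytes.len() > MAX_BLOCK_SIZE {
            self.flush_in_progress();
        }
        if self.in_progress.capacity() == 0 {
            // max(block_size, first_len): one oversized value fits alone.
            self.in_progress.reserve(MAX_BLOCK_SIZE.max(bytes.len()));
        }
        let offset = self.in_progress.len();
        self.in_progress.extend_from_slice(bytes);
        (self.completed.len(), offset)
    }
}

fn main() {
    let mut b = Builder { in_progress: Vec::new(), completed: Vec::new() };
    assert_eq!(b.append_batch(&[1u8; 20]), (0, 0));  // fits in block 0
    assert_eq!(b.append_batch(&[2u8; 20]), (1, 0));  // forces a flush first
    assert_eq!(b.append_batch(&[3u8; 100]), (2, 0)); // oversized: own block
    println!("completed blocks: {}", b.completed.len());
}
```

A view for the i-th item of a batch then gets offset dest_start + (its source offset - the batch's first source offset), which is the arithmetic the third bullet refers to.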
📝 Optional: Add explanatory comment
```diff
 fn batch_copy_long_views(
     &mut self,
     source_buffers: &[Buffer],
     pending: &[PendingByteViewCopy],
 ) {
+    // Batch contiguous ranges from the same source buffer to minimize
+    // copy operations. For each batch, copy the entire range at once
+    // and update each view's buffer_index/offset to point to our buffers.
     let mut batch_start = 0;
     while batch_start < pending.len() {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs` around lines 247-310, add a brief explanatory comment at the top of the batch_copy_long_views method describing the batching strategy: that it scans pending (PendingByteViewCopy) to find contiguous ranges from the same source buffer, flushes in_progress when adding the first element would exceed max_block_size, uses max(max_block_size, first_len) to allow oversized single entries, copies contiguous bytes in bulk from source_buffers into in_progress, and then updates views entries with the new buffer_index and adjusted offset (dest_batch_start + relative offset within the batch), so future readers understand why flush_in_progress(), the max_block_size handling, and the offset arithmetic in batch_copy_long_views are necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs`:
- Around line 247-310: Add a brief explanatory comment at the top of the
batch_copy_long_views method describing the batching strategy: that it scans
pending (PendingByteViewCopy) to find contiguous ranges from the same source
buffer, flushes in_progress when adding the first element would exceed
max_block_size, uses max(max_block_size, first_len) to allow oversized single
entries, copies contiguous bytes in bulk from source_buffers into in_progress,
and then updates views entries with the new buffer_index, adjusted offset
(dest_batch_start + relative offset within the batch) so future readers
understand why flush_in_progress(), the max_block_size handling, and the offset
arithmetic in batch_copy_long_views are necessary.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6348ddf0-4d77-408e-95b0-697edc1e9af2
📒 Files selected for processing (3)
- .cursor/rules.md
- AGENTS.md
- datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs
21586: To review by AI