Skip to content

Commit b8a2c1a

Browse files
authored
[parquet] Avoid a clone while resolving the read strategy (#9056)
# Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. --> - related to apache/datafusion#19477 # Rationale for this change While working on apache/datafusion#19477, and profiling ClickBench q7, I noticed that the RowSelectors was being cloned to resolve the strategy -- for a large number of selections this is expensive and shows up in the traces <img width="1724" height="1074" alt="Screenshot 2025-12-28 at 4 49 49 PM" src="https://github.com/user-attachments/assets/72c6fd22-9377-48ef-ba80-6bc03b177cf7" /> ```shell samply record -- ./datafusion-cli-alamb_enable_pushdown -f q.sql > /dev/null 2>& ``` We should change the code to avoid cloning the RowSelectors when resolving the strategy. # Changes Don't clone / allocate while resolving the strategy. I don't expect this to have a massive impact, but it did show up in the profile FYI @hhhizzz -- perhaps you could review this PR # Are these changes tested? Yes by CI # Are there any user-facing changes? small performance improvement
1 parent 1ba902e commit b8a2c1a

File tree

1 file changed

+13
-10
lines changed

1 file changed

+13
-10
lines changed

parquet/src/arrow/arrow_reader/read_plan.rs

Lines changed: 13 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -110,19 +110,22 @@ impl ReadPlanBuilder {
110110
None => return RowSelectionStrategy::Selectors,
111111
};
112112

113-
let trimmed = selection.clone().trim();
114-
let selectors: Vec<RowSelector> = trimmed.into();
115-
if selectors.is_empty() {
116-
return RowSelectionStrategy::Mask;
117-
}
118-
119-
let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
120-
let selector_count = selectors.len();
121-
if selector_count == 0 {
113+
// total_rows: total number of rows selected / skipped
114+
// effective_count: number of non-empty selectors
115+
let (total_rows, effective_count) =
116+
selection.iter().fold((0usize, 0usize), |(rows, count), s| {
117+
if s.row_count > 0 {
118+
(rows + s.row_count, count + 1)
119+
} else {
120+
(rows, count)
121+
}
122+
});
123+
124+
if effective_count == 0 {
122125
return RowSelectionStrategy::Mask;
123126
}
124127

125-
if total_rows < selector_count.saturating_mul(threshold) {
128+
if total_rows < effective_count.saturating_mul(threshold) {
126129
RowSelectionStrategy::Mask
127130
} else {
128131
RowSelectionStrategy::Selectors

0 commit comments

Comments
 (0)