[pull] main from apache:main #106

Merged: pull[bot] merged 1 commit into buraksenn:main from apache:main on Apr 15, 2026

pull[bot] commented on Apr 15, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


## Which issue does this PR close?

- Part of #20766

## Rationale for this change

After a filter reduces a table from 100 to 10 rows, or a `LIMIT 10` caps
the output, a column's NDV (number of distinct values, e.g. 80) should not
exceed the new row count. Without capping, join cardinality estimation
divides by an inflated NDV, which skews the estimated join output downward.
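
To make the effect concrete, here is a minimal sketch of the arithmetic. It assumes the simplified inner-join estimate `left_rows * right_rows / max(left_ndv, right_ndv)`; the function name `estimate_join_rows` and its `cap` flag are hypothetical and are not DataFusion's actual API.

```rust
/// Simplified inner-join cardinality estimate (illustrative assumption):
/// left_rows * right_rows / max(left_ndv, right_ndv).
/// When `cap` is true, each side's NDV is first clamped to that side's row count.
fn estimate_join_rows(
    left_rows: u64,
    right_rows: u64,
    left_ndv: u64,
    right_ndv: u64,
    cap: bool,
) -> u64 {
    let (l_ndv, r_ndv) = if cap {
        (left_ndv.min(left_rows), right_ndv.min(right_rows))
    } else {
        (left_ndv, right_ndv)
    };
    // Guard against division by zero when both NDVs are 0.
    let denominator = l_ndv.max(r_ndv).max(1);
    left_rows * right_rows / denominator
}
```

With a left side filtered down to 10 rows that still carries a stale NDV of 80, joined to a 1000-row right side with NDV 10, the uncapped call `estimate_join_rows(10, 1000, 80, 10, false)` yields 125, while the capped call with `true` yields 1000.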

## What changes are included in this PR?

Cap `distinct_count` at `num_rows` in three places to prevent NDV from
exceeding the actual row count:
- `max_distinct_count` in join cardinality estimation (`joins/utils.rs`)
- `collect_new_statistics` in filter output statistics (`filter.rs`)
- `Statistics::with_fetch` (`stats.rs`), which covers `GlobalLimitExec`,
`LocalLimitExec`, `SortExec` (with fetch), `CoalescePartitionsExec`
(with fetch), and `CoalesceBatchesExec` (with fetch)
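
The capping rule shared by these three places can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Estimate` is a hypothetical stand-in for a statistics value that may be exact, inexact, or absent, and `cap_ndv` is a hypothetical helper. Marking the capped value as inexact is also an assumption made here, on the reasoning that the row count is only an upper bound on the true NDV.

```rust
/// Hypothetical stand-in for a statistics value with a precision level.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Estimate {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Estimate {
    /// Returns the underlying number, if any is known.
    fn value(self) -> Option<usize> {
        match self {
            Estimate::Exact(v) | Estimate::Inexact(v) => Some(v),
            Estimate::Absent => None,
        }
    }
}

/// Caps `ndv` at `num_rows` when both are known and the NDV is larger.
/// The capped result is marked inexact (an assumption of this sketch).
/// An NDV at or below the row count, or one without a known row count,
/// is returned unchanged.
fn cap_ndv(ndv: Estimate, num_rows: Estimate) -> Estimate {
    match (ndv.value(), num_rows.value()) {
        (Some(d), Some(n)) if d > n => Estimate::Inexact(n),
        _ => ndv,
    }
}
```

For example, `cap_ndv(Estimate::Exact(80), Estimate::Exact(10))` returns `Estimate::Inexact(10)`, while an NDV of 5 against 10 rows passes through untouched.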

Note: NDV capping for `AggregateExec` is covered separately in #20926.

## Are these changes tested?

- `test_filter_statistics_ndv_capped_at_row_count` - verifies NDV capped
at filtered row count
- 2 new join cardinality test cases - NDV exceeds the row count on both sides, and on one side only
- Updated `test_join_cardinality` expected values for capped NDV
- `test_with_fetch_caps_ndv_at_row_count` - verifies NDV capped after
LIMIT
- `test_with_fetch_ndv_below_row_count_unchanged` - verifies NDV
untouched when already below row count
- All existing `with_fetch` tests pass

## Are there any user-facing changes?

No public API changes. Only internal statistics estimation is affected.

Disclaimer: I used AI to assist in the code generation. I have manually
reviewed the output, and it matches my intention and understanding.
pull[bot] locked and limited conversation to collaborators on Apr 15, 2026
pull[bot] added the ⤵️ pull label on Apr 15, 2026
pull[bot] merged commit 244f891 into buraksenn:main on Apr 15, 2026
15 of 16 checks passed