Member

@sarutak sarutak commented Sep 11, 2025

What changes were proposed in this pull request?

This PR fixes sum_distinct, which does not work correctly in the current spark-connect-rust.
If we run the following code,

let df = spark
    .sql("SELECT * FROM VALUES (1), (2), (3), (1), (2), (3) AS data(value)")
    .await?;
df.select([sum_distinct(col("value")).alias("sum")])
    .show(None, None, None)
    .await?;

we get the following error:

Error: AnalysisException("[UNRESOLVED_ROUTINE] Cannot resolve routine `sum_distinct` on search path [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`]. SQLSTATE: 42883")
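The error shows that the server was asked to resolve a routine literally named `sum_distinct`, which is not a Spark builtin. Other Spark Connect clients express this aggregate as the builtin `sum` with the distinct flag set on the unresolved function. The following is a minimal sketch of that mapping, assuming the fix takes the same shape; the struct and function here are hypothetical stand-ins modeled on the Spark Connect `UnresolvedFunction` message, not the crate's actual types or the code in this PR.

```rust
/// Hypothetical stand-in for the Spark Connect `UnresolvedFunction` expression.
/// Field names follow the protobuf message; this is not the crate's real type.
#[derive(Debug, Clone, PartialEq)]
struct UnresolvedFunction {
    /// Name the server resolves against its function registry.
    function_name: String,
    /// Argument columns, kept as plain names to keep the sketch small.
    arguments: Vec<String>,
    /// When true, the aggregate is applied to distinct input values only.
    is_distinct: bool,
}

/// Hypothetical helper: build the expression for a distinct sum over `column`.
/// The server knows `sum`, not `sum_distinct`, so the distinct-ness travels
/// in the `is_distinct` flag rather than in the function name.
fn sum_distinct_expr(column: &str) -> UnresolvedFunction {
    UnresolvedFunction {
        function_name: "sum".to_string(),
        arguments: vec![column.to_string()],
        is_distinct: true,
    }
}
```

Under that assumption, `sum_distinct_expr("value")` yields a node named `sum` with `is_distinct` set, which corresponds to `SELECT sum(DISTINCT value)` in SQL. For the sample data above, that query returns 6.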

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new tests.

Was this patch authored or co-authored using generative AI tooling?

No.

Member Author

sarutak commented Sep 30, 2025

@viirya Could you help review this if possible? Thanks.

Member Author

sarutak commented Sep 30, 2025

@viirya Thank you for taking a look at this PR!

Member Author

sarutak commented Oct 1, 2025

This PR has already been approved, and the last change just inserts blank lines.
I'll merge this into master.
Thank you for the review, @viirya!

@sarutak sarutak closed this in f749758 Oct 1, 2025

2 participants