Optimize SEMI joins to INNER joins when HAS is used on singular relationships by hadia206 · Pull Request #480 · bodo-ai/PyDough

hadia206 · 2026-01-28T18:33:27Z

Uses an INNER JOIN instead of a SEMI JOIN when HAS is applied to a singular sub-collection. The existence check is redundant there, since singular relationships always have exactly one match.
Also fixes a qualification bug where queries failed when the graph name matched a collection name.
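
As a rough illustration of why the rewrite is safe, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical toy stand-ins, not the TPCH schema or the SQL that PyDough actually generates. Each customer row points at exactly one nation row, so the join cannot duplicate customers, and the EXISTS-style semi join and the inner join return the same count.

```python
import sqlite3

# Toy schema (hypothetical names): every customer references exactly one nation.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nation (n_key INTEGER PRIMARY KEY, n_region TEXT);
    CREATE TABLE customer (c_key INTEGER PRIMARY KEY, c_nation_key INTEGER);
    INSERT INTO nation VALUES (1, 'ASIA'), (2, 'EUROPE'), (3, 'ASIA');
    INSERT INTO customer VALUES (10, 1), (11, 2), (12, 3), (13, 2);
    """
)

# Semi join: keep a customer if a matching ASIA nation exists.
semi_sql = """
    SELECT COUNT(*) FROM customer c
    WHERE EXISTS (
        SELECT 1 FROM nation n
        WHERE n.n_key = c.c_nation_key AND n.n_region = 'ASIA'
    )
"""

# Inner join: equivalent here because n_key is unique, so each customer
# matches at most one nation row and is never duplicated.
inner_sql = """
    SELECT COUNT(*) FROM customer c
    JOIN nation n ON n.n_key = c.c_nation_key
    WHERE n.n_region = 'ASIA'
"""

semi_count = conn.execute(semi_sql).fetchone()[0]
inner_count = conn.execute(inner_sql).fetchone()[0]
print(semi_count, inner_count)
```

If the relationship were plural (several matching rows per customer), the inner join could return duplicate customers and the two counts could differ, which is why the rewrite is limited to singular sub-collections.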

closes #476

knassre-bodo

Hi Hadia, mostly LGTM, just a few testing thoughts.

knassre-bodo · 2026-01-29T07:07:55Z

tests/test_pipeline_tpch_custom.py

        ),
+        pytest.param(
+            PyDoughPandasTest(
+                "result = TPCH.CALCULATE(n=COUNT(customers.WHERE(HAS(nation.WHERE(region.name == 'ASIA')))))",


NIT: if needed for visual clarity, you can split this and the others across multiple lines (put everything inside the HAS into a variable).
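
One possible shape for that split, as a sketch only: the variable name `asian_nation` is made up for illustration, and it assumes the test harness accepts a multi-statement code string.

```python
# Hypothetical multi-line form of the test's PyDough snippet: the contents of
# the HAS are bound to a variable first, then used in the final expression.
pydough_code = "\n".join(
    [
        "asian_nation = nation.WHERE(region.name == 'ASIA')",
        "result = TPCH.CALCULATE(n=COUNT(customers.WHERE(HAS(asian_nation))))",
    ]
)

# The snippet is still syntactically valid Python, which can be checked
# without importing PyDough or loading the TPCH graph.
compile(pydough_code, "<pydough_snippet>", "exec")
print(pydough_code)
```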

knassre-bodo · 2026-01-29T07:09:05Z

tests/test_pipeline_tpch_custom.py

+        # HAS on singular relationship with additional filter
+        pytest.param(
+            PyDoughPandasTest(
+                "result = TPCH.CALCULATE(n=COUNT(suppliers.WHERE(HAS(nation.WHERE(region.name == 'EUROPE')))))",


How is this different from the basic redundant_has?

result = TPCH.CALCULATE(n=COUNT(customers.WHERE(HAS(nation.WHERE(region.name == 'ASIA')))))

knassre-bodo · 2026-01-29T07:10:03Z

tests/test_pipeline_tpch_custom.py

+            ),
+            id="redundant_has_on_plural_lineitems",
+        ),
+        # HASNOT on singular relationship - should optimize to ANTI join or similar


No optimization here, it just stays as ANTI the entire time

knassre-bodo · 2026-01-29T07:12:01Z

tests/test_pipeline_tpch_custom.py

+                    }
+                ),
+                "redundant_has_not_on_singular",
+                skip_relational=True,


I think for all of these we should keep either the relational plan or the SQL, just so we can see what is going on.

knassre-bodo · 2026-01-29T07:12:33Z

tests/test_pipeline_tpch_custom.py

+            ),
+            id="redundant_has_not_on_singular",
+        ),
+        # HAS without WHERE filter on singular - should optimize to INNER


This isn't really an optimization either. HAS outside of the conjunction of a WHERE always just becomes COUNT(x) != 0, which other rewrites interpret as an INNER join. The actual optimizations are that other passes will delete the aggregation and should realize that, since every customer has a nation and we aren't using anything from the nation, the nation join can be deleted, so we just do SELECT COUNT(*) FROM CUSTOMERS.
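
A small sqlite3 sketch of the chain described above, using hypothetical toy tables rather than PyDough's real relational plans: the COUNT(x) != 0 form, the inner-join form, and the plain count all agree when every customer has exactly one nation and nothing from the nation is used.

```python
import sqlite3

# Toy schema (hypothetical names): c_nation_key is NOT NULL and every value
# inserted below matches a nation row, so each customer has exactly one nation.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nation (n_key INTEGER PRIMARY KEY);
    CREATE TABLE customer (
        c_key INTEGER PRIMARY KEY,
        c_nation_key INTEGER NOT NULL REFERENCES nation (n_key)
    );
    INSERT INTO nation VALUES (1), (2);
    INSERT INTO customer VALUES (10, 1), (11, 2), (12, 1);
    """
)

queries = {
    # HAS written as COUNT(x) != 0 over the sub-collection.
    "count_nonzero": """
        SELECT COUNT(*) FROM customer c
        WHERE (SELECT COUNT(*) FROM nation n
               WHERE n.n_key = c.c_nation_key) != 0
    """,
    # The same condition interpreted as an inner join.
    "inner_join": """
        SELECT COUNT(*) FROM customer c
        JOIN nation n ON n.n_key = c.c_nation_key
    """,
    # The join removed entirely, since it neither filters nor duplicates rows.
    "no_join": "SELECT COUNT(*) FROM customer",
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)
```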

knassre-bodo · 2026-01-29T07:13:18Z

tests/test_pipeline_tpch_custom.py

+        # HAS on singular within plural context - orders whose customer is from ASIA
+        pytest.param(
+            PyDoughPandasTest(
+                "result = TPCH.CALCULATE(n=COUNT(orders.WHERE(HAS(customer.WHERE(nation.region.name == 'ASIA')))))",


This is the same as filtering customers based on a filter involving region/nation, so it is not a distinct case from the original test.

knassre-bodo · 2026-01-29T07:20:32Z

tests/test_pipeline_tpch_custom.py

+                skip_sql=True,
+            ),
+            id="redundant_has_singular_in_plural_context",
+        ),


Some weird cases to consider with HAS:

1. The contents of the HAS is a CROSS that has a correlated filter back to the context outside the HAS.

2. Same as (1), but ensure that the child data is singular with regard to the parent based on the filter, then do .SINGULAR() on the child. (Theoretically this can get rewritten, but I don't think it will be with the current version; that can be a later follow-up to deal with more advanced cases of determining when the child subtree is singular with regard to the parent.)

john-sanchez31

LGTM!

hadia206 added 3 commits January 27, 2026 12:51

switch redundant has and add tests

7c0ab19

add test files

f074118

[run CI][run SF][run mysql][run postgres] BIRD menu_5556 and update o…

6c06d74

…ther tests

hadia206 marked this pull request as ready for review January 28, 2026 21:03

hadia206 requested review from a team, john-sanchez31, juankx-bodo and knassre-bodo and removed request for a team January 28, 2026 21:04

knassre-bodo approved these changes Jan 29, 2026

john-sanchez31 approved these changes Jan 29, 2026

hadia206 added 5 commits February 6, 2026 13:59

[run all] address comments

0d77988

Merge branch 'main' into Hadia/remove_has_singular

1abba6a

[run CI]

5e52c3c

Merge branch 'Hadia/remove_has_singular' of https://github.com/bodo-a…

bbc919c

…i/PyDough into Hadia/remove_has_singular

[run CI]

3f74eea

hadia206 merged commit ba5294c into main Feb 6, 2026
12 checks passed

hadia206 deleted the Hadia/remove_has_singular branch February 6, 2026 23:16

hadia206 merged 8 commits into main from Hadia/remove_has_singular
