Skip to content

Configuration for Division By Zero Behavior#470

Open
hadia206 wants to merge 7 commits intomainfrom
Hadia/division_by_zero_config
Open

Configuration for Division By Zero Behavior#470
hadia206 wants to merge 7 commits intomainfrom
Hadia/division_by_zero_config

Conversation

@hadia206
Copy link
Contributor

@hadia206 hadia206 commented Jan 9, 2026

  • Adds configurable division-by-zero behavior via PyDoughConfigs.division_by_zero.
  • 3 options:
    • DATABASE: Let database handle it (default)
    • NULL: a / b will be a / KEEP_IF(b, b != 0)
    • ZERO: a / b will be IFF(b == 0, 0, a / b)
  • Conversions happen before simplification so the simplifier can optimize out guards when possible.

Closes #213

@hadia206 hadia206 marked this pull request as ready for review January 23, 2026 18:19
@hadia206 hadia206 requested review from a team, john-sanchez31, juankx-bodo and knassre-bodo and removed request for a team January 23, 2026 18:19
Copy link
Contributor

@john-sanchez31 john-sanchez31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Great job Hadia. Couple of comments below related to type hints and the final sql and more suggested tests

],
)
# b != 0 (used to mark divisor as processed)
is_not_zero_expr = CallExpression(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check type hints

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already in the one under case DivisionByZeroBehavior.NULL:. PyPi doesn't allowed repeated type hints in this case scenario.

@@ -0,0 +1,6 @@
SELECT
CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / NULLIF(l_discount, 0) END AS computed_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary the NULLIF(l_discount, 0) part, since we already know at that point that l_discount != 0. Same for other dialects

Copy link
Contributor Author

@hadia206 hadia206 Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know that it's not 0
That requires knowing the actual data stored in the column.

This l_discount actually has 0 which is why I chose it for the test and the result of the tests shows output to indicate that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get it. If l_discount is zero then it should do the THEN 0 part right?
If it is still zero then l_extendedprice / NULLIF(l_discount, 0) becomes l_extendedprice / NULL that becomes NULL instead of zero. Am I wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, If l_discount = 0 it'll go into the THEN 0 so it won't enter into the ELSE so the NULLIF(l_discount, 0) won't enter since it's in the ELSE.

pytest.param(DivisionByZeroBehavior.ZERO, id="zero"),
],
)
def test_division_by_zero_e2e(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need type hints for this tests? Not sure, just in case a reminder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Most are clear but added to some anyway.

def simple_division_by_zero():
return lines.CALCULATE(computed_value=extended_price / discount).TOP_K(
1, by=computed_value.ASC(na_pos="first")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add more tests? For example

  • With multiple operations like a*(c / b) + z
  • Operations inside the division like (a+b)/ (z*2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And things with KEEP_IF/IFF already in the denominator:

  • x / KEEP_IF(y, y < 10)
  • x / KEEP_IF(y, y > 10)
  • x / IFF(y > 0, y, 1)
  • x / IFF(y > 0, 1, y)

"""

DATABASE = "DATABASE"
"""Leave it alone and let the database resolve it."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more specific than "leave it alone"

return 5


class DivisionByZeroBehavior(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's set the values in default_config and defog_config in conftest.py

Comment on lines 849 to 850
if self.strip_zero_guards(divisor) is not divisor:
return expr, arg_predicates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems problematic in other situations where there is a KEEP_IF / IFF in the denominator for unrelated reasons. E.g. x / KEEP_IF(y, y < 10)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll update the code to check only for our initial replacement (zero check)

arg_predicates: list[PredicateSet] = [
self.stack.pop() for _ in range(len(new_call.inputs))
]
new_call, arg_predicates = self.update_div_expr(new_call, arg_predicates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the insertion of the logic for the KEEP_IF / IFF should be in the relational conversion, rather than the simplifier, since the simplifier can be run more than once. Plus, the simplifier can delete the KEEP_IF / IFF calls if they are redundant (e.g. we know the data is going to be non-zero)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be in translate_expression in relational_converter.py (would need to pass in the session to the constructor for the translator so you could access self.session inside translate_expression).

We could move all of this logic to that file as a helper function called at the right time (minus the part that I highlighted as a potential problem). Also, I'd recommend generalizing it slightly so that we can trivially extend this rewrite to apply to MOD as well as DIV, once we implement MOD (e.g. make the rewrite steps use expr.op instead of pydop.DIV)

Comment on lines 266 to 271
case "snowflake":
sf_conn = request.getfixturevalue("sf_conn_db_context")
return (
sf_conn("SNOWFLAKE_SAMPLE_DATA", "TPCH_SF1"),
get_sf_sample_graph("TPCH"),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is extremely clever!!!


@pytest.fixture(
params=[
pytest.param("sqlite", id="sqlite"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do all of them have execute except this one? Realistically I don't think any of them need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. updated it

request,
get_sample_graph: graph_fetcher,
get_sf_sample_graph: graph_fetcher,
) -> tuple[DatabaseContext, GraphMetadata]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to make this a more general fixture in conftest.py that doesn't have to do with division by zero, it is just about getting the TPCH graph/db for each database. With that, we could gradually merge some of our tests for the different dialects that are currently split up across multiple files.

def simple_division_by_zero():
return lines.CALCULATE(computed_value=extended_price / discount).TOP_K(
1, by=computed_value.ASC(na_pos="first")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And things with KEEP_IF/IFF already in the denominator:

  • x / KEEP_IF(y, y < 10)
  • x / KEEP_IF(y, y > 10)
  • x / IFF(y > 0, y, 1)
  • x / IFF(y > 0, 1, y)

Copy link
Contributor

@john-sanchez31 john-sanchez31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -0,0 +1,6 @@
SELECT
CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / NULLIF(l_discount, 0) END AS computed_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get it. If l_discount is zero then it should do the THEN 0 part right?
If it is still zero then l_extendedprice / NULLIF(l_discount, 0) becomes l_extendedprice / NULL that becomes NULL instead of zero. Am I wrong?


# DATABASE mode: Snowflake/Postgres throw errors, SQLite/MySQL return NULL
if division_by_zero_config == DivisionByZeroBehavior.DATABASE:
if db_context.dialect in (DatabaseDialect.SNOWFLAKE, DatabaseDialect.POSTGRES):
Copy link
Contributor

@juankx-bodo juankx-bodo Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think snowflake default behavior to division by zero is return NULL, not error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, default behavior for Snowflake is error. Please see attached screenshot example. Plus, the test won't pass if it's not giving an error :-)

Screenshot 2026-02-03 at 10 09 11 AM

Copy link
Contributor

@knassre-bodo knassre-bodo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lingering matters to address, but close to being ready to merge :)


case DivisionByZeroBehavior.ZERO:
# If b == 0, return 0
# Replace with IFF(b == 0, 0, a / guarded_b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Replace with IFF(b == 0, 0, a / guarded_b)
# Replace with IFF(b == 0, 0, a / b)

is_rhs_zero = isinstance(rhs, LiteralExpression) and rhs.value == 0
return (lhs == value and is_rhs_zero) or (rhs == value and is_lhs_zero)

def _strip_zero_guards(self, expr: RelationalExpression) -> RelationalExpression:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just have this return True/False for if the denominator is already zero-guarded?

# Only strip if condition is 'then_value != 0'
condition = expr.inputs[0]
then_value = expr.inputs[1]
if self._is_zero_check_condition(condition, then_value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only be removed when doing IFF(x != 0, x, 0).

E.g. what if it is IFF(x != 0, x, 3)?

config.avg_default_zero = False
config.start_of_week = DayOfWeek.MONDAY
config.start_week_as_zero = True
config.division_by_zero = DivisionByZeroBehavior.DATABASE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for defog, we may actually want it to be NULL based on how a lot of the queries are written. Let's see if this is correct or if it breaks everything. if it is correct, lets go back to a lot of our existing PyDough code for defog queries and remove a lot of the existing division guards/checks (e.g. impl_defog_restaurants_gen13)

Comment on lines +1 to +2
SELECT
CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / NULLIF(l_discount, 0) END AS computed_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a NULLIF here as well? It should just be CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / l_discount. This indicates that the denominator is being affected twice?

# IFF in denominator - divisor in false branch
pytest.param(
(
DivisionByZeroBehavior.ZERO,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also try these weird variants with the NULL behavior.

@@ -0,0 +1,6 @@
SELECT
l_quantity * CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / NULLIF(l_discount, 0) END + l_tax AS computed_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not appear to be how the conversion step is supposed to work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make division-by-zero behavior configurable

4 participants