Conversation
john-sanchez31
left a comment
There was a problem hiding this comment.
Looks good. Great job Hadia. Couple of comments below related to type hints and the final sql and more suggested tests
| ], | ||
| ) | ||
| # b != 0 (used to mark divisor as processed) | ||
| is_not_zero_expr = CallExpression( |
There was a problem hiding this comment.
It's already in the one under case DivisionByZeroBehavior.NULL:. PyPi doesn't allowed repeated type hints in this case scenario.
| @@ -0,0 +1,6 @@ | |||
| SELECT | |||
| CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / NULLIF(l_discount, 0) END AS computed_value | |||
There was a problem hiding this comment.
Is it necessary the NULLIF(l_discount, 0) part, since we already know at that point that l_discount != 0. Same for other dialects
There was a problem hiding this comment.
We don't know that it's not 0
That requires knowing the actual data stored in the column.
This l_discount actually has 0 which is why I chose it for the test and the result of the tests shows output to indicate that
There was a problem hiding this comment.
I didn't get it. If l_discount is zero then it should do the THEN 0 part right?
If it is still zero then l_extendedprice / NULLIF(l_discount, 0) becomes l_extendedprice / NULL that becomes NULL instead of zero. Am I wrong?
There was a problem hiding this comment.
Yes, If l_discount = 0 it'll go into the THEN 0 so it won't enter into the ELSE so the NULLIF(l_discount, 0) won't enter since it's in the ELSE.
| pytest.param(DivisionByZeroBehavior.ZERO, id="zero"), | ||
| ], | ||
| ) | ||
| def test_division_by_zero_e2e( |
There was a problem hiding this comment.
Do we need type hints for this tests? Not sure, just in case a reminder
There was a problem hiding this comment.
Thanks. Most are clear but added to some anyway.
| def simple_division_by_zero(): | ||
| return lines.CALCULATE(computed_value=extended_price / discount).TOP_K( | ||
| 1, by=computed_value.ASC(na_pos="first") | ||
| ) |
There was a problem hiding this comment.
Can we add more tests? For example
- With multiple operations like
a*(c / b) + z - Operations inside the division like
(a+b)/ (z*2)
There was a problem hiding this comment.
And things with KEEP_IF/IFF already in the denominator:
x / KEEP_IF(y, y < 10)x / KEEP_IF(y, y > 10)x / IFF(y > 0, y, 1)x / IFF(y > 0, 1, y)
pydough/configs/pydough_configs.py
Outdated
| """ | ||
|
|
||
| DATABASE = "DATABASE" | ||
| """Leave it alone and let the database resolve it.""" |
There was a problem hiding this comment.
Let's be more specific than "leave it alone"
| return 5 | ||
|
|
||
|
|
||
| class DivisionByZeroBehavior(Enum): |
There was a problem hiding this comment.
Let's set the values in default_config and defog_config in conftest.py
| if self.strip_zero_guards(divisor) is not divisor: | ||
| return expr, arg_predicates |
There was a problem hiding this comment.
This seems problematic in other situations where there is a KEEP_IF / IFF in the denominator for unrelated reasons. E.g. x / KEEP_IF(y, y < 10)
There was a problem hiding this comment.
Good point. I'll update the code to check only for our initial replacement (zero check)
| arg_predicates: list[PredicateSet] = [ | ||
| self.stack.pop() for _ in range(len(new_call.inputs)) | ||
| ] | ||
| new_call, arg_predicates = self.update_div_expr(new_call, arg_predicates) |
There was a problem hiding this comment.
I think the insertion of the logic for the KEEP_IF / IFF should be in the relational conversion, rather than the simplifier, since the simplifier can be run more than once. Plus, the simplifier can delete the KEEP_IF / IFF calls if they are redundant (e.g. we know the data is going to be non-zero)
There was a problem hiding this comment.
That would be in translate_expression in relational_converter.py (would need to pass in the session to the constructor for the translator so you could access self.session inside translate_expression).
We could move all of this logic to that file as a helper function called at the right time (minus the part that I highlighted as a potential problem). Also, I'd recommend generalizing it slightly so that we can trivially extend this rewrite to apply to MOD as well as DIV, once we implement MOD (e.g. make the rewrite steps use expr.op instead of pydop.DIV)
tests/test_session.py
Outdated
| case "snowflake": | ||
| sf_conn = request.getfixturevalue("sf_conn_db_context") | ||
| return ( | ||
| sf_conn("SNOWFLAKE_SAMPLE_DATA", "TPCH_SF1"), | ||
| get_sf_sample_graph("TPCH"), | ||
| ) |
There was a problem hiding this comment.
Ok, this is extremely clever!!!
tests/test_session.py
Outdated
|
|
||
| @pytest.fixture( | ||
| params=[ | ||
| pytest.param("sqlite", id="sqlite"), |
There was a problem hiding this comment.
Why do all of them have execute except this one? Realistically I don't think any of them need it?
There was a problem hiding this comment.
Good catch. updated it
tests/test_session.py
Outdated
| request, | ||
| get_sample_graph: graph_fetcher, | ||
| get_sf_sample_graph: graph_fetcher, | ||
| ) -> tuple[DatabaseContext, GraphMetadata]: |
There was a problem hiding this comment.
I think we may want to make this a more general fixture in conftest.py that doesn't have to do with division by zero, it is just about getting the TPCH graph/db for each database. With that, we could gradually merge some of our tests for the different dialects that are currently split up across multiple files.
| def simple_division_by_zero(): | ||
| return lines.CALCULATE(computed_value=extended_price / discount).TOP_K( | ||
| 1, by=computed_value.ASC(na_pos="first") | ||
| ) |
There was a problem hiding this comment.
And things with KEEP_IF/IFF already in the denominator:
x / KEEP_IF(y, y < 10)x / KEEP_IF(y, y > 10)x / IFF(y > 0, y, 1)x / IFF(y > 0, 1, y)
| @@ -0,0 +1,6 @@ | |||
| SELECT | |||
| CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / NULLIF(l_discount, 0) END AS computed_value | |||
There was a problem hiding this comment.
I didn't get it. If l_discount is zero then it should do the THEN 0 part right?
If it is still zero then l_extendedprice / NULLIF(l_discount, 0) becomes l_extendedprice / NULL that becomes NULL instead of zero. Am I wrong?
|
|
||
| # DATABASE mode: Snowflake/Postgres throw errors, SQLite/MySQL return NULL | ||
| if division_by_zero_config == DivisionByZeroBehavior.DATABASE: | ||
| if db_context.dialect in (DatabaseDialect.SNOWFLAKE, DatabaseDialect.POSTGRES): |
There was a problem hiding this comment.
I think snowflake default behavior to division by zero is return NULL, not error.
knassre-bodo
left a comment
There was a problem hiding this comment.
A few lingering matters to address, but close to being ready to merge :)
|
|
||
| case DivisionByZeroBehavior.ZERO: | ||
| # If b == 0, return 0 | ||
| # Replace with IFF(b == 0, 0, a / guarded_b) |
There was a problem hiding this comment.
| # Replace with IFF(b == 0, 0, a / guarded_b) | |
| # Replace with IFF(b == 0, 0, a / b) |
| is_rhs_zero = isinstance(rhs, LiteralExpression) and rhs.value == 0 | ||
| return (lhs == value and is_rhs_zero) or (rhs == value and is_lhs_zero) | ||
|
|
||
| def _strip_zero_guards(self, expr: RelationalExpression) -> RelationalExpression: |
There was a problem hiding this comment.
Why not just have this return True/False for if the denominator is already zero-guarded?
| # Only strip if condition is 'then_value != 0' | ||
| condition = expr.inputs[0] | ||
| then_value = expr.inputs[1] | ||
| if self._is_zero_check_condition(condition, then_value): |
There was a problem hiding this comment.
This should only be removed when doing IFF(x != 0, x, 0).
E.g. what if it is IFF(x != 0, x, 3)?
| config.avg_default_zero = False | ||
| config.start_of_week = DayOfWeek.MONDAY | ||
| config.start_week_as_zero = True | ||
| config.division_by_zero = DivisionByZeroBehavior.DATABASE |
There was a problem hiding this comment.
I think for defog, we may actually want it to be NULL based on how a lot of the queries are written. Let's see if this is correct or if it breaks everything. if it is correct, lets go back to a lot of our existing PyDough code for defog queries and remove a lot of the existing division guards/checks (e.g. impl_defog_restaurants_gen13)
| SELECT | ||
| CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / NULLIF(l_discount, 0) END AS computed_value |
There was a problem hiding this comment.
Why is there a NULLIF here as well? It should just be CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / l_discount. This indicates that the denominator is being affected twice?
| # IFF in denominator - divisor in false branch | ||
| pytest.param( | ||
| ( | ||
| DivisionByZeroBehavior.ZERO, |
There was a problem hiding this comment.
Also try these weird variants with the NULL behavior.
| @@ -0,0 +1,6 @@ | |||
| SELECT | |||
| l_quantity * CASE WHEN l_discount = 0 THEN 0 ELSE l_extendedprice / NULLIF(l_discount, 0) END + l_tax AS computed_value | |||
There was a problem hiding this comment.
This does not appear to be how the conversion step is supposed to work?

PyDoughConfigs.division_by_zero.DATABASE: Let database handle it (default)NULL:a / bwill bea / KEEP_IF(b, b != 0)ZERO:a / bwill beIFF(b == 0, 0, a / b)Closes #213