Skip to content

18654: Rename boolean Interval constants to match NullableInterval#22

Open
martin-augment wants to merge 1 commit intomainfrom
pr-18654-2025-11-13-13-37-13
Open

18654: Rename boolean Interval constants to match NullableInterval#22
martin-augment wants to merge 1 commit intomainfrom
pr-18654-2025-11-13-13-37-13

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

18654: To review by AI

CERTAINLY_FALSE -> FALSE
CERTAINLY_TRUE -> TRUE
UNCERTAIN -> TRUE_OR_FALSE
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 13, 2025

Walkthrough

This pull request refactors boolean interval arithmetic constants across the DataFusion codebase. In interval_arithmetic.rs, it renames enum variants CERTAINLY_FALSE to FALSE, CERTAINLY_TRUE to TRUE, and UNCERTAIN to TRUE_OR_FALSE. It introduces additional boolean interval constants TRUE_OR_UNKNOWN, FALSE_OR_UNKNOWN, and ANY_TRUTH_VALUE. Deprecated aliases are added for backward compatibility. Corresponding constants are added to NullableInterval. All usages of renamed variants are updated throughout the codebase in statistics, physical expressions, interval solvers, and join operations. Tests are updated to reflect the new constant names.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-18654-2025-11-13-13-37-13

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Nov 13, 2025

Pull Request Review: Rename boolean Interval constants to match NullableInterval

Summary

This PR successfully renames boolean interval constants to use clearer, more semantic names that align with the NullableInterval naming conventions. The changes improve code readability by replacing probabilistic language (CERTAINLY_TRUE, UNCERTAIN) with direct truth value descriptions (TRUE, TRUE_OR_FALSE).


Code Quality & Best Practices ✅

Strengths:

  1. Excellent backwards compatibility: The deprecated constants are properly aliased using const assignments, ensuring zero runtime overhead while maintaining API compatibility

    • CERTAINLY_FALSEFALSE (line 440 in interval_arithmetic.rs)
    • UNCERTAINTRUE_OR_FALSE (line 449)
    • CERTAINLY_TRUETRUE (line 458)
  2. Clear deprecation notices: All deprecated constants include helpful deprecation messages directing users to the new names

  3. Comprehensive refactoring: All usages throughout the codebase have been updated consistently across 11 files

  4. Improved documentation: New constants include clear doc comments explaining their semantics (e.g., "An interval containing only the 'false' truth value")

  5. Semantic alignment: The new NullableInterval::ANY_TRUTH_VALUE constant better expresses its meaning (all three truth values: true, false, unknown) compared to the previous UNCERTAIN

Code Style:

  • Follows Rust conventions for deprecation attributes
  • Consistent naming patterns across both Interval and NullableInterval types
  • Clean, readable test updates that maintain comprehensive coverage

Test Coverage ✅

Excellent test maintenance:

  1. All existing tests updated: Tests for and, or, not, comparison operations, and nullable intervals have been properly updated with new constant names

  2. Test clarity improved: The new names make test cases more readable:

    (Interval::TRUE, Interval::FALSE, Interval::FALSE)  // Much clearer than
    (Interval::CERTAINLY_TRUE, Interval::CERTAINLY_FALSE, Interval::CERTAINLY_FALSE)
  3. Coverage maintained: No test cases were removed; all existing test scenarios continue to validate the interval arithmetic logic

  4. Comprehensive test suite: Tests cover:

    • Boolean operations (AND, OR, NOT)
    • Comparison operations (GT, GTE, LT, LTE, EQ)
    • Nullable interval operations
    • Edge cases with null boolean bounds

Potential Issues & Considerations

Minor Observations:

  1. Deprecation version (line 439, 448, 457): Uses version "52.0.0"

    • ✅ Appears correct if this is for DataFusion 52.0.0 release
    • Consider: Will deprecated constants be removed in a future version? If so, documenting the removal timeline would help users plan migrations
  2. Net line reduction: -468 lines deleted vs +339 added = -129 net lines

    • ✅ This is excellent - the refactoring simplifies the codebase while maintaining functionality
  3. No breaking changes for users: Since old constants are deprecated but still work, this is a non-breaking change

    • ✅ Good API evolution practice

Performance Considerations ✅

  1. Zero runtime overhead: Using const aliases means no performance impact whatsoever
  2. Compile-time substitution: All constant references are resolved at compile time
  3. No new allocations: Pure refactoring with identical runtime behavior

Security Concerns ✅

No security concerns identified. This is a pure refactoring that:

  • Does not change any logic or algorithms
  • Does not introduce new attack vectors
  • Does not modify data validation or bounds checking behavior

Specific File Reviews

datafusion/expr-common/src/interval_arithmetic.rs (+253, -688)

Main changes: Constant definitions, usage updates, and test updates

  • ✅ Well-structured constant definitions with clear documentation
  • ✅ Comprehensive test updates maintain coverage
  • ✅ Proper use of deprecation attributes

datafusion/physical-expr/src/expressions/binary.rs (+22, -34)

Main changes: Updated interval constant usage

  • ✅ Consistent refactoring of all interval comparisons
  • ✅ No logic changes, only naming updates

Other files:

All other changes follow the same pattern - consistent, mechanical refactoring with no logic changes.


Recommendations

Optional Enhancements (not blocking):

  1. Consider adding a migration guide: While the deprecation warnings are helpful, a comment in the PR description or commit message about when deprecated constants might be removed would help downstream users

  2. Rustdoc example update: Consider adding a code example in the rustdoc comments showing the new constant usage to help users discover the preferred API

Example:

/// An interval containing only the 'true' truth value.
/// 
/// # Example
/// ```
/// use datafusion_expr::interval_arithmetic::Interval;
/// let true_interval = Interval::TRUE;
/// ```
pub const TRUE: Self = ...

Conclusion

Recommendation: APPROVE ✅

This is an excellent refactoring PR that:

  • ✅ Improves code clarity and maintainability
  • ✅ Maintains full backwards compatibility
  • ✅ Has comprehensive test coverage
  • ✅ Follows Rust best practices for API evolution
  • ✅ Has zero performance impact
  • ✅ Contains no security concerns
  • ✅ Reduces overall code complexity (-129 lines)

The changes are well-executed, thoroughly tested, and ready to merge. Great work on maintaining API compatibility while improving the codebase!

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Comment augment review to trigger a new review at any time.

.unwrap();

assert_eq!(null_interval, Interval::UNCERTAIN);
assert_eq!(null_interval, Interval::TRUE_OR_FALSE);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines below, the comment still says Interval::UNCERTAIN; consider updating it to Interval::TRUE_OR_FALSE to match the new naming (the code already uses TRUE_OR_FALSE).

🤖 Was this useful? React with 👍 or 👎

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback:The Augment AI reviewer is correct that one occurrence of Interval::UNCERTAIN was left unchanged in the docstrings. It does not cause any problems now but it would be confusing once the deprecated constants are removed in a later version

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
datafusion/physical-expr/src/intervals/cp_solver.rs (1)

647-654: OR with parent TRUE can surface as user-facing error — reachability confirmed, fallback recommended

This code path IS reachable. The update_ranges method is called with Interval::TRUE in production code:

  • symmetric_hash_join.rs:1243 in join filter evaluation
  • analysis.rs:217 in expression analysis

If a filter expression contains an OR operator, the solver's top-down propagation will hit this guard and emit not_impl_err!. The underlying BinaryExpr::propagate_constraints can handle TRUE+OR (verified in test), so a fallback like Ok(Some(vec![])) (treat as no refinement) is feasible until multi-set propagation is implemented.

Consider either:

  1. Switching to a non-fatal fallback to avoid blocking real queries
  2. Adding a test case that exercises this from update_ranges (not just direct propagate_constraints calls)
🧹 Nitpick comments (3)
datafusion/functions/src/math/monotonicity.rs (1)

34-38: Monotonicity domain checks consistently migrated

All in-domain validations now compare to Interval::TRUE, matching the new boolean-interval API. No semantic drift detected.

If there’s a helper like is_true(), consider it for brevity.

Also applies to: 75-80, 113-118, 210-215, 374-381, 501-506, 539-544, 577-582, 704-709

datafusion/physical-expr/src/expressions/binary.rs (1)

371-403: OR propagation aligns with rename; verify infeasibility/precision and add tests for new variants

Parent FALSE case properly forces both children to FALSE and returns None when infeasible. For parent TRUE, targeted refinements when the other side is TRUE_OR_FALSE are good. As with AND, consider handling TRUE_OR_UNKNOWN/FALSE_OR_UNKNOWN/ANY_TRUTH_VALUE for better precision and add tests mirroring these cases.

I can add unit tests that cover the new boolean variants once we confirm their presence/semantics.

datafusion/physical-expr/src/intervals/cp_solver.rs (1)

1641-1677: Tests updated to the new constants — add cases for extended variants

Good coverage of TRUE/FALSE/TRUE_OR_FALSE in OR/AND propagation. Consider adding tests for TRUE_OR_UNKNOWN, FALSE_OR_UNKNOWN, and ANY_TRUTH_VALUE if these are now part of Interval to keep parity with NullableInterval.

I can submit the additional test vectors once we confirm the constructors/constants.

Also applies to: 1689-1713

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47a826f and 116dd7c.

📒 Files selected for processing (11)
  • datafusion/expr-common/src/interval_arithmetic.rs (31 hunks)
  • datafusion/expr-common/src/statistics.rs (5 hunks)
  • datafusion/functions/src/math/abs.rs (1 hunks)
  • datafusion/functions/src/math/monotonicity.rs (9 hunks)
  • datafusion/physical-expr-common/src/physical_expr.rs (2 hunks)
  • datafusion/physical-expr/src/analysis.rs (1 hunks)
  • datafusion/physical-expr/src/expressions/binary.rs (2 hunks)
  • datafusion/physical-expr/src/expressions/not.rs (1 hunks)
  • datafusion/physical-expr/src/intervals/cp_solver.rs (10 hunks)
  • datafusion/physical-expr/src/statistics/stats_solver.rs (1 hunks)
  • datafusion/physical-plan/src/joins/symmetric_hash_join.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
datafusion/physical-plan/src/joins/symmetric_hash_join.rs (1)
datafusion/physical-expr/src/intervals/cp_solver.rs (1)
  • graph (647-650)
datafusion/physical-expr-common/src/physical_expr.rs (1)
datafusion/common/src/scalar/mod.rs (1)
  • new_one (1567-1642)
datafusion/physical-expr/src/intervals/cp_solver.rs (1)
datafusion/physical-expr/src/expressions/binary.rs (10)
  • left (105-107)
  • left (848-848)
  • left (852-852)
  • right (110-112)
  • right (849-849)
  • right (853-853)
  • children (304-306)
  • expr (132-132)
  • expr (503-503)
  • propagate_constraints (326-414)
datafusion/functions/src/math/abs.rs (1)
datafusion/expr-common/src/statistics.rs (6)
  • range (167-175)
  • range (328-330)
  • range (402-409)
  • range (442-444)
  • range (503-514)
  • range (571-573)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
  • GitHub Check: Analyze (rust)
🔇 Additional comments (11)
datafusion/physical-plan/src/joins/symmetric_hash_join.rs (1)

1243-1244: Rename aligns with pruning semantics

Using Interval::TRUE for graph.update_ranges is correct and consistent with the new boolean interval naming. No behavior change beyond the rename.

datafusion/functions/src/math/abs.rs (1)

178-184: Ordering check correctly updated

Comparing to Interval::TRUE preserves the original intent for abs monotonicity. Looks good.

datafusion/physical-expr/src/analysis.rs (1)

217-231: Propagation gate uses new truth constant

update_ranges(..., Interval::TRUE) is consistent with the refactor; success/infeasible/cannot-propagate handling unchanged.

datafusion/physical-expr/src/statistics/stats_solver.rs (1)

156-166: Subset check correctly migrated

Using Interval::TRUE to confirm full containment before adopting given_stats maintains prior behavior with the new enum.

datafusion/physical-expr/src/expressions/not.rs (1)

156-176: NOT propagation updated to TRUE/FALSE

Branching against Interval::TRUE and Interval::FALSE is correct with the new naming; behavior preserved.

datafusion/physical-expr-common/src/physical_expr.rs (1)

248-258: Boolean interval → Bernoulli mapping remains correct

TRUE maps to p=1, FALSE to p=0, otherwise null probability. Propagation path retains the “must be certain” assumption.

Also applies to: 310-320

datafusion/expr-common/src/statistics.rs (2)

162-165: Bernoulli semantics and tests updated coherently

Range and comparison handling now use FALSE/TRUE/TRUE_OR_FALSE; unit tests reflect this. All consistent.

Also applies to: 503-514, 721-729, 882-883, 995-996


1-20: Verification confirms proper constant migration—old names are deprecated, not lingering

The old constants (CERTAINLY_TRUE, CERTAINLY_FALSE, UNCERTAIN) are properly handled as deprecated aliases marked with #[deprecated(since = "52.0.0")] that redirect to the new names. References in comments are documentation only. The new constants (Interval::TRUE, Interval::FALSE, Interval::TRUE_OR_FALSE) are integrated throughout the codebase across multiple modules. No lingering issues found.

datafusion/physical-expr/src/intervals/cp_solver.rs (2)

521-533: Bounds containment checks updated to boolean-interval API; semantics preserved

given_range.contains(bounds)? == Interval::TRUE and bounds.contains(given_range)? != Interval::FALSE maintain the previous tri-state logic. No action needed.


348-383: All changes verified: constant rename applied correctly, no stale names in active code

Verification confirms no occurrences of old constant names (CERTAINLY_TRUE, CERTAINLY_FALSE, UNCERTAIN) in active code outside interval_arithmetic.rs—only as backward-compatibility aliases in definitions. The cp_solver.rs propagate_comparison function correctly uses the new names (Interval::TRUE, Interval::FALSE, Interval::TRUE_OR_FALSE) and handles all branches appropriately, with uncertain parents returning Ok(None) as expected. The logic preserves prior behavior.

datafusion/physical-expr/src/expressions/binary.rs (1)

336-368: Review comment is incorrect due to type mismatch; Interval and NullableInterval are separate types

The code in binary.rs:336–368 uses the Interval type, which defines only three boolean constants: TRUE, FALSE, and TRUE_OR_FALSE. The suggested constants TRUE_OR_UNKNOWN, FALSE_OR_UNKNOWN, and ANY_TRUTH_VALUE exist only on the NullableInterval type and are not available on Interval. Since the file does not import or use NullableInterval, the suggestion to add support for these constants is not applicable to this code.

Likely an incorrect or invalid review comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants