-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Labels
area: core🏗️ Core DataFrame functionality🏗️ Core DataFrame functionalitypriority: low🟢 Low priority / nice-to-have features🟢 Low priority / nice-to-have featurestype: performance⚡ Performance improvements and optimizations⚡ Performance improvements and optimizations
Description
Description
Integrate HAVING clause operations with the existing query optimization engine to enable predicate pushdown and other optimization strategies for improved performance.
Current State
HAVING operations are evaluated after GROUP BY aggregation is complete. The query optimizer doesn't currently analyze HAVING predicates for optimization opportunities.
Proposed Enhancement
1. Predicate Pushdown for HAVING
Identify predicates that can be pushed down to pre-aggregation filtering:
// Original query
df.GroupBy("department").
Having(expr.And(
expr.Sum(expr.Col("salary")).Gt(expr.Lit(100000)),
expr.Col("department").Ne(expr.Lit("TEMP")), // Can be pushed down
))
// Optimized to:
df.Filter(expr.Col("department").Ne(expr.Lit("TEMP"))). // Pushed down
GroupBy("department").
Having(expr.Sum(expr.Col("salary")).Gt(expr.Lit(100000)))2. Early Group Elimination
Skip aggregation for groups that can't possibly satisfy HAVING conditions:
// If we know min(salary) = 30000 across dataset
Having(expr.Sum(expr.Col("salary")).Gt(expr.Lit(1000000)))
// Skip groups with < 34 members (1000000/30000)3. Expression Optimization
Optimize HAVING expressions using existing expression optimization:
// Constant folding
Having(expr.Col("count").Gt(expr.Lit(5).Add(expr.Lit(10))))
// Optimized to:
Having(expr.Col("count").Gt(expr.Lit(15)))
// Common subexpression elimination
Having(expr.And(
expr.Sum(expr.Col("amount")).Gt(expr.Lit(1000)),
expr.Sum(expr.Col("amount")).Lt(expr.Lit(5000)),
))
// Cache Sum(amount) computation4. Join with HAVING Optimization
Optimize joins that include HAVING clauses:
// Push HAVING predicates to appropriate side of join
leftDf.Join(rightDf, ...).
GroupBy("category").
Having(expr.Count(expr.Col("left.id")).Gt(expr.Lit(10)))Implementation Tasks
- Extend QueryOptimizer to analyze HAVING predicates
- Implement predicate classification (pushable vs non-pushable)
- Add HAVING-aware optimization rules to the engine
- Integrate with existing predicate pushdown infrastructure
- Add cost-based optimization for HAVING operations
- Create benchmarks demonstrating optimization benefits
- Update execution plan visualization for HAVING optimizations
Performance Impact
Expected improvements:
- 20-50% reduction in processing time for queries with pushable predicates
- Reduced memory usage by filtering data earlier
- Better cache utilization with optimized expressions
Priority
Low - Performance optimization that builds on existing infrastructure.
Related
- Extends query optimization engine from [High] Implement Query Optimization Engine #3
- Part of HAVING clause milestone Milestone: Implement HAVING Clause Support for SQL Queries #105
- Complements performance optimization from [HAVING] Phase 4.3: Performance optimization and benchmarking #117
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
area: core🏗️ Core DataFrame functionality🏗️ Core DataFrame functionalitypriority: low🟢 Low priority / nice-to-have features🟢 Low priority / nice-to-have featurestype: performance⚡ Performance improvements and optimizations⚡ Performance improvements and optimizations