feat(sql): add CTE (WITH clause) support to SQL executor #353

Open
eddiethedean wants to merge 3 commits into main from feature/add-cte-support

@eddiethedean
Owner

Summary

Add support for Common Table Expressions (CTEs) in SQL queries, enabling queries like:

WITH filtered_data AS (
    SELECT id, value FROM source_table WHERE value > 75
)
SELECT * FROM filtered_data

Changes

Parser (sparkless/session/sql/parser.py)

  • Added WITH detection in _detect_query_type() (before UNION check, as CTE queries can contain UNION)
  • Added routing for WITH in _parse_components()
  • Implemented _parse_with_query() method that parses CTE definitions using balanced parenthesis counting
  • Handles single CTE, multiple CTEs, and extracts main query
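The parsing steps above can be sketched roughly as follows. This is a minimal illustration of balanced parenthesis counting, not the PR's actual code: the function names `detect_query_type` and `parse_with_query` and their return shapes are hypothetical, and the sketch assumes no parentheses appear inside string literals or comments.

```python
import re
from typing import List, Tuple


def detect_query_type(sql: str) -> str:
    """Classify a query. WITH is checked before UNION because a CTE body may contain UNION."""
    text = sql.strip().upper()
    if re.match(r"WITH\b", text):
        return "WITH"
    if re.search(r"\bUNION\b", text):
        return "UNION"
    return "SELECT"


def parse_with_query(sql: str) -> Tuple[List[Tuple[str, str]], str]:
    """Split 'WITH a AS (...), b AS (...) <main>' into [(name, body), ...] and the main query."""
    match = re.match(r"\s*WITH\s+", sql, re.IGNORECASE)
    if not match:
        raise ValueError("query does not start with WITH")
    pos = match.end()
    ctes: List[Tuple[str, str]] = []
    while True:
        header = re.match(r"\s*(\w+)\s+AS\s*\(", sql[pos:], re.IGNORECASE)
        if not header:
            raise ValueError("expected '<name> AS (' at offset %d" % pos)
        name = header.group(1)
        start = pos + header.end()  # first character after the opening parenthesis
        depth = 1
        i = start
        # Walk forward until the parenthesis opened by 'AS (' is closed again.
        while i < len(sql) and depth > 0:
            if sql[i] == "(":
                depth += 1
            elif sql[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            raise ValueError("unbalanced parentheses in CTE %r" % name)
        ctes.append((name, sql[start:i - 1].strip()))
        pos = i
        # A comma means another CTE definition follows; anything else starts the main query.
        comma = re.match(r"\s*,", sql[pos:])
        if not comma:
            break
        pos += comma.end()
    main_query = sql[pos:].strip()
    if not main_query:
        raise ValueError("missing main query after CTE definitions")
    return ctes, main_query
```

Counting depth rather than searching for the first `)` is what lets a CTE body contain nested expressions such as `WHERE (x > 1)` without ending the definition early.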

Executor (sparkless/session/sql/executor.py)

  • Added WITH case in execute() method
  • Implemented _execute_with() method that:
    • Executes each CTE query and stores results in temp views
    • Replaces CTE name references in main query with temp view names
    • Executes the main query (which can reference CTEs via temp views)
    • Cleans up temp views after execution
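The executor flow above can be illustrated with a small stand-alone sketch. To keep it runnable without sparkless, it uses Python's built-in `sqlite3` as a stand-in engine, so it shows the technique rather than the PR's implementation. The function name `execute_with`, the `_cte_<name>_<suffix>` view naming scheme, and the choice to rewrite references inside later CTE bodies (to support chained CTEs) are all assumptions.

```python
import re
import sqlite3
import uuid
from typing import Dict, List, Tuple


def execute_with(conn: sqlite3.Connection,
                 ctes: List[Tuple[str, str]],
                 main_query: str) -> list:
    """Run each CTE as a temp view, rewrite references, run the main query, then clean up."""
    view_names: Dict[str, str] = {}  # CTE name -> temp view name

    def rewrite(query: str) -> str:
        # Whole-word, case-insensitive replacement of CTE names with view names.
        # Limitation of this sketch: it would also rewrite matching words inside
        # string literals or column names.
        for cte_name, view in view_names.items():
            pattern = r"\b%s\b" % re.escape(cte_name)
            query = re.sub(pattern, view, query, flags=re.IGNORECASE)
        return query

    try:
        for name, body in ctes:
            # A random suffix avoids clashing with existing tables or views.
            view = "_cte_%s_%s" % (name, uuid.uuid4().hex[:8])
            # Earlier CTEs are already registered, so chained references resolve.
            conn.execute("CREATE TEMP VIEW %s AS %s" % (view, rewrite(body)))
            view_names[name] = view
        return conn.execute(rewrite(main_query)).fetchall()
    finally:
        # Cleanup runs even if a CTE or the main query fails.
        for view in view_names.values():
            conn.execute("DROP VIEW IF EXISTS %s" % view)
```

Putting the cleanup in a `finally` block is one way to ensure temp views do not leak when a query raises partway through.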

Features Supported

  • ✅ Single CTE: WITH cte AS (...) SELECT * FROM cte
  • ✅ Multiple CTEs: WITH cte1 AS (...), cte2 AS (...) SELECT * FROM cte2
  • ✅ Chained CTEs (CTE referencing another CTE)
  • ✅ CTEs with filtering, aggregation, and column selection
  • ✅ Case-insensitive WITH keyword
  • ✅ Proper cleanup of temp views after execution
  • ✅ Works with ORDER BY, LIMIT, and other SQL clauses in main query

Test Plan

  • 11 unit tests covering the CTE scenarios listed below
  • Tests for: simple CTE, multiple CTEs, CTE chains, filtering, aggregation, case insensitivity, whitespace handling, parser detection, ORDER BY, LIMIT
  • All tests pass

Implementation Details

The implementation uses balanced parenthesis counting to correctly parse CTE definitions, even when CTEs contain nested parentheses. Each CTE is executed independently and stored as a temporary view, which is then referenced in the main query. Temp views are automatically cleaned up after query execution.

Related

Fixes #347

This is an independent implementation, not based on PR #320.

Odos Matthews added 3 commits January 23, 2026 20:05
- Add WITH query type detection in parser (before UNION check)
- Implement _parse_with_query() method with balanced parenthesis counting
- Add _execute_with() method to execute CTEs and main query
- Execute each CTE and store results in temp views
- Replace CTE references in main query with temp view names
- Clean up temp views after execution
- Add comprehensive test suite with 11 test cases covering:
  - Simple CTE
  - Multiple CTEs
  - Chained CTEs (CTE referencing another CTE)
  - CTEs with aggregation, filtering, column selection
  - Case-insensitive WITH keyword
  - Whitespace handling
  - ORDER BY and LIMIT in main query
  - Parser detection verification

Fixes #347
… compatibility

- Replace direct SparkSession creation with spark fixture parameter
- Remove manual spark.stop() calls (fixture handles cleanup)
- Skip parser detection test in PySpark mode (parser not exposed)
- All 10 functional tests now pass in both sparkless and PySpark modes
- Edge cases: empty results, all null values, single row, large datasets
- Complex operations: DISTINCT, GROUP BY, HAVING, JOINs (inner/left/right/full outer)
- Advanced features: UNION, UNION ALL, window functions, subqueries, CASE WHEN
- Chained CTEs: 3-level and 4-level chains, multiple CTE references
- SQL integration: CREATE TABLE AS SELECT, INSERT INTO ... SELECT
- String/date/math functions: UPPER, LENGTH, YEAR, MONTH, SQRT, POWER, etc.
- Error handling: non-existent tables, missing main query, circular references
- Complex expressions: nested parentheses, arithmetic precedence, boolean logic
- WHERE clauses: IN, LIKE, BETWEEN, IS NULL, IS NOT NULL, comparison operators
- Array operations, null handling, coalesce, cast operations
- All 60 functional tests pass in PySpark mode
- 3 tests correctly skipped (PySpark-specific limitations: Hive support, parser access)