feat: add Delta Lake Time Travel support to SQL parser #321

Open
DidelotK wants to merge 5 commits into eddiethedean:main from DidelotK:feat/add-time-travel-support

@DidelotK
Contributor

Summary

Add support for Delta Lake Time Travel syntax in SQL queries, enabling queries that reference historical versions of tables. In mock mode, time travel queries return current data (no real versioning), allowing code that uses time travel to run without errors during testing.

Supported Syntax

-- Standard Delta Lake syntax
SELECT * FROM my_table VERSION AS OF 5
SELECT * FROM my_table TIMESTAMP AS OF '2024-01-01 00:00:00'

-- Databricks shorthand syntax
SELECT * FROM my_table@v5
SELECT * FROM my_table@20240101
SELECT * FROM my_table@20240615123000
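
For reference, the two compact shorthand forms above (8 digits and 14 digits) can be turned into a `datetime` if a caller ever needs one. This is only a sketch: `parse_compact_timestamp` is a hypothetical helper that is not part of this PR, and it assumes only the `yyyyMMdd` and `yyyyMMddHHmmss` layouts shown in the examples.

```python
from datetime import datetime


def parse_compact_timestamp(suffix: str) -> datetime:
    """Parse the digits that follow '@' in the shorthand syntax.

    Hypothetical helper for illustration; the PR itself stores the raw string.
    """
    # Assumed layouts, keyed by length: yyyyMMdd and yyyyMMddHHmmss.
    formats = {8: "%Y%m%d", 14: "%Y%m%d%H%M%S"}
    fmt = formats.get(len(suffix))
    if fmt is None or not suffix.isdigit():
        raise ValueError(f"unsupported compact timestamp: {suffix!r}")
    return datetime.strptime(suffix, fmt)


print(parse_compact_timestamp("20240101"))
print(parse_compact_timestamp("20240615123000"))
```

Any other length raises `ValueError` rather than guessing at a layout.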

Changes

  • Parser (parser.py):
    • Added _extract_time_travel() helper method to parse time travel syntax
    • Added pre-processing in _parse_select_query() to extract time travel info before main parsing
    • Time travel info stored in components["time_travel"] with:
      • type: "version", "timestamp", or "timestamp_compact"
      • value: The version number or timestamp string
      • table: The base table name
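
As a rough picture of the metadata shape described above, the entry could be typed as follows. `TimeTravelInfo` is a hypothetical name used only for this illustration, and whether `value` is stored as an `int` or a `str` is an assumption (hence the `Union`); the PR only states that it holds the version number or timestamp string.

```python
from typing import Literal, TypedDict, Union


class TimeTravelInfo(TypedDict):
    """Hypothetical type for components["time_travel"] (not defined in the PR)."""

    type: Literal["version", "timestamp", "timestamp_compact"]
    value: Union[int, str]  # assumption: exact stored type is not stated in the PR
    table: str


# What the entry might look like for: SELECT * FROM my_table VERSION AS OF 5
info: TimeTravelInfo = {"type": "version", "value": 5, "table": "my_table"}
print(info["table"], info["type"], info["value"])
```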

Implementation Details

The implementation works by:

  1. Pre-processing the query to detect and extract time travel syntax
  2. Stripping the time travel clause from the query (keeping just the table name)
  3. Storing time travel metadata in components for reference
  4. Executing the query normally (mock behavior: returns current data)
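
The extract-and-strip steps above can be sketched with a few regular expressions. This is a minimal illustration under stated assumptions, not the code in `parser.py`: the function name `extract_time_travel`, the patterns, the identifier rule, and keeping `value` as a string are all assumptions made for the example.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Assumed identifier rule: letters, digits, underscores, and dots (schema.table).
_IDENT = r"[A-Za-z_][\w.]*"

# (type, pattern) pairs; group 1 is the table name, group 2 is the value.
_PATTERNS = [
    ("version", re.compile(rf"\b({_IDENT})\s+VERSION\s+AS\s+OF\s+(\d+)", re.IGNORECASE)),
    ("timestamp", re.compile(rf"\b({_IDENT})\s+TIMESTAMP\s+AS\s+OF\s+'([^']*)'", re.IGNORECASE)),
    ("version", re.compile(rf"\b({_IDENT})@v(\d+)\b", re.IGNORECASE)),
    ("timestamp_compact", re.compile(rf"\b({_IDENT})@(\d{{8,}})\b")),
]


def extract_time_travel(query: str) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (query with the time travel clause stripped, metadata or None)."""
    for kind, pattern in _PATTERNS:
        match = pattern.search(query)
        if match:
            table, value = match.group(1), match.group(2)
            # Keep only the table name in place of the matched clause.
            stripped = query[: match.start()] + table + query[match.end():]
            return stripped, {"type": kind, "value": value, "table": table}
    return query, None


for sql in (
    "SELECT id FROM my_table VERSION AS OF 5 WHERE id > 1",
    "select * from my_table timestamp as of '2024-01-01 00:00:00'",
    "SELECT * FROM my_table@v5",
    "SELECT * FROM my_table@20240101",
):
    print(extract_time_travel(sql))
```

A regex pass like this does not understand string literals or comments, so a real parser would need to guard against matches inside quoted text.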

This approach allows:

  • Code using time travel to run without errors in test environments
  • Access to time travel metadata if needed for logging/debugging
  • Zero changes to the executor (mock returns current data)

Test Plan

  • 16 new unit tests covering all time travel scenarios
  • All 78 session tests pass (including new tests)
  • Tested: VERSION AS OF, TIMESTAMP AS OF, @v shorthand, @yyyyMMdd shorthand
  • Tested: Combination with WHERE, GROUP BY, ORDER BY, column selection
  • Tested: Case insensitivity

Known Limitations

  • Mock mode returns current data (no actual historical data)
  • This is expected behavior for a mock framework

🤖 Generated with Claude Code

Add support for Time Travel syntax in SQL queries, enabling queries that
reference historical versions of tables:
  - VERSION AS OF <number>
  - TIMESTAMP AS OF '<timestamp>'
  - table@v<version> (Databricks shorthand)
  - table@<yyyyMMdd> (Databricks compact timestamp shorthand)

Implementation:
- Parser pre-processes queries to extract time travel syntax
- Time travel info stored in components["time_travel"]
- Query is processed with time travel stripped (mock returns current data)
- Supports case-insensitive keywords

In mock mode, time travel queries return current data (no real versioning).
This allows code that uses time travel to run without errors while testing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@DidelotK DidelotK marked this pull request as draft January 22, 2026 21:53
DidelotK and others added 4 commits January 22, 2026 22:55
- Remove unused imports (cast, pytest, DataFrame) in test_sql_time_travel.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace `dict[str, Any] | None` with `Optional[Dict[str, Any]]`
- Add missing typing imports (Optional, Tuple)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@DidelotK DidelotK marked this pull request as ready for review January 22, 2026 22:17