@murali-db murali-db commented Nov 2, 2025

Summary

This PR implements a Spark DataSource V2 (DSv2) Table that reads data files using server-side scan planning. It is the core read path that DeltaCatalog will use when credentials are unavailable.

Stack:

Changes

DSv2 Table Implementation (spark/.../catalog/ServerSidePlannedTable.scala)

Class Hierarchy:
ServerSidePlannedTable (Table, SupportsRead)
└─> ServerSidePlannedScanBuilder (ScanBuilder)
    └─> ServerSidePlannedScan (Scan, Batch)
        ├─> ServerSidePlannedFileInputPartition (InputPartition)
        └─> ServerSidePlannedFilePartitionReaderFactory (PartitionReaderFactory)
            └─> ServerSidePlannedFilePartitionReader (PartitionReader[InternalRow])

Key Classes:

  • ServerSidePlannedTable: Main DSv2 Table with BATCH_READ capability
  • ServerSidePlannedScan: Calls client.planScan() to get file list from server
  • ServerSidePlannedFilePartitionReaderFactory: Pre-builds file format readers on driver
  • ServerSidePlannedFilePartitionReader: Executes pre-built readers on executors
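
To make the flow between these classes concrete, here is a minimal, dependency-free sketch of the read path. It is a simplified model, not the code in this PR: the traits and classes below are hypothetical stand-ins for Spark's DSv2 interfaces, rows are modelled as plain strings, and the shapes of `ScanPlan` and `planScan(database, table)` are assumptions based only on the names mentioned in this conversation.

```scala
// Simplified model of the read path. These types are illustrative stand-ins,
// NOT Spark's org.apache.spark.sql.connector interfaces.
object ServerSidePlanningSketch {
  // Assumed shape of the planning result: only a list of file paths.
  final case class ScanPlan(files: Seq[String])

  // Assumed client interface; the PR names planScan() but not its full signature.
  trait PlanningClient {
    def planScan(database: String, table: String): ScanPlan
  }

  // One partition per planned file.
  final case class FilePartition(path: String)

  // Built once on the "driver"; carries the function that reads a single file.
  final class ReaderFactory(readFile: String => Iterator[String]) {
    def createReader(partition: FilePartition): Iterator[String] =
      readFile(partition.path)
  }

  final class PlannedScan(
      client: PlanningClient,
      database: String,
      table: String,
      readFile: String => Iterator[String]) {
    // Asks the planning service for the file list instead of listing files locally.
    def planInputPartitions(): Seq[FilePartition] =
      client.planScan(database, table).files.map(FilePartition.apply)

    def createReaderFactory(): ReaderFactory = new ReaderFactory(readFile)
  }

  final class PlannedTable(
      client: PlanningClient,
      database: String,
      table: String,
      readFile: String => Iterator[String]) {
    def newScan(): PlannedScan = new PlannedScan(client, database, table, readFile)
  }

  // Drives the chain the way a query engine would: plan, then read each partition.
  def readAll(table: PlannedTable): Seq[String] = {
    val scan = table.newScan()
    val factory = scan.createReaderFactory()
    scan.planInputPartitions().flatMap(p => factory.createReader(p))
  }
}
```

In the real implementation the per-file read function would be a pre-built file format reader and the rows would be `InternalRow` values; the sketch only shows where the call to `planScan()` sits relative to partition planning and reading.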

Tests (spark/src/test/.../ServerSidePlannedTableSuite.scala)

Unit Tests (with MockServerSidePlanningClient):

  • Table properties verification
  • Single file scanning
  • Multiple file scanning
  • Error handling from client
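
As a rough illustration of the mock-based unit tests listed above, the sketch below shows one way a mock client can return a canned file list or fail on demand. Every name in it (`MockPlanningClient`, `plannedFiles`, the `ScanPlan` shape) is hypothetical and chosen for the example; it does not reproduce `MockServerSidePlanningClient` from this PR.

```scala
// Hypothetical mock client for exercising the success and error paths.
object MockClientSketch {
  // Assumed shape of the planning result: only a list of file paths.
  final case class ScanPlan(files: Seq[String])

  trait PlanningClient {
    def planScan(database: String, table: String): ScanPlan
  }

  // Returns the canned files on Right, throws the given error on Left,
  // and counts how many times planScan was invoked.
  final class MockPlanningClient(result: Either[Throwable, Seq[String]])
      extends PlanningClient {
    var calls: Int = 0

    def planScan(database: String, table: String): ScanPlan = {
      calls += 1
      result match {
        case Right(files) => ScanPlan(files)
        case Left(error)  => throw error
      }
    }
  }

  // Code under test: performs no error handling of its own, so client
  // failures propagate to the caller unchanged.
  def plannedFiles(client: PlanningClient, database: String, table: String): Seq[String] =
    client.planScan(database, table).files
}
```

A test can then assert on the returned files for the single-file and multi-file cases, and wrap the call in `scala.util.Try` to check that a client failure surfaces to the caller.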

Integration Tests (with ServerSidePlanningTestClient):

  • File discovery using input_file_name()
  • End-to-end SELECT queries through DSv2 path
  • Verification of data correctness
  • Factory registry functionality

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

How was this patch tested?

Does this PR introduce any user-facing changes?

murali-db added a commit that referenced this pull request Nov 2, 2025
This PR adds comprehensive integration tests that validate the entire
server-side planning stack from DeltaCatalog through to data reading.

Test Coverage:
- Full stack integration: DeltaCatalog → ServerSidePlannedTable → Client → Data
- SELECT query execution through server-side planning path
- Aggregation queries (SUM, COUNT, GROUP BY)
- Verification that normal path is unaffected when feature disabled

Test Strategy:
1. Enable DeltaCatalog as Spark catalog
2. Create Parquet tables with test data
3. Enable forceServerSidePlanning flag
4. Configure ServerSidePlanningTestClientFactory
5. Execute queries and verify results
6. Verify scan plan discovered files

Test Cases:
- E2E full stack integration with SELECT query
- E2E aggregation query (SUM, COUNT, GROUP BY)
- Normal path verification (feature disabled)

Assertions:
- Query results are correct
- Files are discovered via server-side planning
- Aggregations produce correct values
- Normal table loading works when feature disabled

This completes the test pyramid:
- PR #1: Test infrastructure (REST server)
- PR #2: Client unit tests
- PR #3: DSv2 Table unit and integration tests
- PR #4: DeltaCatalog integration (no new tests - minimal change)
- PR #5: Full stack E2E integration tests (this PR)

All functionality is now fully tested from unit to integration level.
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from c49fd19 to f9a93e1 Compare November 3, 2025 05:21
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 5b5e717 to fd30578 Compare November 3, 2025 05:21
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from fd30578 to 3abb6ec Compare November 3, 2025 05:27
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from 2dcd2e8 to 43a3d0c Compare November 3, 2025 05:39
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 3abb6ec to d9a5b07 Compare November 3, 2025 05:39
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from d9a5b07 to 755e14b Compare November 3, 2025 06:16
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from 57e1430 to 37c120c Compare November 3, 2025 07:40
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 755e14b to 673a081 Compare November 3, 2025 07:41
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from 37c120c to dfc533e Compare November 3, 2025 07:43
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 673a081 to 91c4414 Compare November 3, 2025 07:43
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from dfc533e to cf88294 Compare November 3, 2025 07:51
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 91c4414 to 9d514e9 Compare November 3, 2025 07:51
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from cf88294 to 1f9a958 Compare November 4, 2025 16:56
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch 2 times, most recently from 6a25ba6 to 819df83 Compare November 4, 2025 19:46
murali-db and others added 2 commits November 4, 2025 20:48
Implements DSv2 Table/Scan/Batch interfaces for server-side planning:
- ServerSidePlannedTable: DSv2 table backed by server-side planning client
- ServerSidePlannedScan/Batch: Scan and batch implementations
- ServerSidePlannedFilePartition: Custom InputPartition for file-based reads
- ServerSidePlanningTestClient: Test client using Spark for file discovery

This allows Delta to read tables using file lists from remote planning
services (e.g., Iceberg REST catalog) without local file discovery.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Rename 'namespace' parameter to 'database' throughout ServerSidePlannedTable
and its related classes to match the naming in ServerSidePlanningClient.planScan().

Also update tests to use the new parameter names and fix references to removed
schema field in ScanPlan and renamed factory class.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 819df83 to 47e6fa9 Compare November 4, 2025 20:48