@murali-db murali-db commented Nov 2, 2025

Summary

This PR introduces the core client interface for server-side scan planning, along with implementations for three use cases: a production REST
client, mocks for testing, and a Spark-based client for integration tests.

Stack:

Changes

Core Interface (spark/.../serverSidePlanning/)

  • ServerSidePlanningClient: Main trait with planScan(namespace, table) method
  • ScanFile: Data class representing a file (path, size, format, partition data)
  • ScanPlan: Result containing files and schema JSON
  • ServerSidePlanningClientFactory: Factory pattern with registry for testing
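
For readers without the diff open, the shapes listed above could look roughly like the following. This is a minimal sketch, not the PR's code: the field names and types (`sizeInBytes`, `schemaJson`, `Map[String, String]` for partition data) and the registry methods `setFactory`, `clearFactory`, and `createClient` are assumptions inferred from the summary.

```scala
// Assumed field names and types; the PR only states what each class represents.
final case class ScanFile(
    path: String,
    sizeInBytes: Long,
    format: String,
    partitionData: Map[String, String])

final case class ScanPlan(files: Seq[ScanFile], schemaJson: String)

// The planning contract: given a table, return the files to scan.
trait ServerSidePlanningClient {
  def planScan(namespace: String, table: String): ScanPlan
}

trait ServerSidePlanningClientFactory {
  def createClient(): ServerSidePlanningClient
}

object ServerSidePlanningClientFactory {
  // Hypothetical registry: tests install a factory here, callers read it.
  @volatile private var registered: Option[ServerSidePlanningClientFactory] = None

  def setFactory(factory: ServerSidePlanningClientFactory): Unit = {
    registered = Some(factory)
  }

  def clearFactory(): Unit = {
    registered = None
  }

  def createClient(): ServerSidePlanningClient =
    registered
      .getOrElse(throw new IllegalStateException("No ServerSidePlanningClientFactory registered"))
      .createClient()
}
```

A registry of this kind lets a test swap in a stub factory before running a query and clear it afterwards, without touching the calling code.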

Production Implementation (iceberg/.../serverSidePlanning/)

  • RESTServerSidePlanningClient:
    • Calls Iceberg REST catalog /v1/namespaces/{ns}/tables/{table}/plan endpoint
    • Uses HTTP client to communicate with catalog server
    • Converts Iceberg PlanTableScanResponse to simple ScanPlan data class
    • Uses reflection to parse responses (avoids compile-time Iceberg dependencies)
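
As an illustration of the endpoint named above, the request URI could be assembled as below. This is a sketch under assumptions: `PlanEndpoint` and `planUri` are hypothetical names, and `URLEncoder` performs form encoding (a space becomes `+`), which only approximates path-segment encoding.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper; not part of the PR.
object PlanEndpoint {
  // Form-style encoding, used here as an approximation for path segments.
  private def encode(segment: String): String =
    URLEncoder.encode(segment, StandardCharsets.UTF_8.name())

  // Builds <base>/v1/namespaces/{ns}/tables/{table}/plan from a catalog base URI.
  def planUri(baseUri: String, namespace: String, table: String): String = {
    val base = baseUri.stripSuffix("/")
    s"$base/v1/namespaces/${encode(namespace)}/tables/${encode(table)}/plan"
  }
}
```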

Test Implementations (spark/src/test/.../serverSidePlanning/)

  • MockServerSidePlanningClient: In-memory mock returning pre-configured file lists
  • ServerSidePlanningTestClient: Uses Spark SQL input_file_name() to discover files

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

How was this patch tested?

Does this PR introduce any user-facing changes?

murali-db added a commit that referenced this pull request Nov 2, 2025
This PR adds comprehensive integration tests that validate the entire
server-side planning stack from DeltaCatalog through to data reading.

Test Coverage:
- Full stack integration: DeltaCatalog → ServerSidePlannedTable → Client → Data
- SELECT query execution through server-side planning path
- Aggregation queries (SUM, COUNT, GROUP BY)
- Verification that normal path is unaffected when feature disabled

Test Strategy:
1. Enable DeltaCatalog as Spark catalog
2. Create Parquet tables with test data
3. Enable forceServerSidePlanning flag
4. Configure ServerSidePlanningTestClientFactory
5. Execute queries and verify results
6. Verify scan plan discovered files

Test Cases:
- E2E full stack integration with SELECT query
- E2E aggregation query (SUM, COUNT, GROUP BY)
- Normal path verification (feature disabled)

Assertions:
- Query results are correct
- Files are discovered via server-side planning
- Aggregations produce correct values
- Normal table loading works when feature disabled

This completes the test pyramid:
- PR #1: Test infrastructure (REST server)
- PR #2: Client unit tests
- PR #3: DSv2 Table unit and integration tests
- PR #4: DeltaCatalog integration (no new tests - minimal change)
- PR #5: Full stack E2E integration tests (this PR)

All functionality is now fully tested from unit to integration level.
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from c49fd19 to f9a93e1 Compare November 3, 2025 05:21
murali-db added a commit that referenced this pull request Nov 3, 2025
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from 2dcd2e8 to 43a3d0c Compare November 3, 2025 05:39
murali-db added a commit that referenced this pull request Nov 3, 2025
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from 57e1430 to 37c120c Compare November 3, 2025 07:40
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-1-rest-server-test-infra branch from a19cb4d to da89365 Compare November 3, 2025 07:43
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from 37c120c to dfc533e Compare November 3, 2025 07:43
@murali-db murali-db force-pushed the server-side-planning-1-rest-server-test-infra branch from da89365 to c42f47e Compare November 3, 2025 07:50
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from dfc533e to cf88294 Compare November 3, 2025 07:51
Adds the client interface and factory for server-side planning:
- ServerSidePlanningClient: Trait defining the planning contract
- IcebergServerSidePlanningClient: REST implementation for Iceberg catalogs
- IcebergServerSidePlanningClientFactory: Factory using reflection
- ScanPlan/ScanFile: Simple data classes with no Iceberg dependencies
- ServerSidePlanningClientSuite: Integration tests with IcebergRESTServer

The interface lives in delta-spark module to avoid Iceberg dependencies,
while the implementation lives in iceberg module where dependencies exist.
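
The module split works because the interface module can look up the implementation by class name at runtime. A minimal sketch of that reflective lookup, assuming a no-argument constructor, is shown here; `PlanningClientFactory`, `StandInIcebergFactory`, and `ReflectiveFactoryLoader` are hypothetical stand-ins, not the PR's classes.

```scala
// Stand-in for the factory trait that lives in the interface module.
trait PlanningClientFactory {
  def describe: String
}

// Stand-in for an implementation that would live in the iceberg module.
class StandInIcebergFactory extends PlanningClientFactory {
  override def describe: String = "iceberg-rest"
}

object ReflectiveFactoryLoader {
  // Loads a factory by fully qualified class name, so the caller needs
  // no compile-time reference to the implementing class.
  def load(className: String): Either[String, PlanningClientFactory] =
    try {
      Class.forName(className).getDeclaredConstructor().newInstance() match {
        case factory: PlanningClientFactory => Right(factory)
        case _ => Left(s"$className is not a PlanningClientFactory")
      }
    } catch {
      case e: ReflectiveOperationException =>
        Left(s"Cannot load $className: ${e.getMessage}")
    }
}
```

Returning `Either` keeps a missing implementation module from surfacing as a raw `ClassNotFoundException` at the call site.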

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@murali-db murali-db force-pushed the server-side-planning-2-client-interface branch from cf88294 to 1f9a958 Compare November 4, 2025 16:56
murali-db and others added 3 commits November 4, 2025 19:09
Changes made:
- Rename IcebergServerSidePlanningClient to IcebergRESTCatalogPlanningClient
- Rename IcebergServerSidePlanningClientFactory to IcebergRESTCatalogPlanningClientFactory
- Remove schema field from ScanPlan (not in Iceberg REST API spec)
- Remove partitionData field from ScanFile and add validation
- Rename parameter 'namespace' to 'database' throughout
- Remove FORCE_SERVER_SIDE_PLANNING config (moved to PR #5)
- Simplify ServerSidePlanningTestClient (remove cloning/config logic)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This allows clients to be created with catalog-specific configuration
by reading from spark.sql.catalog.<catalogName>.uri and token.

Following the pattern used by UCCommitCoordinatorBuilder, this enables
proper integration with Spark's catalog configuration system instead of
hardcoding configuration keys.

Changes:
- Add buildForCatalog(spark, catalogName) to ServerSidePlanningClientFactory trait
- Implement in IcebergRESTCatalogPlanningClientFactory to read catalog configs
- Implement in MockServerSidePlanningClientFactory for testing
- Implement in ServerSidePlanningTestClientFactory for testing
- Add convenience method to ServerSidePlanningClientFactory object
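
The catalog-scoped lookup described in this commit can be sketched as below. To stay self-contained, the sketch reads from a plain `Map[String, String]` rather than a `SparkSession`; `CatalogClientConfig` and `CatalogConfigReader` are hypothetical names, and treating the token as optional is an assumption.

```scala
// Hypothetical holder for the two settings named in the commit message.
final case class CatalogClientConfig(uri: String, token: Option[String])

object CatalogConfigReader {
  // Reads spark.sql.catalog.<catalogName>.uri and .token from a config map.
  def read(conf: Map[String, String], catalogName: String): CatalogClientConfig = {
    val prefix = s"spark.sql.catalog.$catalogName."
    val uri = conf.getOrElse(
      prefix + "uri",
      throw new IllegalArgumentException(s"Missing required config ${prefix}uri"))
    // Assumption: a catalog without authentication simply omits the token.
    CatalogClientConfig(uri, conf.get(prefix + "token"))
  }
}
```

Deriving the keys from the catalog name lets two catalogs in the same session point at different servers without sharing a hardcoded key.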

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fix scalastyle error by using toLowerCase(Locale.ROOT) instead of
toLowerCase() in ServerSidePlanningTestClient.
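
The reason the style rule exists: `toLowerCase()` without arguments uses the JVM's default locale, and under a Turkish locale an uppercase `I` lowercases to a dotless `ı`. The sketch below demonstrates the difference; `LocaleLowercase` is a hypothetical helper, not code from the PR.

```scala
import java.util.Locale

object LocaleLowercase {
  private val turkish: Locale = Locale.forLanguageTag("tr")

  // What toLowerCase() would do on a JVM whose default locale is Turkish.
  def lowerInTurkish(s: String): String = s.toLowerCase(turkish)

  // Locale-independent lowercasing, as the scalastyle rule requires.
  def lowerInRoot(s: String): String = s.toLowerCase(Locale.ROOT)
}
```

With these helpers, `lowerInRoot("ICEBERG")` yields `iceberg`, while `lowerInTurkish("ICEBERG")` yields `ıceberg`, which would no longer match a comparison against the ASCII string.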

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>