Conversation

@murali-db
Owner

Stack:

This PR wires ServerSidePlannedTable into DeltaCatalog's loadTable() method, enabling the fallback path when Unity Catalog tables lack credentials.

Changes to DeltaCatalog:

  • Added import for ServerSidePlanningClientFactory
  • Modified loadTable() to check for missing credentials or force flag
  • Added hasCredentials() helper to detect credential presence
  • Returns ServerSidePlannedTable instead of normal table when needed

Logic Flow:

  1. Load table normally via super.loadTable()
  2. Check if forceServerSidePlanning config is enabled (for testing)
  3. Check if Unity Catalog table lacks credentials (production use case)
  4. If either condition is true:
    • Get client from ServerSidePlanningClientFactory
    • Return ServerSidePlannedTable with server-side scan planning
  5. Otherwise, use normal Delta table loading path
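
The flow above can be sketched roughly as follows; `hasCredentials`, `isUnityCatalogTable`, and the shape of the factory call are illustrative assumptions based on this description, not the verbatim implementation:

```scala
// Sketch of the DeltaCatalog.loadTable() flow described above
// (names and signatures are assumptions, not the actual code).
override def loadTable(ident: Identifier): Table = {
  // Step 1: load the table normally first.
  val table = super.loadTable(ident)

  // Step 2: force flag, intended for testing and gradual rollout.
  val forced = spark.sessionState.conf.getConfString(
    "spark.databricks.delta.catalog.forceServerSidePlanning", "false").toBoolean

  // Step 3: production case, a Unity Catalog table without credentials.
  val needsFallback = isUnityCatalogTable(table) && !hasCredentials(table)

  if (forced || needsFallback) {
    // Step 4: early return here prevents wrapping the table twice.
    val client = ServerSidePlanningClientFactory.createClient(spark)
    new ServerSidePlannedTable(ident, client)
  } else {
    table // Step 5: normal Delta table loading path, unchanged.
  }
}
```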

Feature Flag:

  • Config: spark.databricks.delta.catalog.forceServerSidePlanning
  • Default: false
  • Purpose: Testing and gradual rollout
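
For example, the flag can be flipped per session like any other Spark SQL conf (illustrative):

```scala
// Opt in for testing; the default is false, so existing sessions are unaffected.
spark.conf.set("spark.databricks.delta.catalog.forceServerSidePlanning", "true")
```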

Credential Detection:

  • Checks table properties for common credential keys
  • Keys: storage.credential, aws/azure/gcs.temporary.credentials
  • Returns false if no credential properties found
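
A minimal sketch of the detection described above, operating on a table-properties map; the expanded key names are an assumption read from the abbreviated aws/azure/gcs list, and may differ from the real property names:

```scala
// Hypothetical expansion of the credential keys listed above; the exact
// property names in the real implementation may differ.
val credentialKeys: Seq[String] = Seq(
  "storage.credential",
  "aws.temporary.credentials",
  "azure.temporary.credentials",
  "gcs.temporary.credentials")

// True if any known credential property is present on the table.
def hasCredentials(tableProperties: Map[String, String]): Boolean =
  credentialKeys.exists(tableProperties.contains)
```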

Safety:

  • Feature is behind a config flag; no impact unless enabled
  • Fallback only applies to Unity Catalog tables
  • Normal Delta table loading path unchanged
  • Early return prevents double table wrapping

This is the minimal integration point: in practice, the server-side path is exercised only when the force flag is set or when a Unity Catalog table is loaded without credentials.

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

How was this patch tested?

Does this PR introduce any user-facing changes?

murali-db added a commit that referenced this pull request Nov 2, 2025
This PR adds comprehensive integration tests that validate the entire
server-side planning stack from DeltaCatalog through to data reading.

Test Coverage:
- Full stack integration: DeltaCatalog → ServerSidePlannedTable → Client → Data
- SELECT query execution through server-side planning path
- Aggregation queries (SUM, COUNT, GROUP BY)
- Verification that normal path is unaffected when feature disabled

Test Strategy:
1. Enable DeltaCatalog as Spark catalog
2. Create Parquet tables with test data
3. Enable forceServerSidePlanning flag
4. Configure ServerSidePlanningTestClientFactory
5. Execute queries and verify results
6. Verify scan plan discovered files

Test Cases:
- E2E full stack integration with SELECT query
- E2E aggregation query (SUM, COUNT, GROUP BY)
- Normal path verification (feature disabled)

Assertions:
- Query results are correct
- Files are discovered via server-side planning
- Aggregations produce correct values
- Normal table loading works when feature disabled

This completes the test pyramid:
- PR #1: Test infrastructure (REST server)
- PR #2: Client unit tests
- PR #3: DSv2 Table unit and integration tests
- PR #4: DeltaCatalog integration (no new tests - minimal change)
- PR #5: Full stack E2E integration tests (this PR)

All functionality is now fully tested from unit to integration level.
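
One of the E2E cases above might look roughly like this in a Spark test suite; `withSQLConf`, `checkAnswer`, and `Row` are standard Spark test utilities, while the test name, table shape, and omitted test-client-factory wiring are assumptions for illustration:

```scala
// Sketch of the E2E aggregation case (test-factory configuration omitted;
// exact helper availability depends on the suite's base classes).
test("server-side planning: aggregation through the full stack") {
  withSQLConf(
    "spark.databricks.delta.catalog.forceServerSidePlanning" -> "true") {
    sql("CREATE TABLE t USING parquet AS SELECT id, id * 2 AS value FROM range(10)")
    // Runs DeltaCatalog -> ServerSidePlannedTable -> client -> data.
    checkAnswer(sql("SELECT SUM(value) FROM t"), Row(90L))
  }
}
```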
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 5b5e717 to fd30578 Compare November 3, 2025 05:21
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from 1fb4a24 to e51a537 Compare November 3, 2025 05:22
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from fd30578 to 3abb6ec Compare November 3, 2025 05:27
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from e51a537 to 169526b Compare November 3, 2025 05:28
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 3abb6ec to d9a5b07 Compare November 3, 2025 05:39
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from 169526b to 2db6ed4 Compare November 3, 2025 05:39
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from d9a5b07 to 755e14b Compare November 3, 2025 06:16
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from 2db6ed4 to a624bdf Compare November 3, 2025 06:16
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 755e14b to 673a081 Compare November 3, 2025 07:41
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from a624bdf to 1a7895e Compare November 3, 2025 07:41
murali-db added a commit that referenced this pull request Nov 3, 2025
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 673a081 to 91c4414 Compare November 3, 2025 07:43
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from 1a7895e to bc47741 Compare November 3, 2025 07:43
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 91c4414 to 9d514e9 Compare November 3, 2025 07:51
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from bc47741 to 7191318 Compare November 3, 2025 07:51
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 9d514e9 to 6a25ba6 Compare November 4, 2025 16:57
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from 7191318 to 0463f26 Compare November 4, 2025 16:57
murali-db added a commit that referenced this pull request Nov 4, 2025
Changes made:
- Rename IcebergServerSidePlanningClient to IcebergRESTCatalogPlanningClient
- Rename IcebergServerSidePlanningClientFactory to IcebergRESTCatalogPlanningClientFactory
- Remove schema field from ScanPlan (not in Iceberg REST API spec)
- Remove partitionData field from ScanFile and add validation
- Rename parameter 'namespace' to 'database' throughout
- Remove FORCE_SERVER_SIDE_PLANNING config (moved to PR #5)
- Simplify ServerSidePlanningTestClient (remove cloning/config logic)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 6a25ba6 to 819df83 Compare November 4, 2025 19:46
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from 0463f26 to 1ee15f2 Compare November 4, 2025 19:46
@murali-db murali-db force-pushed the server-side-planning-3-dsv2-table-impl branch from 819df83 to 47e6fa9 Compare November 4, 2025 20:48
murali-db and others added 3 commits November 4, 2025 20:48
Adds catalog integration to enable server-side planning:
- DeltaCatalog now checks for spark.delta.serverSidePlanning.enabled config
- When enabled, returns ServerSidePlannedTable instead of DeltaTableV2
- Uses ServerSidePlanningClientFactory to create appropriate client
- Supports both Iceberg REST and mock/test implementations

This completes the plumbing from catalog through DSv2 to client.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update DeltaCatalog.loadTable() to use buildForCatalog instead of
createClient, extracting the catalog name from the identifier's namespace.

This enables proper catalog-specific configuration reading from
spark.sql.catalog.<catalogName>.uri and token.

For fully qualified identifiers like catalog.database.table, the catalog
name is extracted from the first element of the namespace. Otherwise,
defaults to "spark_catalog".

Also adds ENABLE_SERVER_SIDE_PLANNING config to DeltaSQLConf (renamed from
FORCE_SERVER_SIDE_PLANNING to better reflect its purpose).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
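
The extraction rule described in this commit can be sketched as pure logic (the function name is assumed for illustration):

```scala
// For an identifier like catalog.database.table the namespace array is
// ["catalog", "database"]; for database.table it is ["database"]; for a
// bare table name it is empty. Per the commit message, a catalog name is
// only present when the namespace has more than one element; otherwise we
// default to "spark_catalog".
def catalogNameFor(namespace: Array[String]): String =
  if (namespace.length > 1) namespace.head else "spark_catalog"
```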
@murali-db murali-db force-pushed the server-side-planning-4-catalog-integration branch from 1ee15f2 to ca46cc0 Compare November 4, 2025 20:48
murali-db added a commit that referenced this pull request Nov 11, 2025
Addresses code review issues #5, #7, #9, and #18:

1. Issue #7: Expand Hadoop configuration limitation documentation
   - Clarify production impact of using sessionState.newHadoopConf()
   - Provide concrete examples of what won't work
   - Document workaround for per-query credentials
   - Link to architectural decision about DeltaLog dependency

2. Issue #18: Document DeltaLog dependency avoidance
   - Add comprehensive class-level documentation for ServerSidePlannedTable
   - Explain format independence, lightweight design, and clean architecture
   - Document trade-offs and alternative approaches

3. Issue #9: Improve catalog name extraction documentation
   - Add detailed examples for all identifier formats
   - Explain edge cases (fully qualified vs database-only vs table-only)
   - Clarify why we check namespace().length > 1

4. Issue #5: Add TODO for hasCredentials() test coverage
   - Document what test would be helpful to add
   - Suggest implementation approaches (reflection vs custom catalog)
   - Note challenge of testing without real Unity Catalog

All tests pass. No functional changes, only documentation improvements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>