#5 Introducing Deletion Vector support #1
base: refactor-validator
Conversation
Commits in this push: writer compat v3 · addressing first round of comments · addressing comments · adding tests · tests header
Force-pushed 20b4528 to e1b16b0 (compare)
Force-pushed e1b16b0 to cfb93c5 (compare)
```
    Map<String, Literal> partitionValues,
    boolean dataChange,
    Map<String, String> tags,
    DeletionVectorDescriptor deletionVectorDescriptor) {
```
Make the existing API take the deletionVectorDescriptor as an Optional. Given this is not a public API, we don't need to worry about existing users, and migration is simple anyway.
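For illustration only, a minimal sketch of the suggested pattern, using a placeholder type instead of the actual Kernel classes: pass the descriptor as a `java.util.Optional` and let existing callers adapt with `Optional.ofNullable`. The same shape would apply to `generateIcebergCompatWriterV3AddAction` below.

```
import java.util.Optional;

// Sketch of the suggested change, not the actual Kernel API: take the deletion
// vector descriptor as an Optional rather than a nullable reference.
final class OptionalDvParameterSketch {

  // Placeholder standing in for the internal add-action builder; the real method
  // builds a Row from an AddFile, which is omitted here.
  static String buildAddAction(Optional<String> deletionVectorDescriptor) {
    return deletionVectorDescriptor
        .map(dv -> "add file with deletion vector: " + dv)
        .orElse("add file without deletion vector");
  }

  public static void main(String[] args) {
    // Existing call sites that still hold a possibly-null descriptor migrate with ofNullable.
    String maybeNull = null;
    System.out.println(buildAddAction(Optional.ofNullable(maybeNull)));
    System.out.println(buildAddAction(Optional.of("dv-abc123")));
  }
}
```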
```
    return SingleAction.createAddFileSingleAction(addFile.toRow());
  }

  public static Row generateIcebergCompatWriterV3AddAction(
```
Same for this API.
```
    Optional<DeletionVectorDescriptor> deletionVectorOpt =
        deletionVectorDescriptor != null ? Optional.of(deletionVectorDescriptor) : Optional.empty();
```
Do we need to do any validation when a DV is added? For example, checking that the deletion vectors feature is enabled and active?
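One possible shape for such a check, sketched with plain Java types since the Kernel's actual validation hooks are not shown in this diff; the `deletionVectors` feature name follows the Delta protocol, everything else is a placeholder:

```
import java.util.Optional;
import java.util.Set;

// Hypothetical validation sketch, not Kernel's actual API: before accepting a DV on an
// AddFile, confirm the table's protocol lists the deletionVectors writer feature.
final class DeletionVectorValidationSketch {

  static void checkDeletionVectorSupported(
      Set<String> writerFeatures, Optional<String> deletionVectorDescriptor) {
    if (deletionVectorDescriptor.isPresent() && !writerFeatures.contains("deletionVectors")) {
      throw new IllegalStateException(
          "Cannot add a file with a deletion vector: the 'deletionVectors' table feature is not enabled");
    }
  }

  public static void main(String[] args) {
    checkDeletionVectorSupported(Set.of("deletionVectors"), Optional.of("dv-1")); // passes
    checkDeletionVectorSupported(Set.of(), Optional.empty());                     // passes
    // checkDeletionVectorSupported(Set.of(), Optional.of("dv-1"));               // would throw
  }
}
```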
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This PR refactors CC-relevant code:
1. Fix indentation.
2. Correctly override `catalogOwnedCoordinatorBackfillBatchSize` in suites that extend `CatalogOwnedTestBaseSuite`.
3. Consolidate the two checkpoint helpers into `CatalogOwnedTestBaseSuite`.
4. Refactor `CatalogOwnedEnablementSuite`.

## How was this patch tested?
Through existing suites.

## Does this PR introduce _any_ user-facing changes?
No.
…ta-io#4731)

#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description
Add the IcebergCompatV3 table feature, which requires columnMapping and rowTracking to be enabled.

## How was this patch tested?
Unit tests

## Does this PR introduce _any_ user-facing changes?
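As a rough illustration of the dependency described in that commit (not the Kernel's actual implementation), the requirement can be thought of as a prerequisite check over the enabled table features; the feature names follow the Delta protocol, the helper itself is a placeholder:

```
import java.util.Set;

// Illustrative prerequisite check only: IcebergCompatV3 depends on columnMapping
// and rowTracking being enabled on the table.
final class IcebergCompatV3PrerequisiteSketch {

  static void validatePrerequisites(Set<String> enabledTableFeatures) {
    for (String required : Set.of("columnMapping", "rowTracking")) {
      if (!enabledTableFeatures.contains(required)) {
        throw new IllegalStateException(
            "icebergCompatV3 requires the '" + required + "' table feature to be enabled");
      }
    }
  }

  public static void main(String[] args) {
    validatePrerequisites(Set.of("columnMapping", "rowTracking", "icebergCompatV3")); // passes
    // validatePrerequisites(Set.of("columnMapping"));                                // would throw
  }
}
```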
…mpat v3 (delta-io#4733)

#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Add IcebergWriterCompatV3 table feature and unit tests.

## How was this patch tested?
Unit tests.

## Does this PR introduce _any_ user-facing changes?
This PR adds comprehensive integration tests that validate the entire server-side planning stack from DeltaCatalog through to data reading.

Test Coverage:
- Full stack integration: DeltaCatalog → ServerSidePlannedTable → Client → Data
- SELECT query execution through server-side planning path
- Aggregation queries (SUM, COUNT, GROUP BY)
- Verification that normal path is unaffected when feature disabled

Test Strategy:
1. Enable DeltaCatalog as Spark catalog
2. Create Parquet tables with test data
3. Enable forceServerSidePlanning flag
4. Configure ServerSidePlanningTestClientFactory
5. Execute queries and verify results
6. Verify scan plan discovered files

Test Cases:
- E2E full stack integration with SELECT query
- E2E aggregation query (SUM, COUNT, GROUP BY)
- Normal path verification (feature disabled)

Assertions:
- Query results are correct
- Files are discovered via server-side planning
- Aggregations produce correct values
- Normal table loading works when feature disabled

This completes the test pyramid:
- PR #1: Test infrastructure (REST server)
- PR #2: Client unit tests
- PR #3: DSv2 Table unit and integration tests
- PR #4: DeltaCatalog integration (no new tests - minimal change)
- PR #5: Full stack E2E integration tests (this PR)

All functionality is now fully tested from unit to integration level.
These files should remain as-is from the Iceberg 1.10.0 upgrade. PR #1 should only add REST catalog test infrastructure.

Reverted changes:
- TableMetadata.java: keep original comments and SUPPORTED_TABLE_FORMAT_VERSION=4
- MetadataUpdate.java: keep original HACK-HACK comments

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…an show the parameters. (delta-io#5817)

#### Which Delta project/connector is this regarding?
- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Convert @ParameterizedTest to dynamic tests so that SBT can show the parameters. Also upgrade the JUnit related packages so that the display name can actually show up in SBT output. This is to address comment: delta-io#5772 (comment)

Before this PR it shows:
```
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testBasicInsertOperations(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#1 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testBasicInsertOperations(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#2 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testMergeWithDeleteAction(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#1 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testMergeWithDeleteAction(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#2 started
...
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType, boolean, boolean, boolean, boolean):#1 started
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType, boolean, boolean, boolean, boolean):#2 started
```

After this PR it shows:
```
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #1:testUpdateOperations(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #1:testUpdateOperations(MANAGED) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #2:testDeleteOperations(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #2:testDeleteOperations(MANAGED) started
...
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable():tableType=EXTERNAL, withPartition=true, withCluster=false, withAsSelect=true, replaceTable=true started
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable():tableType=EXTERNAL, withPartition=true, withCluster=false, withAsSelect=true, replaceTable=false started
```

## How was this patch tested?

## Does this PR introduce _any_ user-facing changes?
No.

---------
Signed-off-by: Yi Li <yi.li@databricks.com>
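A minimal, self-contained illustration of the conversion described in that commit, assuming JUnit 5; the enum, parameter set, and test body are placeholders rather than the real UCDeltaTableCreationTest code:

```
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.stream.Stream;
import org.junit.jupiter.api.DynamicTest;
import org.junit.jupiter.api.TestFactory;

// Sketch of the @ParameterizedTest -> @TestFactory conversion: each DynamicTest encodes
// its parameters in the display name, which is what SBT prints for the test run.
class DynamicTestNamingExample {

  enum TableType { MANAGED, EXTERNAL }

  @TestFactory
  Stream<DynamicTest> createTableTests() {
    return Stream.of(TableType.values())
        .flatMap(tableType -> Stream.of(true, false)
            .map(withPartition -> DynamicTest.dynamicTest(
                String.format(
                    "testCreateTable():tableType=%s, withPartition=%s", tableType, withPartition),
                () -> assertTrue(runCreateTable(tableType, withPartition)))));
  }

  // Placeholder for the real test body.
  private boolean runCreateTable(TableType tableType, boolean withPartition) {
    return true;
  }
}
```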
#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Added a new UC E2E streaming test suite to cover Delta structured streaming read/write behavior across managed/external tables, including UC commit version validation for managed tables. The changes live in spark/unitycatalog/src/test/java/io/sparkuctest/UCDeltaStreamingTest.java and exercise MemoryStream writes, streaming reads in V2 strict mode, and checkpointed queries.

Note to reviewers: this branch is based on delta-io#5651; therefore there are a number of changes that are not directly related to the E2E test integration for streaming reads (those changes are to enable these streaming reads). Please just review [spark/unitycatalog/src/test/java/io/sparkuctest/UCDeltaStreamingTest.java](https://github.com/delta-io/delta/pull/5833/changes#diff-04879565d272ba02d9c1e47707ec9bfdb1044460e957ec769d1914178554383b).

## How was this patch tested?
Locally:
```
export UC_REMOTE=false
$ build/sbt "sparkUnityCatalog/testOnly io.sparkuctest.UCDeltaStreamingTest"
...
[info] Test run started (JUnit Jupiter)
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(MANAGED) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(MANAGED) started
[info] Test run finished: 0 failed, 0 ignored, 4 total, 24.65s
[info] Passed: Total 4, Failed 0, Errors 0, Passed 4
[success] Total time: 44 s, completed Jan 16, 2026, 12:17:45 AM
```

and tested against a remote UC server:
```
export UC_REMOTE=true
export UC_URI=$UC_URI
export UC_TOKEN=$UC_TOKEN
export UC_CATALOG_NAME=main
export UC_SCHEMA_NAME=demo_zh
export UC_BASE_TABLE_LOCATION=$S3_BASE_LOCATION
$ build/sbt "sparkUnityCatalog/testOnly io.sparkuctest.UCDeltaStreamingTest"
...
[info] Test run started (JUnit Jupiter)
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(MANAGED) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(MANAGED) started
[info] Test run finished: 0 failed, 0 ignored, 4 total, 250.336s
[info] Passed: Total 4, Failed 0, Errors 0, Passed 4
[success] Total time: 274 s (04:34), completed Jan 16, 2026, 12:10:14 AM
```

## Does this PR introduce _any_ user-facing changes?
No.

---------
Co-authored-by: Tathagata Das <tathagata.das1565@gmail.com>
Which Delta project/connector is this regarding?
Description
Introduce changes to AddFile and GenerateIcebergCompatUtils to support Deletion Vectors. The relevant changes are accessible via this link.
How was this patch tested?
Unit tests.
Does this PR introduce any user-facing changes?
No.