
Conversation


@murali-db murali-db commented Jun 25, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Introduce changes to `AddFile` and `GenerateIcebergCompatUtils` to support Deletion Vectors.

How was this patch tested?

Unit tests.

Does this PR introduce any user-facing changes?

No.

Commits pushed to this PR:

- writer compat v3
- addressing first round of comments
- addressing comments
- adding tests
- tests
- header
@murali-db changed the title from "Introducing Deletion Vector support #5" to "Introducing Deletion Vector support" on Jun 25, 2025
```java
    Map<String, Literal> partitionValues,
    boolean dataChange,
    Map<String, String> tags,
    DeletionVectorDescriptor deletionVectorDescriptor) {
```


Make the existing API take the `deletionVectorDescriptor` as an `Optional`. Given this is not a public API, we don't need to worry about existing users, and migration is simple anyway.
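One way to read this suggestion is the sketch below. It is a simplified, hypothetical version: `generateAddAction` and the `String` descriptor stand in for the real Kernel method and `DeletionVectorDescriptor`, reduced to the parameter under discussion.

```java
import java.util.Optional;

// Simplified sketch of the reviewer's suggestion: the API takes the DV
// descriptor as an Optional instead of a nullable reference, so callers
// without a DV pass Optional.empty() and no null checks are needed inside.
public class AddActionSketch {
  public static String generateAddAction(String path, Optional<String> dvDescriptor) {
    // Append the DV info only when present; empty Optionals contribute nothing.
    return path + dvDescriptor.map(dv -> " dv=" + dv).orElse("");
  }

  public static void main(String[] args) {
    System.out.println(generateAddAction("part-0.parquet", Optional.empty()));
    System.out.println(generateAddAction("part-0.parquet", Optional.of("dv1")));
  }
}
```

Existing call sites migrate by wrapping their nullable argument in `Optional.ofNullable(...)`, which is why the migration cost is low for an internal API.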

```java
  return SingleAction.createAddFileSingleAction(addFile.toRow());
}

public static Row generateIcebergCompatWriterV3AddAction(
```


Same for this API.


```java
Optional<DeletionVectorDescriptor> deletionVectorOpt =
    deletionVectorDescriptor != null ? Optional.of(deletionVectorDescriptor) : Optional.empty();
```
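The null-check-and-wrap pattern above is exactly what `Optional.ofNullable` provides in the standard library, so the ternary can be collapsed; a minimal demonstration:

```java
import java.util.Optional;

public class OfNullableSketch {
  // Equivalent to: value != null ? Optional.of(value) : Optional.empty()
  public static <T> Optional<T> wrap(T value) {
    return Optional.ofNullable(value);
  }

  public static void main(String[] args) {
    System.out.println(wrap(null).isPresent());  // false: null maps to empty
    System.out.println(wrap("dv").isPresent());  // true: non-null is wrapped
  }
}
```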


Do we need to do any validation when a DV is added? For example, that the deletion vectors feature is enabled and active?
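One possible shape for the check the reviewer is asking about, sketched below. This is only an illustration: the method name `validateDvAllowed` is hypothetical, and real Kernel code would inspect the table's `Protocol` action rather than a plain `Set` of feature names.

```java
import java.util.Set;

public class DvValidationSketch {
  // Hypothetical guard: reject an AddFile carrying a deletion vector unless
  // the writer feature "deletionVectors" is enabled on the table.
  public static void validateDvAllowed(Set<String> writerFeatures, boolean hasDv) {
    if (hasDv && !writerFeatures.contains("deletionVectors")) {
      throw new IllegalArgumentException(
          "Cannot add a file with a deletion vector: feature 'deletionVectors' is not enabled");
    }
  }

  public static void main(String[] args) {
    validateDvAllowed(Set.of("deletionVectors"), true);  // ok: feature enabled
    validateDvAllowed(Set.of(), false);                  // ok: no DV present
  }
}
```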

murali-db pushed a commit that referenced this pull request Aug 12, 2025
<!--
Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
   https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'.
3. Be sure to keep the PR description updated to reflect all changes.
4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a faster review.
6. If applicable, include the corresponding issue number in the PR title and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This PR refactors CC-related code:
1. Fix indentation.
2. Correctly override `catalogOwnedCoordinatorBackfillBatchSize` in suites that extend `CatalogOwnedTestBaseSuite`.
3. Consolidate the two checkpoint helpers into `CatalogOwnedTestBaseSuite`.
4. Refactor `CatalogOwnedEnablementSuite`.

<!--
- Describe what this PR changes.
- Describe why we need the change.
 
If this PR resolves an issue be sure to include "Resolves #XXX" to
correctly link and close the issue upon merge.
-->

## How was this patch tested?
Through existing suites.

<!--
If tests were added, say they were added here. Please make sure to test
the changes thoroughly including negative and positive cases if
possible.
If the changes were tested in any way other than unit tests, please
clarify how you tested step by step (ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future).
If the changes were not tested, please explain why.
-->

## Does this PR introduce _any_ user-facing changes?
No.

<!--
If yes, please clarify the previous behavior and the change this PR
proposes - provide the console output, description and/or an example to
show the behavior difference if possible.
If possible, please also clarify if this is a user-facing change
compared to the released Delta Lake versions or within the unreleased
branches such as master.
If no, write 'No'.
-->
murali-db pushed a commit that referenced this pull request Nov 2, 2025
…ta-io#4731)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

Add the IcebergCompatV3 table feature, which requires columnMapping and rowTracking to be enabled.
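The dependency rule described above (IcebergCompatV3 requires columnMapping and rowTracking) can be pictured as a feature-to-required-features map with a containment check. This is an illustrative sketch only, not Kernel's actual `TableFeature` API:

```java
import java.util.Map;
import java.util.Set;

public class FeatureDependencySketch {
  // Illustrative dependency map mirroring the description above:
  // icebergCompatV3 requires both columnMapping and rowTracking.
  static final Map<String, Set<String>> REQUIRED =
      Map.of("icebergCompatV3", Set.of("columnMapping", "rowTracking"));

  // A feature can be enabled only when all of its required features already are.
  public static boolean canEnable(String feature, Set<String> enabled) {
    return enabled.containsAll(REQUIRED.getOrDefault(feature, Set.of()));
  }

  public static void main(String[] args) {
    System.out.println(canEnable("icebergCompatV3", Set.of("columnMapping")));
    System.out.println(canEnable("icebergCompatV3", Set.of("columnMapping", "rowTracking")));
  }
}
```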
## How was this patch tested?

Unit tests
## Does this PR introduce _any_ user-facing changes?

murali-db added a commit that referenced this pull request Nov 2, 2025
…mpat v3 (delta-io#4733)


#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description


Add IcebergWriterCompatV3 table feature and unit tests.
## How was this patch tested?

Unit tests. 

## Does this PR introduce _any_ user-facing changes?

murali-db added a commit that referenced this pull request Nov 2, 2025
This PR adds comprehensive integration tests that validate the entire
server-side planning stack from DeltaCatalog through to data reading.

Test Coverage:
- Full stack integration: DeltaCatalog → ServerSidePlannedTable → Client → Data
- SELECT query execution through server-side planning path
- Aggregation queries (SUM, COUNT, GROUP BY)
- Verification that normal path is unaffected when feature disabled

Test Strategy:
1. Enable DeltaCatalog as Spark catalog
2. Create Parquet tables with test data
3. Enable forceServerSidePlanning flag
4. Configure ServerSidePlanningTestClientFactory
5. Execute queries and verify results
6. Verify scan plan discovered files

Test Cases:
- E2E full stack integration with SELECT query
- E2E aggregation query (SUM, COUNT, GROUP BY)
- Normal path verification (feature disabled)

Assertions:
- Query results are correct
- Files are discovered via server-side planning
- Aggregations produce correct values
- Normal table loading works when feature disabled

This completes the test pyramid:
- PR #1: Test infrastructure (REST server)
- PR #2: Client unit tests
- PR #3: DSv2 Table unit and integration tests
- PR #4: DeltaCatalog integration (no new tests - minimal change)
- PR #5: Full stack E2E integration tests (this PR)

All functionality is now fully tested from unit to integration level.
murali-db added a commit that referenced this pull request Nov 3, 2025
These files should remain as-is from the Iceberg 1.10.0 upgrade.
PR #1 should only add REST catalog test infrastructure.

Reverted changes:
- TableMetadata.java: Keep original comments and SUPPORTED_TABLE_FORMAT_VERSION=4
- MetadataUpdate.java: Keep original HACK-HACK comments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
murali-db pushed a commit that referenced this pull request Jan 21, 2026
…an show the parameters. (delta-io#5817)

#### Which Delta project/connector is this regarding?
- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Convert `@ParameterizedTest` tests to dynamic tests so that SBT can show the parameters. Also upgrade the JUnit-related packages so that the display names actually show up in SBT output. This addresses comment delta-io#5772 (comment).

Before this PR it shows:
```
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testBasicInsertOperations(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#1 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testBasicInsertOperations(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#2 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testMergeWithDeleteAction(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#1 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testMergeWithDeleteAction(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#2 started
...
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType, boolean, boolean, boolean, boolean):#1 started
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType, boolean, boolean, boolean, boolean):#2 started
```

After this PR it shows:
```
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #1:testUpdateOperations(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #1:testUpdateOperations(MANAGED) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #2:testDeleteOperations(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #2:testDeleteOperations(MANAGED) started
...
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable():tableType=EXTERNAL, withPartition=true, withCluster=false, withAsSelect=true, replaceTable=true started
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable():tableType=EXTERNAL, withPartition=true, withCluster=false, withAsSelect=true, replaceTable=false started
```
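Stripped of the JUnit machinery, the core of the change above is generating one explicitly named test per parameter combination instead of relying on JUnit's index-based names (in JUnit Jupiter this would be a `@TestFactory` returning `DynamicTest`s). A dependency-free sketch, with `Runnable` standing in for JUnit's `Executable`:

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicNameSketch {
  // A (name, body) pair, standing in for JUnit Jupiter's DynamicTest.
  public record NamedTest(String displayName, Runnable body) {}

  // Build one named test per table type, so runners print e.g.
  // "testUpdateOperations(EXTERNAL)" instead of an opaque ":#1" index.
  public static List<NamedTest> testsForAllTableTypes(String testName, List<String> tableTypes) {
    List<NamedTest> tests = new ArrayList<>();
    for (String type : tableTypes) {
      tests.add(new NamedTest(testName + "(" + type + ")", () -> { /* test body */ }));
    }
    return tests;
  }

  public static void main(String[] args) {
    for (NamedTest t : testsForAllTableTypes("testUpdateOperations", List.of("EXTERNAL", "MANAGED"))) {
      System.out.println(t.displayName());
    }
  }
}
```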

## How was this patch tested?

## Does this PR introduce _any_ user-facing changes?
No.

---------

Signed-off-by: Yi Li <yi.li@databricks.com>
murali-db pushed a commit that referenced this pull request Jan 24, 2026
#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Added a new UC E2E streaming test suite to cover Delta structured streaming read/write behavior across managed/external tables, including UC commit version validation for managed tables. The changes live in spark/unitycatalog/src/test/java/io/sparkuctest/UCDeltaStreamingTest.java, exercising MemoryStream writes, streaming reads in V2 strict mode, and checkpointed queries.
<!--
⚠️⚠️ **Note to reviewers** ⚠️⚠️ this branch is based on
delta-io#5651; therefore there are a
number of changes that are not directly related to the E2E test
integration for streaming reads (those changes are to enable these
streaming reads). Please just review
[spark/unitycatalog/src/test/java/io/sparkuctest/UCDeltaStreamingTest.java](https://github.com/delta-io/delta/pull/5833/changes#diff-04879565d272ba02d9c1e47707ec9bfdb1044460e957ec769d1914178554383b)
-->
## How was this patch tested?
Locally:
```
export UC_REMOTE=false
$ build/sbt "sparkUnityCatalog/testOnly io.sparkuctest.UCDeltaStreamingTest"

...
[info] Test run started (JUnit Jupiter)
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(MANAGED) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(MANAGED) started
[info] Test run finished: 0 failed, 0 ignored, 4 total, 24.65s
[info] Passed: Total 4, Failed 0, Errors 0, Passed 4
[success] Total time: 44 s, completed Jan 16, 2026, 12:17:45 AM
```

and tested against remote UC server
```
export UC_REMOTE=true
export UC_URI=$UC_URI
export UC_TOKEN=$UC_TOKEN
export UC_CATALOG_NAME=main
export UC_SCHEMA_NAME=demo_zh
export UC_BASE_TABLE_LOCATION=$S3_BASE_LOCATION

$ build/sbt "sparkUnityCatalog/testOnly io.sparkuctest.UCDeltaStreamingTest"

...
[info] Test run started (JUnit Jupiter)
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(MANAGED) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(MANAGED) started
[info] Test run finished: 0 failed, 0 ignored, 4 total, 250.336s
[info] Passed: Total 4, Failed 0, Errors 0, Passed 4
[success] Total time: 274 s (04:34), completed Jan 16, 2026, 12:10:14 AM
```

## Does this PR introduce _any_ user-facing changes?
No.

---------

Co-authored-by: Tathagata Das <tathagata.das1565@gmail.com>