feat: Add IcebergDataSource #16177

Closed
PingLiuPing wants to merge 1 commit into facebookincubator:main from PingLiuPing:lp_refactor_iceberg_split

Conversation

@PingLiuPing
Collaborator

Add IcebergDataSource to support creating IcebergSplitReader instances, and remove the IcebergSplitReader creation logic from the Hive split reader.

IcebergSplitReader is the last remaining Iceberg-specific symbol that Hive depends on. By moving this logic out of Hive, Hive no longer has a dependency on Iceberg. As a result, the Hive CMake configuration can be simplified and cleaned up.
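
The decoupling described above can be pictured as a factory-method override: the base data source exposes a hook for creating a split reader, and the Iceberg data source overrides it. The following is a minimal sketch under that assumption only. All class and method names ending in `Sketch`, and the `createSplitReader` and `addSplit` hooks, are hypothetical stand-ins and are not the actual Velox classes or signatures.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a generic split reader (not the Velox class).
struct SplitReaderSketch {
  virtual ~SplitReaderSketch() = default;
  virtual std::string name() const { return "HiveSplitReader"; }
};

// Hypothetical stand-in for the Iceberg-specific reader.
struct IcebergSplitReaderSketch : SplitReaderSketch {
  std::string name() const override { return "IcebergSplitReader"; }
};

// Base data source: it never names an Iceberg type, so the Hive side
// carries no compile-time or link-time dependency on Iceberg.
class HiveDataSourceSketch {
 public:
  virtual ~HiveDataSourceSketch() = default;

  // Creates a reader through the overridable hook and reports its kind.
  std::string addSplit() {
    reader_ = createSplitReader();
    return reader_->name();
  }

 protected:
  // Hook that subclasses override to supply their own reader type.
  virtual std::unique_ptr<SplitReaderSketch> createSplitReader() {
    return std::make_unique<SplitReaderSketch>();
  }

 private:
  std::unique_ptr<SplitReaderSketch> reader_;
};

// The Iceberg data source owns the knowledge of the Iceberg reader.
class IcebergDataSourceSketch : public HiveDataSourceSketch {
 protected:
  std::unique_ptr<SplitReaderSketch> createSplitReader() override {
    return std::make_unique<IcebergSplitReaderSketch>();
  }
};
```

With this shape, the dependency points from the Iceberg side to the Hive side only, which is what would allow the Hive build target to drop its Iceberg dependency.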

@meta-cla meta-cla bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jan 30, 2026
@netlify

netlify bot commented Jan 30, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 2331e0f
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/697e68ba2dcd130008943fee

@PingLiuPing PingLiuPing requested review from Yuhta and removed request for majetideepak January 30, 2026 15:35
@Yuhta Yuhta added the "ready-to-merge" label (PRs that have been reviewed and are ready for merging; PRs with this tag notify the Velox Meta oncall) Jan 30, 2026
@PingLiuPing PingLiuPing force-pushed the lp_refactor_iceberg_split branch from e6ac988 to 25a402c Compare January 30, 2026 17:23
@kagamiori
Contributor

Hi @PingLiuPing, there are some build failures in CI. Could you please check whether they are related to this change? If they are not, please rebase onto the latest main. Thanks!

@PingLiuPing PingLiuPing force-pushed the lp_refactor_iceberg_split branch from 25a402c to 9df83bd Compare January 30, 2026 19:35
@PingLiuPing
Collaborator Author

> Hi @PingLiuPing, there are some build failures in CI. Could you please check whether they are related to this change? If they are not, please rebase onto the latest main. Thanks!

@kagamiori Thank you, rebased and updated the code.

@PingLiuPing PingLiuPing force-pushed the lp_refactor_iceberg_split branch from 9df83bd to 2331e0f Compare January 31, 2026 20:40
@meta-codesync

meta-codesync bot commented Feb 2, 2026

@kagamiori has imported this pull request. If you are a Meta employee, you can view this in D92066506.

@PingLiuPing
Collaborator Author

@kagamiori Could you help check the internal tests? It looks like they’ve been running for about a day.

@meta-codesync

meta-codesync bot commented Feb 4, 2026

@kagamiori merged this pull request in 3177849.

PingLiuPing added a commit to IBM/velox that referenced this pull request Feb 6, 2026
prestodb-ci pushed a commit to IBM/velox that referenced this pull request Feb 6, 2026
This reverts commit 3177849.

Alchemy-item: (ID = 1074) Iceberg staging hub commit 2/7 - a23a4a8
prestodb-ci pushed a commit to IBM/velox that referenced this pull request Feb 7, 2026
This reverts commit 3177849.

Alchemy-item: (ID = 1074) Iceberg staging hub commit 2/7 - a23a4a8
prestodb-ci pushed a commit to IBM/velox that referenced this pull request Feb 8, 2026
This reverts commit 3177849.

Alchemy-item: (ID = 1074) Iceberg staging hub commit 2/7 - a23a4a8
prestodb-ci pushed a commit to IBM/velox that referenced this pull request Feb 9, 2026
This reverts commit 3177849.

Alchemy-item: (ID = 1074) Iceberg staging hub commit 2/7 - a23a4a8
wangyum pushed a commit to wangyum/velox that referenced this pull request Feb 9, 2026
This reverts commit 3177849.

Alchemy-item: (ID = 1074) Iceberg staging hub commit 2/7 - a23a4a8
(cherry picked from commit 829890d)
arup-chauhan pushed a commit to arup-chauhan/velox that referenced this pull request Feb 23, 2026
Summary:
Add IcebergDataSource to support creating IcebergSplitReader instances, and remove the IcebergSplitReader creation logic from the Hive split reader.

IcebergSplitReader is the last remaining Iceberg-specific symbol that Hive depends on. By moving this logic out of Hive, Hive no longer has a dependency on Iceberg. As a result, the Hive CMake configuration can be simplified and cleaned up.

Pull Request resolved: facebookincubator#16177

Reviewed By: kKPulla

Differential Revision: D92066506

Pulled By: kagamiori

fbshipit-source-id: 15c4b46d36882eca366cacdb30659491b92ade59
jinchengchenghh added a commit to jinchengchenghh/velox that referenced this pull request Feb 23, 2026
@jinchengchenghh
Collaborator

jinchengchenghh commented Feb 23, 2026

This PR breaks Iceberg position-delete reads. After this PR, the Gluten test suite VeloxIcebergSuite fails with:

- iceberg read mor table - merge into *** FAILED ***
  Results do not match for query:
  Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
  Timezone Env: 
  
  == Parsed Logical Plan ==
  'Project [*]
  +- 'UnresolvedRelation [iceberg_mor_tb], [], false
  
  == Analyzed Logical Plan ==
  id: int, name: string, p: string
  Project [id#213981, name#213982, p#213983]
  +- SubqueryAlias spark_catalog.default.iceberg_mor_tb
     +- RelationV2[id#213981, name#213982, p#213983] spark_catalog.default.iceberg_mor_tb spark_catalog.default.iceberg_mor_tb
  
  == Optimized Logical Plan ==
  RelationV2[id#213981, name#213982, p#213983] spark_catalog.default.iceberg_mor_tb
  
  == Physical Plan ==
  VeloxColumnarToRow
  +- ^(1) IcebergScanTransformer spark_catalog.default.iceberg_mor_tb[id#213981, name#213982, p#213983] spark_catalog.default.iceberg_mor_tb (branch=null) [filters=, groupedBy=] RuntimeFilters: [] NativeFilters: []
  
  == Results ==
  
  == Results ==
  !== Correct Answer - 6 ==   == Gluten Answer - 9 ==
   struct<>                   struct<>
  ![1,a1_1,p2]                [1,a1,p1]
  ![2,a2_1,p2]                [1,a1_1,p2]
  ![3,a3_1,p1]                [2,a2,p1]
  ![4,a4,p2]                  [2,a2_1,p2]
  ![5,a5,p1]                  [3,a3,p2]
  ![6,a6,p2]                  [3,a3_1,p1]
  !                           [4,a4,p2]
  !                           [5,a5,p1]
  !                           [6,a6,p2] (GlutenQueryTest.scala:476)

https://github.com/apache/incubator-gluten/actions/runs/21911829182/job/63418832448?pr=11587
apache/gluten#11641 verifies that the unit test passes once this PR is reverted. The corresponding Velox branch is https://github.com/jinchengchenghh/velox/commits/dft-2026_02_17

Could you help find out why the test fails? Is it because the Gluten code needs to be updated to use IcebergConnectorFactory? Thanks! @PingLiuPing
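
For readers unfamiliar with the symptom: in an Iceberg merge-on-read table, position delete files list the row positions in a data file that must be skipped at read time. If the scan does not apply them, superseded rows reappear, which matches the 9-versus-6 row mismatch in the log. The following is a minimal sketch of that filtering step only, not the Velox or Gluten implementation. The function name `applyPositionDeletes` is hypothetical, and placing all nine rows in a single data file with deleted positions 0, 2, and 4 is an assumption made purely for illustration.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the rows of one data file whose positions
// are not listed in the position-delete set.
std::vector<std::string> applyPositionDeletes(
    const std::vector<std::string>& rows,
    const std::unordered_set<int64_t>& deletedPositions) {
  std::vector<std::string> kept;
  for (int64_t pos = 0; pos < static_cast<int64_t>(rows.size()); ++pos) {
    // Keep the row only if its position was not deleted.
    if (deletedPositions.count(pos) == 0) {
      kept.push_back(rows[pos]);
    }
  }
  return kept;
}

// Assumed single-file layout holding the nine rows from the failing output.
std::vector<std::string> exampleRows() {
  return {"1,a1,p1", "1,a1_1,p2", "2,a2,p1", "2,a2_1,p2", "3,a3,p2",
          "3,a3_1,p1", "4,a4,p2", "5,a5,p1", "6,a6,p2"};
}
```

Calling `applyPositionDeletes(exampleRows(), {0, 2, 4})` keeps the six rows from the expected answer, while passing an empty delete set returns all nine rows, as a reader that ignores delete files would.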

