[VL] Support read old ORC file without column names #8862

Closed
ccat3z wants to merge 6 commits into apache:main from ccat3z:feat/old-orc

Conversation

@ccat3z
Contributor

@ccat3z ccat3z commented Feb 28, 2025

What changes were proposed in this pull request?

An ORC file written by an old version has no field names in its physical schema. To read it, we must map the table schema to the file schema by index.

  1. Pass ScanTransformer#getDataColumns as the table schema to Velox.
  2. Enable k{Parquet,Orc}UseColumnNames in Velox to match Spark's default behavior, which always maps the table schema to the physical file schema by name.

This PR depends on facebookincubator/velox#12489 (old ORC files) and facebookincubator/velox#12490 (match the index-mapping behavior in Spark).
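
To illustrate the difference between the two mapping strategies described above, here is a minimal Python sketch. It is not the Gluten or Velox implementation; the function names, the example table schema, and the placeholder names `_col0`, `_col1`, `_col2` (standing in for a file whose physical schema lacks the real field names) are all assumptions made for illustration.

```python
from typing import List, Optional


def map_by_name(table_cols: List[str], file_cols: List[str]) -> List[Optional[int]]:
    """For each table column, return the index of the file column with the
    same name, or None when the file has no column with that name."""
    positions = {name: i for i, name in enumerate(file_cols)}
    return [positions.get(name) for name in table_cols]


def map_by_index(table_cols: List[str], file_cols: List[str]) -> List[Optional[int]]:
    """For each table column, return its own position if the file has a
    column at that position, or None when the file is narrower."""
    return [i if i < len(file_cols) else None for i in range(len(table_cols))]


# Hypothetical table schema and an "old" file schema without usable names.
table_schema = ["id", "name", "price"]
old_file_schema = ["_col0", "_col1", "_col2"]

print(map_by_name(table_schema, old_file_schema))   # [None, None, None]
print(map_by_index(table_schema, old_file_schema))  # [0, 1, 2]
```

In this sketch, name-based mapping finds no matching columns in the old file, while index-based mapping resolves every table column by position.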

Fixed #5638.

How was this patch tested?

Unit tests.

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions

Run Gluten Clickhouse CI on x86

@ccat3z ccat3z marked this pull request as ready for review March 3, 2025 06:37
@ccat3z
Contributor Author

ccat3z commented Mar 3, 2025

cc @kecookier

@github-actions

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Apr 19, 2025
@github-actions

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

@zhouyuan
Member

Cc: @rui-mo

@zhouyuan zhouyuan reopened this Oct 24, 2025
@github-actions github-actions bot removed the stale label Oct 25, 2025
@beliefer
Contributor

@rui-mo Could you rebase this PR? Thanks.

@ccat3z
Contributor Author

ccat3z commented Oct 30, 2025

@rui-mo Could you rebase this PR? Thanks.

I found similar code while rebasing. It seems this issue might already be fixed on the main branch.

#10697

@beliefer
Contributor

@ccat3z Thank you for the reminder. I will pick and check the function.

@beliefer
Contributor

beliefer commented Nov 3, 2025

@ccat3z I tested #10697, but it is still not working.

@taiyang-li
Contributor

@ccat3z May I ask what the state of the current PR is? Do you plan to finish it?

@rui-mo
Contributor

rui-mo commented Nov 3, 2025

@beliefer @taiyang-li Would you like to create an issue to track this problem? As @ccat3z noted, the pull request #10697 was aimed at supporting reading ORC files by matching indices, but there may still be a gap for the scenario you described. Let’s continue the discussion in a dedicated issue. Thanks.

@beliefer
Contributor

beliefer commented Nov 4, 2025

@rui-mo @taiyang-li @ccat3z I already created #11010.

@ccat3z
Contributor Author

ccat3z commented Nov 5, 2025

Fixed in #10697

@ccat3z ccat3z closed this Nov 5, 2025
Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[VL] Read OrcFile error when schema in Orc file and the table file don't consist

5 participants