Skip to content

[GLUTEN-11635] Enable partial fallback if parent node supports partial fallback#11637

Merged
jinchengchenghh merged 4 commits intoapache:mainfrom
wecharyu:partial_fallback_enhence
Mar 10, 2026
Merged

[GLUTEN-11635] Enable partial fallback if parent node supports partial fallback#11637
jinchengchenghh merged 4 commits intoapache:mainfrom
wecharyu:partial_fallback_enhence

Conversation

@wecharyu
Copy link
Contributor

What changes are proposed in this pull request?

Fix #11635

How was this patch tested?

Add a new unit test.

select plus_one(col1) as col2, l_partkey from (
  select plus_one(l_orderkey) as col1, l_partkey from lineitem
)

Before this PR:

== Physical Plan ==
*(1) Project [if (isnull(col1#73L)) null else plus_one(knownnotnull(col1#73L)) AS col2#74L, l_partkey#1L]
+- *(1) Project [if (isnull(l_orderkey#0L)) null else plus_one(knownnotnull(l_orderkey#0L)) AS col1#73L, l_partkey#1L]
   +- VeloxColumnarToRow
      +- ^(1) BatchScanExecTransformer parquet file:/root/workspace/gluten-community/backends-velox/target/scala-2.13/test-classes/tpch-data-parquet/lineitem[l_orderkey#0L, l_partkey#1L] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/root/workspace/gluten-community/backends-velox/target/scala-2.13..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<l_orderkey:bigint,l_partkey:bigint> RuntimeFilters: [] NativeFilters: []

After this PR:

VeloxColumnarToRow
+- ^(3) ProjectExecTransformer [_SparkPartialProject0#56L AS col2#38L, l_partkey#1L]
   +- ^(3) InputIteratorTransformer[col1#37L, l_partkey#1L, _SparkPartialProject0#56L]
      +- ColumnarPartialProject [if (isnull(col1#37L)) null else plus_one(knownnotnull(col1#37L)) AS col2#38L, l_partkey#1L] PartialProject List(if (isnull(col1#37L)) null else plus_one(knownnotnull(col1#37L)) AS _SparkPartialProject0#56L)
         +- ^(2) ProjectExecTransformer [_SparkPartialProject0#55L AS col1#37L, l_partkey#1L]
            +- ^(2) InputIteratorTransformer[l_orderkey#0L, l_partkey#1L, _SparkPartialProject0#55L]
               +- ColumnarPartialProject [if (isnull(l_orderkey#0L)) null else plus_one(knownnotnull(l_orderkey#0L)) AS col1#37L, l_partkey#1L] PartialProject List(if (isnull(l_orderkey#0L)) null else plus_one(knownnotnull(l_orderkey#0L)) AS _SparkPartialProject0#55L)
                  +- ^(1) BatchScanExecTransformer parquet file:/root/workspace/gluten-community/backends-velox/target/scala-2.13/test-classes/tpch-data-parquet/lineitem[l_orderkey#0L, l_partkey#1L] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/root/workspace/gluten-community/backends-velox/target/scala-2.13..., PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<l_orderkey:bigint,l_partkey:bigint> RuntimeFilters: [] NativeFilters: []

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the VELOX label Feb 20, 2026
@wecharyu wecharyu force-pushed the partial_fallback_enhence branch from 747d9fc to 8b4e841 Compare February 24, 2026 03:10
@wecharyu
Copy link
Contributor Author

@jinchengchenghh @weixiuli Could you pls take a look? Thanks~

@wForget
Copy link
Member

wForget commented Feb 26, 2026

Is there always a benefit to this? It does a partial projection, but with an extra C2R/R2C conversion.

  • before: full columnar to rows -> row-based project1 -> row-based project2 -> rows to columnar
  • after: partial columnar to rows -> row-based project1 -> rows to columnar -> partial columnar to rows -> row-based project2 -> rows to columnar

@jinchengchenghh
Copy link
Contributor

Could you show the number that this PR can gain improvement?

@wecharyu wecharyu force-pushed the partial_fallback_enhence branch from 8b4e841 to 413cb80 Compare March 6, 2026 08:43
@wecharyu
Copy link
Contributor Author

wecharyu commented Mar 6, 2026

I have run a simple test that read and write table, it shows little improvements, I think we could benefit a lot more if the operations offloaded to native are more complicated. cc: @jinchengchenghh @wForget

insert overwrite table dev_spark_auxiliary.wechar_tbl2
select plus_one(col1) as col2, col3 from (
  select col1, log_type as col3
  from test_db.test_tbl
  lateral view simpleUDTF(item_id) as col1
  where grass_region='BR' and regional_date='2026-03-01'
)
image

case other => other
}
}
newPlan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newPlan -> wrapped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@jinchengchenghh jinchengchenghh merged commit 68dd66a into apache:main Mar 10, 2026
111 of 113 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ColumnarPartialProject not take effect when the parent is also ProjectExec

3 participants