Skip to content

[VL] CudfValueStreamNode doensn't support splits #11569

@marin-ma

Description

@marin-ma

Backend

VL (Velox)

Bug description

Got exception when running with cudf

26/02/05 11:02:55 WARN TaskSetManager: Lost task 2.0 in stage 40.0 (TID 12231) (10.0.1.6 executor 2): org.apache.gluten.exception.GlutenException: Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Splits can be associated only with leaf plan nodes which require splits. Plan node ID value-stream:0 doesn't refer to such plan node.
Retriable: False
Function: getPlanNodeSplitsStateLocked
File: /velox/velox/exec/Task.cpp
Line: 612
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_14VeloxUserErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox4exec4Task28getPlanNodeSplitsStateLockedERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
# 4  _ZN8facebook5velox4exec4Task12noMoreSplitsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
# 5  _ZN6gluten24WholeStageResultIterator12noMoreSplitsEv
# 6  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeNoMoreSplits
# 7  0x00007f5bffb845da

	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeNoMoreSplits(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.noMoreSplits(ColumnarBatchOutIterator.java:108)
	at org.apache.gluten.backendsapi.velox.VeloxIteratorApi.genFinalStageIterator(VeloxIteratorApi.scala:270)
	at org.apache.gluten.execution.WholeStageZippedPartitionsRDD.$anonfun$compute$1(WholeStageZippedPartitionsRDD.scala:57)
	at org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
	at org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
	at org.apache.gluten.execution.WholeStageZippedPartitionsRDD.compute(WholeStageZippedPartitionsRDD.scala:44)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

Gluten version

main branch

Spark version

Spark-3.5.x

Spark configurations

No response

System information

No response

Relevant logs

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingtriage

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions