Description
Backend
VL (Velox)
Bug description
When enabling columnar shuffle with the Velox backend using the following configuration:
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.sql.columnar.shuffle.enabled=true
Expected behavior
Spark should execute the shuffle phase using Gluten's columnar shuffle implementation with the Velox backend. Queries such as loading a Parquet dataset and running simple operations like count() or aggregations should complete successfully.
Example workload:
val df = spark.read.parquet("parquet_file")
df.count()
or
df.groupBy("tipo_comprobante").count().show()
These operations are expected to run normally with Velox execution enabled.
Actual behavior
When columnar shuffle is enabled, Spark fails at runtime with an exception originating from the Velox execution pipeline. The job fails while processing the dataset and produces the following error:
org.apache.gluten.exception.GlutenException: VeloxRuntimeError
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: TableScan]
The root cause reported in the stack trace is:
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
The stack trace indicates the failure occurs during direct buffer allocation through Netty:
io.netty.util.internal.PlatformDependent.directBuffer
org.apache.gluten.vectorized.LowCopyFileSegmentJniByteInputStream.read
If the columnar shuffle configuration is removed, the same workload executes successfully using Velox for Parquet scans and the job completes without errors.
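As background (not part of the original report): this UnsupportedOperationException is what Netty raises on JDK 16 and later when the module system blocks reflective access to java.nio internals, so PlatformDependent cannot construct DirectByteBuffer(long, int). A commonly cited workaround for Netty direct-buffer allocation, offered here only as an untested sketch, is to open the java.nio package and enable Netty's reflective path on both the driver and the executors:

```shell
# Untested workaround sketch: open java.nio to unnamed modules and allow Netty
# to use reflection for DirectByteBuffer(long, int). Must be applied to both
# driver and executor JVMs.
spark-shell \
  --conf "spark.driver.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true" \
  --conf "spark.executor.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true"
```

Whether this resolves the failure in LowCopyFileSegmentJniByteInputStream has not been verified; it only addresses the reported root cause at io.netty.util.internal.PlatformDependent.directBuffer.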
This issue description was written with the assistance of AI.
Gluten version
Gluten-1.5, main branch
Spark version
Spark-3.5.x
Spark configurations
spark.plugins=org.apache.gluten.GlutenPlugin
spark.gluten.sql.columnar.backend=velox
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.sql.columnar.shuffle.enabled=true
spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=4g
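For reproduction, the settings above can be combined into a single launch command. This is only a sketch: the Gluten bundle jar path is a placeholder and is not taken from the report.

```shell
# Hypothetical spark-shell invocation combining the configuration from this report.
# The --jars path is a placeholder; substitute the actual Gluten Velox bundle jar.
spark-shell \
  --jars /path/to/gluten-velox-bundle.jar \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.gluten.sql.columnar.backend=velox \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.gluten.sql.columnar.shuffle.enabled=true \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=4g
```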
System information
Gluten Version: 1.7.0-SNAPSHOT
Commit: 096545f
CMake Version: 3.30.4
System: Linux-6.8.0-101-generic
Arch: x86_64
CPU Name: 12th Gen Intel(R) Core(TM) i7-1255U
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 13.3.0
C Compiler: /usr/bin/cc
C Compiler Version: 13.3.0
CMake Prefix Path: /usr/local;/usr;/;/server/spark/.local/share/uv/tools/cmake/lib/python3.12/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs
Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: TableScan, plan node ID: value-stream:0]
Caused by: org.apache.gluten.exception.GlutenException:
Error during calling Java code from native code:
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
at io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:534)
at org.apache.gluten.vectorized.LowCopyFileSegmentJniByteInputStream.read(LowCopyFileSegmentJniByteInputStream.java:100)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeNext(Native Method)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:70)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:28)
at org.apache.gluten.iterator.ClosableIterator.next(ClosableIterator.java:48)
at org.apache.gluten.vectorized.ColumnarBatchSerializerInstanceImpl$TaskDeserializationStream.readValue(ColumnarBatchSerializer.scala:187)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:65)
at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:36)
at org.apache.gluten.execution.VeloxColumnarToRowExec.toRowIterator(VeloxColumnarToRowExec.scala:118)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:840)