Columnar shuffle with Velox fails with UnsupportedOperationException: DirectByteBuffer not available #11716

@ijbgreen

Description

Backend

VL (Velox)

Bug description

When enabling columnar shuffle with the Velox backend using the following configuration:

spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.sql.columnar.shuffle.enabled=true

Expected behavior

Spark should execute the shuffle phase using Gluten's columnar shuffle implementation on the Velox backend. Queries such as loading a Parquet dataset and running simple operations like count() or aggregations should complete successfully.

Example workload:


val df = spark.read.parquet("parquet_file")
df.count()

or

df.groupBy("tipo_comprobante").count().show()

These operations are expected to run normally with Velox execution enabled.

Actual behavior

When columnar shuffle is enabled, Spark fails at runtime with an exception originating from the Velox execution pipeline. The job fails while processing the dataset and produces the following error:

org.apache.gluten.exception.GlutenException: VeloxRuntimeError
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: TableScan]

The root cause reported in the stack trace is:

java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

The stack trace shows that the failure occurs when Netty attempts to allocate a direct buffer during shuffle deserialization:

io.netty.util.internal.PlatformDependent.directBuffer
org.apache.gluten.vectorized.LowCopyFileSegmentJniByteInputStream.read

If the columnar shuffle configuration is removed, the same workload executes successfully using Velox for Parquet scans and the job completes without errors.
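This Netty error typically appears on JDK 16+ because strong encapsulation blocks reflective access to `java.nio.DirectByteBuffer.<init>(long, int)`, which `PlatformDependent.directBuffer` relies on. As a hedged workaround sketch (not verified against this Gluten build), opening the `java.nio` module and enabling Netty's reflection fallback on both driver and executors may restore direct-buffer allocation:

```
# Sketch only: pass JVM flags that re-enable Netty's reflective
# DirectByteBuffer constructor access on JDK 16+ module systems.
spark.driver.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true
spark.executor.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true
```

Whether these flags are sufficient here depends on how Gluten's `LowCopyFileSegmentJniByteInputStream` requests the buffer; this is offered only as a starting point for triage.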

This issue description was written with the assistance of AI.

Gluten version

Gluten-1.5, main branch

Spark version

Spark-3.5.x

Spark configurations

spark.plugins=org.apache.gluten.GlutenPlugin
spark.gluten.sql.columnar.backend=velox
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.sql.columnar.shuffle.enabled=true
spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=4g

System information

Gluten Version: 1.7.0-SNAPSHOT
Commit: 096545f
CMake Version: 3.30.4
System: Linux-6.8.0-101-generic
Arch: x86_64
CPU Name: 12th Gen Intel(R) Core(TM) i7-1255U
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 13.3.0
C Compiler: /usr/bin/cc
C Compiler Version: 13.3.0
CMake Prefix Path: /usr/local;/usr;/;/server/spark/.local/share/uv/tools/cmake/lib/python3.12/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: TableScan, plan node ID: value-stream:0]

Caused by: org.apache.gluten.exception.GlutenException:
Error during calling Java code from native code:
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

at io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:534)
at org.apache.gluten.vectorized.LowCopyFileSegmentJniByteInputStream.read(LowCopyFileSegmentJniByteInputStream.java:100)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeNext(Native Method)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:70)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:28)
at org.apache.gluten.iterator.ClosableIterator.next(ClosableIterator.java:48)
at org.apache.gluten.vectorized.ColumnarBatchSerializerInstanceImpl$TaskDeserializationStream.readValue(ColumnarBatchSerializer.scala:187)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:65)
at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:36)
at org.apache.gluten.execution.VeloxColumnarToRowExec.toRowIterator(VeloxColumnarToRowExec.scala:118)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:840)
