[Spark 4.1.0] ParquetColumnVector constructor signature changed - memoryMode removed #14113

@res-life

Summary

In Spark 4.1.0, the ParquetColumnVector constructor signature changed: the memoryMode parameter was removed, reducing the constructor from 7 arguments to 6.

Details

  • Spark Version: 4.1.0
  • Change Type: Constructor signature change

Spark 3.5.0-4.0.x (7 arguments)

new ParquetColumnVector(
  column: ParquetColumn,
  vector: WritableColumnVector,
  capacity: Int,
  memoryMode: MemoryMode,      // <-- removed in Spark 4.1.0
  missingColumns: java.util.Set[ParquetColumn],
  isTopLevel: Boolean,
  defaultValue: Any
)

Spark 4.1.0+ (6 arguments)

new ParquetColumnVector(
  column: ParquetColumn,
  vector: WritableColumnVector,
  capacity: Int,
  // memoryMode removed
  missingColumns: java.util.Set[ParquetColumn],
  isTopLevel: Boolean,
  defaultValue: Any
)

Impact

Code that calls the ParquetColumnVector constructor with 7 arguments fails to compile against Spark 4.1.0:

too many arguments (found 7, expected 6) for constructor ParquetColumnVector
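
When it is unclear which signature a given Spark build exposes, the constructor arity can be inspected through reflection. The following is a minimal sketch, not part of the plugin: `CtorArityCheck`, `ctorArities`, and `SixArgStandIn` are hypothetical names, and the stand-in class only mimics a six-parameter constructor so the sketch runs without Spark on the classpath. It uses `getDeclaredConstructors` so that non-public constructors are also counted.

```scala
object CtorArityCheck {
  // Parameter counts of every declared constructor (any visibility), sorted ascending.
  def ctorArities(cls: Class[_]): List[Int] =
    cls.getDeclaredConstructors.map(_.getParameterCount).sorted.toList

  // Hypothetical stand-in with six constructor parameters; this is NOT a Spark class.
  final class SixArgStandIn(a: Int, b: Int, c: Int, d: Int, e: Boolean, f: Any)

  def main(args: Array[String]): Unit = {
    // With Spark on the classpath, pass a fully qualified class name as the
    // first argument; without arguments the stand-in is inspected instead.
    val cls: Class[_] =
      if (args.nonEmpty) Class.forName(args(0)) else classOf[SixArgStandIn]
    println(s"${cls.getName}: constructor arities = ${ctorArities(cls)}")
  }
}
```

Run against a real Spark build, the class name to pass would be the fully qualified name of ParquetColumnVector in that build.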

Affected Files

  • sql-plugin/src/main/spark350/scala/org/apache/spark/sql/execution/datasources/parquet/rapids/shims/ParquetCVShims.scala

Solution

Create version-specific ParquetCVShims:

For Spark 3.5.0-4.0.x (spark350/):

def newParquetCV(..., memoryMode: MemoryMode, ...): ParquetColumnVector = {
  new ParquetColumnVector(column, vector, capacity, memoryMode, missingColumns, isTopLevel, defaultValue)
}

For Spark 4.1.0+ (spark410/):

def newParquetCV(..., missingColumns, ...): ParquetColumnVector = {
  // No memoryMode parameter
  new ParquetColumnVector(column, vector, capacity, missingColumns, isTopLevel, defaultValue)
}
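
The 4.1.0 shim above can be sketched in a self-contained form as follows. This is an illustration under stated assumptions, not the plugin's actual code: every type here is a stand-in defined locally so the sketch compiles without Spark, the parameter list of `newParquetCV` is assumed to mirror the constructor arguments listed under Details, and keeping `memoryMode` in the shim's own signature (accepted but ignored) is one possible design choice that lets shared call sites compile unchanged across both shims.

```scala
object ShimSketch {
  // Stand-in types so the sketch compiles without Spark; NOT the real Spark classes.
  final class ParquetColumn
  final class WritableColumnVector
  sealed trait MemoryMode
  case object OnHeap extends MemoryMode

  // Stand-in for the Spark 4.1.0+ class: six constructor arguments, no memoryMode.
  final class ParquetColumnVector(
      val column: ParquetColumn,
      val vector: WritableColumnVector,
      val capacity: Int,
      val missingColumns: java.util.Set[ParquetColumn],
      val isTopLevel: Boolean,
      val defaultValue: Any)

  // Hypothetical spark410 shim. The method signature matches what a spark350
  // shim would expose, so callers do not need version-specific code.
  object ParquetCVShims {
    def newParquetCV(
        column: ParquetColumn,
        vector: WritableColumnVector,
        capacity: Int,
        memoryMode: MemoryMode, // accepted for source compatibility, not forwarded
        missingColumns: java.util.Set[ParquetColumn],
        isTopLevel: Boolean,
        defaultValue: Any): ParquetColumnVector = {
      new ParquetColumnVector(column, vector, capacity, missingColumns, isTopLevel, defaultValue)
    }
  }

  // Builds a vector through the shim using placeholder arguments.
  def demo(): ParquetColumnVector =
    ParquetCVShims.newParquetCV(
      new ParquetColumn,
      new WritableColumnVector,
      4096,
      OnHeap,
      new java.util.HashSet[ParquetColumn](),
      isTopLevel = true,
      defaultValue = null)

  def main(args: Array[String]): Unit = {
    val cv = demo()
    println(s"capacity=${cv.capacity}, isTopLevel=${cv.isTopLevel}")
  }
}
```

If the shim instead drops `memoryMode` from its own signature, every caller has to become version-specific as well, which moves the incompatibility rather than containing it.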

References

  • Spark 4.0.1: 7-arg constructor with memoryMode
  • Spark 4.1.0: 6-arg constructor, memoryMode removed
