[Spark 4.1.0] FileStreamSink and MetadataLogFileIndex moved to different packages #14112

@res-life

Summary

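Spark 4.1.0 moved FileStreamSink and MetadataLogFileIndex out of org.apache.spark.sql.execution.streaming and into two separate subpackages: FileStreamSink is now in streaming.sinks and MetadataLogFileIndex is now in streaming.runtime.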

Details

  • Spark Version: 4.1.0
  • Change Type: Package relocation

Spark ≤4.0.x

import org.apache.spark.sql.execution.streaming.FileStreamSink
import org.apache.spark.sql.execution.streaming.MetadataLogFileIndex

Spark 4.1.0+

import org.apache.spark.sql.execution.streaming.sinks.FileStreamSink
import org.apache.spark.sql.execution.streaming.runtime.MetadataLogFileIndex

Impact

Code that imports either class from the old package fails to compile against Spark 4.1.0, with errors such as:

not found: value FileStreamSink
not found: type MetadataLogFileIndex

Affected Files

  • sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuDataSourceBase.scala

Solution

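Create a version-specific shim object, FileStreamSinkShims, with one copy per Spark version range. Shared code then calls the shim instead of importing either class directly: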

For Spark 3.2.0-4.0.x (spark320/):

import org.apache.spark.sql.execution.streaming.{FileStreamSink, MetadataLogFileIndex}

object FileStreamSinkShims {
  def hasMetadata(...): Boolean = FileStreamSink.hasMetadata(...)
  def newMetadataLogFileIndex(...): MetadataLogFileIndex = new MetadataLogFileIndex(...)
}

For Spark 4.1.0+ (spark410/):

import org.apache.spark.sql.execution.streaming.sinks.FileStreamSink
import org.apache.spark.sql.execution.streaming.runtime.MetadataLogFileIndex

object FileStreamSinkShims {
  def hasMetadata(...): Boolean = FileStreamSink.hasMetadata(...)
  def newMetadataLogFileIndex(...): MetadataLogFileIndex = new MetadataLogFileIndex(...)
}
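The elided parameter lists above could be filled in roughly as follows. This is a minimal sketch of the spark320/ variant only, not the final implementation. The package name com.nvidia.spark.rapids.shims is an assumption, and the parameter lists are taken from the Spark 3.x signatures of FileStreamSink.hasMetadata and the MetadataLogFileIndex constructor; they should be checked against each supported Spark version, including 4.1.0.

```scala
// Hypothetical location for the shim; adjust to the project's shim package layout.
package com.nvidia.spark.rapids.shims

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.apache.spark.sql.SparkSession
// Spark <= 4.0.x location. In the spark410/ copy these two imports would become
// ...execution.streaming.sinks.FileStreamSink and
// ...execution.streaming.runtime.MetadataLogFileIndex; the object body stays the same,
// assuming the signatures are unchanged in 4.1.0.
import org.apache.spark.sql.execution.streaming.{FileStreamSink, MetadataLogFileIndex}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.StructType

object FileStreamSinkShims {
  // Delegates to Spark's check for a file stream sink metadata log under the given paths.
  def hasMetadata(
      paths: Seq[String],
      hadoopConf: Configuration,
      sqlConf: SQLConf): Boolean =
    FileStreamSink.hasMetadata(paths, hadoopConf, sqlConf)

  // Wraps the constructor so shared code never needs to import MetadataLogFileIndex itself.
  def newMetadataLogFileIndex(
      sparkSession: SparkSession,
      path: Path,
      parameters: Map[String, String],
      userSpecifiedSchema: Option[StructType]): MetadataLogFileIndex =
    new MetadataLogFileIndex(sparkSession, path, parameters, userSpecifiedSchema)
}
```

With a shim like this, GpuDataSourceBase.scala would drop both streaming imports and call FileStreamSinkShims.hasMetadata and FileStreamSinkShims.newMetadataLogFileIndex instead. Callers that rely on type inference for the returned index need no version-specific import at all.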

References

  • Spark 4.0.1: both classes are in org.apache.spark.sql.execution.streaming
  • Spark 4.1.0: the package is split; FileStreamSink is in streaming.sinks and MetadataLogFileIndex is in streaming.runtime
