[VL] Delta Lake write #10215

@zhztheplayer

Add write support for Delta Lake in the Velox backend.

Work items:

  1. Add native write support for Spark 3.5 + Delta 3.3
    [VL] Delta: Add Delta Lake write unit test for Spark 3.5 + Delta 3.3 #10802
    [GLUTEN-10215][VL] Delta: Native write support for Delta 3.3.1 / Spark 3.5 #10801
  2. (TODO) Add native write support for Spark 4.0 + Delta 4.0
  3. (PR pending) Native statistics tracker to avoid C2R overhead
    [GLUTEN-10215][VL] Delta write: Native statistics tracker to eliminate C2R overhead #11419
  4. (TODO, optional) Add native write support for older versions (Spark 3.4 + Delta 2.4)
  5. Offload DeltaOptimizedWriterExec for optimized writes
    [GLUTEN-10215][VL] Delta Write: Offload DeltaOptimizedWriterExec #11461
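
To try item 1 locally, a Spark session needs both the Gluten plugin and the Delta extension enabled. The sketch below only collects the relevant settings in a Python dict. The helper name `gluten_delta_conf` and the off-heap size are illustrative assumptions, and whether a particular Delta write is actually offloaded depends on the Gluten build and on the PRs listed above.

```python
def gluten_delta_conf(off_heap_size="2g"):
    """Collect Spark settings that enable Gluten and Delta Lake together.

    Hypothetical helper for illustration; off_heap_size is an arbitrary default.
    """
    return {
        # Load the Gluten plugin.
        "spark.plugins": "org.apache.gluten.GlutenPlugin",
        # Gluten's native backend allocates from Spark off-heap memory.
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": off_heap_size,
        # Register Delta Lake's SQL extension and catalog.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


conf = gluten_delta_conf()
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

Each pair can be passed to `SparkSession.builder.config(key, value)` in PySpark or as `--conf key=value` to `spark-submit`.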

PoC at #10216.

Gaps so far:

  1. Constraint expressions need to be offloaded; otherwise C2R / R2C transitions are added.
  2. Gluten currently overrides some Delta classes (DeltaParquetFileFormat, for example); this should be avoided.
  3. Test code is copied from Delta; it would be better to find a way to avoid this.
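
Gap 1 can be illustrated with a toy model. This is not Gluten's planner: the operator names (`NativeScan`, `ConstraintCheck`, `NativeWrite`) and the `Op` type are hypothetical. It only shows why a single row-based operator in the middle of a native pipeline costs two transitions, one C2R before it and one R2C after it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    native: bool  # True if the operator runs on columnar batches natively


def insert_transitions(pipeline):
    """Return operator names with C2R / R2C inserted at each format boundary."""
    out = []
    prev = None
    for op in pipeline:
        if prev is not None and prev.native != op.native:
            # Leaving a native operator needs columnar-to-row, entering needs row-to-columnar.
            out.append("C2R" if prev.native else "R2C")
        out.append(op.name)
        prev = op
    return out


def count_transitions(pipeline):
    """Count the C2R / R2C steps that insert_transitions would add."""
    return sum(1 for name in insert_transitions(pipeline) if name in ("C2R", "R2C"))


# Hypothetical write pipeline where the constraint check falls back to row-based execution.
fallback = [Op("NativeScan", True), Op("ConstraintCheck", False), Op("NativeWrite", True)]
# Same pipeline with the constraint expression offloaded.
offloaded = [Op("NativeScan", True), Op("ConstraintCheck", True), Op("NativeWrite", True)]

print(insert_transitions(fallback))
print(count_transitions(fallback), count_transitions(offloaded))
```

In this model the fallback pipeline picks up two transitions and the fully offloaded one picks up none, which is the overhead that offloading the constraint expression is meant to remove.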

Labels: enhancement (New feature or request)

Status: In Progress
