Support update using UpdateMode::RewriteColumns #166

@fangbo

PR #79 implements UPDATE using merge-on-read (MOR). Its update mode is delete-insert: the updated rows are deleted and new rows are inserted.

Meanwhile, PR lance-format/lance#4715 supports fragment-level column updates using UpdateMode::RewriteColumns, where the updated columns are written to new data files.

Each approach has its strengths. Delete-insert works well when only a few rows need to change, for example:

UPDATE table_x SET col_a = xx, col_b = xx, col_c = xx WHERE id=xx

UpdateMode::RewriteColumns, by contrast, is better suited to bulk updates of a single column, for example:

UPDATE table_x SET col_a = xx

So I plan to support both update modes in Spark. A parameter, spark.sql.catalog.lance.update_rewrite_columns=true/false, would select the update mode. The default value, true, means UPDATE uses UpdateMode::RewriteColumns. When the value is false, UPDATE uses MOR.

@jackye1995 @jiaoew1991 @majin1102 What do you think about this idea?
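
To make the proposed switch concrete, here is a minimal sketch of how the parameter could map to an update mode. It is written in Python for illustration only and is not the connector's code: the `resolve_update_mode` helper, the `UpdateMode` enum values, and the strict true/false validation are all assumptions. Only the property name and its default of true come from the proposal.

```python
from enum import Enum

# Property name as proposed in this issue.
UPDATE_REWRITE_COLUMNS_KEY = "spark.sql.catalog.lance.update_rewrite_columns"


class UpdateMode(Enum):
    # Hypothetical enum, for illustration only.
    REWRITE_COLUMNS = "rewrite_columns"  # updated columns are written to new data files
    DELETE_INSERT = "delete_insert"      # MOR: updated rows are deleted, new rows inserted


def resolve_update_mode(conf):
    """Hypothetical helper: choose the update mode from a dict of catalog options."""
    raw = conf.get(UPDATE_REWRITE_COLUMNS_KEY, "true")  # proposed default is true
    value = str(raw).strip().lower()
    if value == "true":
        return UpdateMode.REWRITE_COLUMNS
    if value == "false":
        return UpdateMode.DELETE_INSERT
    # Assumption: reject anything other than true/false rather than guessing.
    raise ValueError(
        f"{UPDATE_REWRITE_COLUMNS_KEY} must be true or false, got {raw!r}"
    )
```

With this sketch, an empty options dict resolves to the RewriteColumns mode, and setting the property to false falls back to the delete-insert (MOR) path from PR #79.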
