Description
PR #79 implements UPDATE using merge-on-read (MOR). The update mode is delete-insert: the updated rows are deleted and new rows are inserted.
Meanwhile, PR lance-format/lance#4715 supports fragment-level column updates using UpdateMode::RewriteColumns, where the updated columns are written to new data files.
Each approach has its strengths. Delete-insert works well when only a few rows need to be changed, for example:
UPDATE table_x SET col_a = xx, col_b = xx, col_c = xx WHERE id=xx
UpdateMode::RewriteColumns, by contrast, is better suited to bulk updates of a single column, for example:
UPDATE table_x SET col_a = xx
So I plan to support both update modes in Spark. A parameter, spark.sql.catalog.lance.update_rewrite_columns=true/false, would select the update mode. The default value, true, means the update uses UpdateMode::RewriteColumns. When the value is false, the update uses MOR (delete-insert).
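To make the trade-off concrete, here is a rough Python sketch of how the proposed option could map to an update mode, together with a back-of-the-envelope estimate of how many cells each mode writes. Only the config key and the two mode names come from this proposal. The function names, the string parsing, and the cost model are hypothetical: the model assumes delete-insert rewrites every column of each updated row, assumes RewriteColumns rewrites each updated column for every row in the affected fragments, and ignores deletion markers and metadata.

```python
from enum import Enum

# Config key proposed in this issue. Everything else here is a hypothetical sketch.
UPDATE_REWRITE_COLUMNS_KEY = "spark.sql.catalog.lance.update_rewrite_columns"


class UpdateMode(Enum):
    REWRITE_COLUMNS = "rewrite_columns"  # write updated columns to new data files
    DELETE_INSERT = "delete_insert"      # MOR: delete old rows, insert new rows


def resolve_update_mode(conf: dict) -> UpdateMode:
    """Map the proposed boolean option to an update mode (default: true)."""
    raw = str(conf.get(UPDATE_REWRITE_COLUMNS_KEY, "true")).strip().lower()
    if raw not in ("true", "false"):
        raise ValueError(f"{UPDATE_REWRITE_COLUMNS_KEY} must be true or false, got {raw!r}")
    return UpdateMode.REWRITE_COLUMNS if raw == "true" else UpdateMode.DELETE_INSERT


def cells_written(mode: UpdateMode, fragment_rows: int, total_cols: int,
                  updated_rows: int, updated_cols: int) -> int:
    """Assumed cost model: number of cells written by one UPDATE."""
    if mode is UpdateMode.DELETE_INSERT:
        # Each updated row is re-inserted with all of its columns.
        return updated_rows * total_cols
    # Each updated column is rewritten for all rows in the affected fragments.
    return fragment_rows * updated_cols


if __name__ == "__main__":
    mode = resolve_update_mode({UPDATE_REWRITE_COLUMNS_KEY: "false"})
    # Point update: 1 row, 3 of 10 columns, in a fragment of 1,000,000 rows.
    print(mode.value, cells_written(mode, 1_000_000, 10, 1, 3))
```

Under these assumptions, a point update of three columns in one row writes far fewer cells with delete-insert, while a bulk update of one column across all rows writes fewer cells with RewriteColumns.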
@jackye1995 @jiaoew1991 @majin1102 What do you think about this idea?