@lishuxu lishuxu commented Nov 15, 2025

feat: transactional UpdateProperties method support

Table/Catalog produce a Transaction (e.g., NewTransaction or staged create/replace).
Clients build PendingUpdate actions (e.g., UpdateProperties, AppendFiles), each validating via Apply() (conflict checks, reserved keys, format-version parsing, schema/metrics validation).
Transaction::CommitTransaction() is expected to gather the pending updates and call the catalog’s UpdateTable (with the derived TableUpdate objects and requirements) in one atomic commit. On success, the table metadata is refreshed; on failure, the error propagates.
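
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `SketchCatalog`, `SketchUpdateProperties`, and `SketchTransaction` are hypothetical stand-ins, validation is reduced to a single empty-key check, and the catalog is an in-memory map.

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are the real iceberg-cpp types.
using Properties = std::map<std::string, std::string>;

struct SketchCatalog {
  Properties committed;  // plays the role of the table metadata held by the catalog

  // Applies the whole batch in one call; this in-memory version cannot fail.
  bool UpdateTable(const Properties& sets) {
    for (const auto& [key, value] : sets) committed[key] = value;
    return true;
  }
};

class SketchUpdateProperties {
 public:
  SketchUpdateProperties& Set(std::string key, std::string value) {
    sets_[std::move(key)] = std::move(value);
    return *this;
  }

  // Validates the staged change; returns an error message on failure.
  std::optional<std::string> Apply() const {
    for (const auto& [key, value] : sets_) {
      if (key.empty()) return "empty property key";
    }
    return std::nullopt;
  }

  const Properties& sets() const { return sets_; }

 private:
  Properties sets_;
};

class SketchTransaction {
 public:
  explicit SketchTransaction(SketchCatalog* catalog) : catalog_(catalog) {}

  void Stage(SketchUpdateProperties update) { pending_.push_back(std::move(update)); }

  // Validates every pending update first, then sends one combined commit.
  // Nothing reaches the catalog if any update fails validation.
  std::optional<std::string> CommitTransaction() {
    Properties merged;
    for (const auto& update : pending_) {
      if (auto error = update.Apply()) return error;
      for (const auto& [key, value] : update.sets()) merged[key] = value;
    }
    if (!catalog_->UpdateTable(merged)) return "commit failed";
    pending_.clear();
    return std::nullopt;
  }

 private:
  SketchCatalog* catalog_;  // non-owning; the catalog must outlive the transaction
  std::vector<SketchUpdateProperties> pending_;
};
```

In this sketch a staged update stays invisible to the catalog until `CommitTransaction()` succeeds, which is the property the description relies on.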

Collaborator

@zhjwpku zhjwpku left a comment

This is consistent with the rest catalog open api, so +1, thanks.

/// \param schema_ids the schema ids to remove
/// \return a new RemoveSchemas
virtual std::shared_ptr<RemoveSchemas> RemoveSchemas(
const std::vector<int32_t>& schema_ids) = 0;
Member

Suggested change
- const std::vector<int32_t>& schema_ids) = 0;
+ std::span<const int32_t> schema_ids) = 0;

Should we use std::span here and below to accept a broader range of input types?

@lishuxu lishuxu force-pushed the feature/transaction-api branch from 23bf661 to 0c1f156 Compare November 29, 2025 13:50
@lishuxu lishuxu changed the title from "feat: add table-update api to transaction interface" to "feat: transactional UpdateProperties method support" Nov 29, 2025
Member

wgtmac commented Dec 3, 2025

Could you rebase to resolve conflict?

@lishuxu lishuxu force-pushed the feature/transaction-api branch from aa4cadb to 14bd4f8 Compare December 6, 2025 13:38
Contributor Author

lishuxu commented Dec 8, 2025

> Could you rebase to resolve conflict?

done

/// \brief Set whether the last operation in a transaction has been committed
///
/// \param committed true if the last operation has been committed, false otherwise
virtual void SetLastOperationCommitted(bool committed) = 0;
Member

This function is only used by transaction catalog so it should not appear here.


std::unique_ptr<Transaction> Table::NewTransaction() const {
- throw NotImplemented("Table::NewTransaction is not implemented");
+ return std::make_unique<BaseTransaction>(shared_from_this(), catalog_);
Member

I'd recommend passing Table* to it directly. Similarly, for catalog_ we can call catalog_.get() and pass a raw pointer. The created transaction object should not outlive the table and catalog objects, so it is unnecessary to increase their reference counts.

I'm open to keeping the current design. WDYT @HuaHuaY?

Contributor

@HuaHuaY HuaHuaY Dec 9, 2025

When the table's content changes, or the table is dropped, what should the executing transaction hold?

Comment on lines 41 to 43
std::unique_ptr<::iceberg::UpdateProperties> UpdateProperties() override;

std::unique_ptr<AppendFiles> NewAppend() override;
Member

Suggested change
- std::unique_ptr<::iceberg::UpdateProperties> UpdateProperties() override;
- std::unique_ptr<AppendFiles> NewAppend() override;
+ std::shared_ptr<UpdateProperties> NewUpdateProperties() override;
+ std::shared_ptr<AppendFiles> NewAppend() override;

It seems that returning shared_ptr will make it easier to collect and transform the updates internally. Adding a New prefix also avoids the ::iceberg:: qualifier in the return type, because the method name no longer hides the type name.

Contributor Author

Thanks for the suggestion.

In the current transaction implementation, we don’t need to collect or retain PendingUpdate instances across the transaction lifecycle, so returning std::unique_ptr keeps the ownership semantics explicit and avoids introducing shared ownership where it isn’t required.

I agree that returning shared_ptr could make internal aggregation or transformation of updates easier if the transaction logic evolves in that direction.
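
One detail that may be relevant to this trade-off: a `std::unique_ptr` rvalue converts to `std::shared_ptr`, so a factory that returns `unique_ptr` does not rule out shared ownership later. The sketch below uses hypothetical names (`SketchPendingUpdate`, `NewSketchPendingUpdate`, `SketchCollector`) and is not the PR's code.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pending update.
struct SketchPendingUpdate {
  std::string description;
};

// The factory hands out exclusive ownership.
std::unique_ptr<SketchPendingUpdate> NewSketchPendingUpdate(std::string description) {
  return std::make_unique<SketchPendingUpdate>(
      SketchPendingUpdate{std::move(description)});
}

// A collector that wants shared ownership can still accept the unique_ptr.
struct SketchCollector {
  std::vector<std::shared_ptr<SketchPendingUpdate>> updates;

  void Collect(std::unique_ptr<SketchPendingUpdate> update) {
    // shared_ptr has a converting constructor from unique_ptr&&.
    updates.emplace_back(std::move(update));
  }
};
```

The reverse conversion does not exist, which is one argument for starting with `unique_ptr` when retention is not yet needed.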

auto update = CheckAndCreateUpdate<::iceberg::UpdateProperties>(
table_->name(), catalog_, CurrentMetadata());
if (!update.has_value()) {
ERROR_TO_EXCEPTION(update.error());
Member

We cannot throw. Perhaps we should also use a Result wrapper for these update return types.
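
A rough sketch of what a non-throwing factory could look like. `SketchResult`, `SketchError`, `SketchUpdate`, and `NewSketchUpdate` are all hypothetical names made up for illustration; the project's actual Result type and error representation may differ.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error type for this sketch.
struct SketchError {
  std::string message;
};

// Minimal value-or-error wrapper, standing in for a project-wide Result type.
template <typename T>
class SketchResult {
 public:
  SketchResult(T value) : state_(std::move(value)) {}
  SketchResult(SketchError error) : state_(std::move(error)) {}

  bool has_value() const { return std::holds_alternative<T>(state_); }
  T& value() { return std::get<T>(state_); }
  const SketchError& error() const { return std::get<SketchError>(state_); }

 private:
  std::variant<T, SketchError> state_;
};

// Hypothetical update object.
struct SketchUpdate {
  std::string table_name;
};

// Reports failure through the return value instead of throwing.
SketchResult<std::unique_ptr<SketchUpdate>> NewSketchUpdate(
    const std::string& table_name) {
  if (table_name.empty()) {
    return SketchError{"table name must not be empty"};
  }
  return std::make_unique<SketchUpdate>(SketchUpdate{table_name});
}
```

With this shape the caller checks `has_value()` and forwards `error()` itself, so no exception crosses the API boundary.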

* For mutating calls such as UpdateTable or HasLastOperationCommitted, it delegates back
* to the owning BaseTransaction so staged updates remain private until commit.
*/
class ICEBERG_EXPORT TransactionCatalog : public Catalog {
Member

Can we move this class to base_transaction.cc?

context_.pending_updates.emplace_back(std::move(update));
}

return std::make_unique<Table>(
Member

Why do we need to return a table?

@lishuxu lishuxu force-pushed the feature/transaction-api branch from c3784d3 to 56802ce Compare December 13, 2025 09:39

6 participants