Skip to content

Conversation

@WZhuo
Copy link
Contributor

@WZhuo WZhuo commented Dec 22, 2025

This PR mainly includes the following changes:

  1. Modify NameToIdVisitor so that the visit method can set both name_to_id and id_to_name map simultaneously.
  2. Add identifier_field_ids as a member of Schema to specify identifier_fields.
  3. Add AssignFreshIdVisitor to assign entirely new field IDs to a Schema. This will be used later when creating a new table.

@WZhuo WZhuo force-pushed the id_assigner branch 4 times, most recently from 88f50bf to 1006ae4 Compare December 23, 2025 02:57
}

std::shared_ptr<StructType> AssignFreshIdVisitor::Visit(const StructType& type) const {
auto fresh_ids = type.fields() |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: there is a discussion thread for orders to generate these ids: https://lists.apache.org/thread/kwfy80s2nm6xcpohkrzznsmhqsmz32c3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has some misunderstandings about the id assigner. All the implementations it mentioned, including Java, Python, and Rust, are consistent. The actual implementation is: use pre-order traversal, and a field's ID is assigned when its parent node is visited. This ensures that the IDs of the first-level child nodes in a struct are consecutive.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on it! Generally LGTM. I've left some minor comments.

/// \brief Create a schema.
///
/// \param fields The fields that make up the schema.
/// \param schema_id The unique identifier for this schema (default: kInitialSchemaId).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to keep std::nullopt as the default or remove the default value of schema_id. Otherwise users may accidentally use kInitialSchemaId as the schema_id.

Comment on lines +124 to +125
const std::vector<int32_t>& IdentifierFieldIds() const;
Result<std::vector<std::string>> IdentifierFieldNames() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::vector<int32_t>& IdentifierFieldIds() const;
Result<std::vector<std::string>> IdentifierFieldNames() const;
/// \brief Returns field ids of identifier fields.
const std::vector<int32_t>& IdentifierFieldIds() const;
/// \brief Returns canonical field names of identifier fields.
Result<std::vector<std::string>> IdentifierFieldNames() const;

Result<std::optional<std::reference_wrapper<const SchemaField>>> FindFieldById(
int32_t field_id) const;

/// \brief Returns the full column name for the given id.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// \brief Returns the full column name for the given id.
/// \brief Returns the canonical column name for the given id.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants