@jameszhu-shopify jameszhu-shopify commented Nov 25, 2025

Summary

This PR adds support for tables with composite primary keys (multi-column PKs) in Ghostferry's data migration process. Previously, Ghostferry only supported tables with single-column numeric primary keys. This PR builds on another PR, "Pagination beyond uint64".

This is a large PR, so please let me know if I should break it down into smaller ones.

Key Changes

Core Pagination Key Types (pagination_key.go)

  • New CompositeKey type: A slice of PaginationKey objects for multi-column keys
  • Unified MinPaginationKey/MaxPaginationKey functions that handle both single and composite keys
  • JSON serialization support for state persistence and resume
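
To illustrate the ordering and serialization described above, here is a minimal sketch, not the code in this PR. The `compositeKey` type, `compare`, `maxKey`, and `roundTrip` are hypothetical names, and every key part is assumed to be a `uint64` to keep the example short; the real `CompositeKey` holds `PaginationKey` objects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// compositeKey is a hypothetical stand-in for CompositeKey: one value per
// primary key column, in column order. Assumption: all parts are uint64.
type compositeKey []uint64

// compare orders two keys lexicographically, column by column.
// It returns -1, 0, or 1.
func compare(a, b compositeKey) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}

// maxKey returns the larger of two keys under compare.
func maxKey(a, b compositeKey) compositeKey {
	if compare(a, b) >= 0 {
		return a
	}
	return b
}

// roundTrip serializes a key to JSON and reads it back, as a state dump
// followed by a resume would.
func roundTrip(k compositeKey) (compositeKey, error) {
	data, err := json.Marshal(k)
	if err != nil {
		return nil, err
	}
	var out compositeKey
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	a, b := compositeKey{1, 5}, compositeKey{2, 0}
	fmt.Println(compare(a, b), maxKey(a, b))
	restored, err := roundTrip(b)
	fmt.Println(restored, err)
}
```

The first column dominates the comparison, so `{1, 5}` sorts before `{2, 0}` even though its second part is larger.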

Table Schema Detection (table_schema_cache.go)

  • GetPaginationColumns() returns all PK columns (replaces single-column GetPaginationColumn())
  • Automatic detection of composite PKs during schema loading

Cursor & Data Iteration (cursor.go, data_iterator.go)

  • SQL generation for composite key WHERE clauses: (col1, col2) > (?, ?)
  • SQL generation for composite key ORDER BY: ORDER BY col1, col2
  • Row batch handling extracts composite keys from result sets
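
The WHERE and ORDER BY shapes listed above can be sketched with plain string building. This is an illustration only, not the PR's query builder: `buildBatchQuery` and `quoteIdent` are hypothetical helpers, and the table and column names in `main` are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in backticks, doubling embedded backticks.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// buildBatchQuery returns a paginated SELECT that resumes after the last
// seen composite key using a row comparison, e.g. (col1, col2) > (?, ?).
// The caller binds one argument per key column, in column order.
func buildBatchQuery(table string, keyCols []string, batchSize int) string {
	quoted := make([]string, len(keyCols))
	marks := make([]string, len(keyCols))
	for i, c := range keyCols {
		quoted[i] = quoteIdent(c)
		marks[i] = "?"
	}
	colList := strings.Join(quoted, ", ")
	return fmt.Sprintf(
		"SELECT * FROM %s WHERE (%s) > (%s) ORDER BY %s LIMIT %d",
		quoteIdent(table), colList, strings.Join(marks, ", "), colList, batchSize,
	)
}

func main() {
	fmt.Println(buildBatchQuery("orders", []string{"tenant_id", "id"}, 200))
}
```

The ORDER BY lists the key columns in the same order as the row comparison, so the last row of one batch is a valid starting point for the next.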

State Management (state_tracker.go)

  • Progress tracking works with composite keys (CompositeKey.NumericPosition() uses the first column as a heuristic).
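
A rough sketch of what the first-column heuristic implies for progress reporting. The functions below are hypothetical and assume `uint64` key parts; they are not the PR's `NumericPosition()` implementation.

```go
package main

import "fmt"

// numericPosition mimics the heuristic: only the first key column counts.
func numericPosition(key []uint64) uint64 {
	if len(key) == 0 {
		return 0
	}
	return key[0]
}

// progressPercent estimates progress toward target from the first column
// alone, capped at 100.
func progressPercent(current, target []uint64) float64 {
	t := numericPosition(target)
	if t == 0 {
		return 0
	}
	p := float64(numericPosition(current)) / float64(t) * 100
	if p > 100 {
		p = 100
	}
	return p
}

func main() {
	target := []uint64{100, 7}
	// Both keys report the same progress because the second column is ignored.
	fmt.Println(progressPercent([]uint64{50, 1}, target))
	fmt.Println(progressPercent([]uint64{50, 999}, target))
}
```

Under this assumption the estimate is coarse when many rows share the same first-column value, which is why it is described as a heuristic.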

Verifiers (iterative_verifier.go, inline_verifier.go, compression_verifier.go)

  • Updated hash queries to handle composite keys: WHERE (k1, k2) IN ((?, ?), ...)
  • Reverification store uses string-serialized composite keys
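
The IN clause shape above could be generated along these lines. This is a sketch under assumptions: `buildCompositeIn` is a hypothetical helper, not a function from this PR, and it only builds the clause and flattened arguments rather than the full hash query.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildCompositeIn returns a clause such as
// (`k1`, `k2`) IN ((?, ?), (?, ?)) together with the flattened arguments,
// one group of values per key.
func buildCompositeIn(cols []string, keys [][]interface{}) (string, []interface{}, error) {
	if len(cols) == 0 || len(keys) == 0 {
		return "", nil, errors.New("need at least one column and one key")
	}
	quoted := make([]string, len(cols))
	for i, c := range cols {
		quoted[i] = "`" + strings.ReplaceAll(c, "`", "``") + "`"
	}
	// One placeholder tuple, reused for every key: (?, ?, ...).
	tuple := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(cols)), ", ") + ")"
	tuples := make([]string, len(keys))
	args := make([]interface{}, 0, len(keys)*len(cols))
	for i, k := range keys {
		if len(k) != len(cols) {
			return "", nil, fmt.Errorf("key %d has %d values, want %d", i, len(k), len(cols))
		}
		tuples[i] = tuple
		args = append(args, k...)
	}
	clause := fmt.Sprintf("(%s) IN (%s)", strings.Join(quoted, ", "), strings.Join(tuples, ", "))
	return clause, args, nil
}

func main() {
	clause, args, err := buildCompositeIn(
		[]string{"k1", "k2"},
		[][]interface{}{{1, "a"}, {2, "b"}},
	)
	fmt.Println(clause, args, err)
}
```

Rejecting keys whose length differs from the column count keeps the placeholders and arguments aligned.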

Sorter (data_iterator_sorter.go)

  • MaxPaginationKeySorter handles mixed key types without panicking

Test

dev test passes (unit tests and integration tests).

✅ New integration tests are added for composite key support (with 2 columns, 3 columns).

⚠️ This is a large change, so it would be safer to run Ghostferry end-to-end on a real-world instance to verify backward compatibility.

Backward Compatibility

  • Single-column tables still work via the unified functions. State serialization is backward compatible.

@jameszhu-shopify jameszhu-shopify changed the base branch from uuid-as-id to main November 25, 2025 05:54
@jameszhu-shopify jameszhu-shopify marked this pull request as ready for review November 25, 2025 17:14
@jameszhu-shopify jameszhu-shopify marked this pull request as draft November 25, 2025 17:52
}

-func (t *TableSchema) paginationKeyColumn(cascadingPaginationColumnConfig *CascadingPaginationColumnConfig) (*schema.TableColumn, int, error) {
+func (t *TableSchema) getPaginationKeyColumns(cascadingPaginationColumnConfig *CascadingPaginationColumnConfig) ([]*schema.TableColumn, []int, error) {

Not sure if this is the right place, but all composite key columns should be checked for *_bin collation (#422 does this for non-integer PKs in the parent PR).
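
A minimal sketch of such a per-column check, under assumptions: the `column` type and `nonBinaryCollationColumns` are hypothetical and do not reproduce what #422 does. Columns with an empty collation are assumed to be non-string types and are skipped.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical, simplified view of a table column.
type column struct {
	Name      string
	Collation string // empty for non-string columns (assumption)
}

// nonBinaryCollationColumns returns the names of key columns whose
// collation does not end in "_bin".
func nonBinaryCollationColumns(cols []column) []string {
	var bad []string
	for _, c := range cols {
		if c.Collation == "" {
			continue // e.g. integer columns carry no collation
		}
		if !strings.HasSuffix(c.Collation, "_bin") {
			bad = append(bad, c.Name)
		}
	}
	return bad
}

func main() {
	cols := []column{
		{Name: "tenant_id"},
		{Name: "slug", Collation: "utf8mb4_bin"},
		{Name: "name", Collation: "utf8mb4_general_ci"},
	}
	fmt.Println(nonBinaryCollationColumns(cols))
}
```

Returning every offending column, rather than stopping at the first, lets the error message name all of them at once.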

"table": c.Table.String(),
"tag": "cursor",
})
c.paginationKeyColumn = c.Table.GetPaginationColumn()

Let's check if this field can be removed 👀

return t.PaginationKeyColumns
}

// Deprecated: Use GetPaginationKeyIndexes

Can we remove the fns that got marked as deprecated?

}
} else if cascadingPaginationColumnConfig != nil {
// Fallback
if fallbackColumnName, ok := cascadingPaginationColumnConfig.FallbackPaginationColumnName(); ok {

I think that if the specified fallback columns are not the primary key, Ghostferry must fail unless a unique index spans them (in the right order).
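
One way to read that requirement, sketched with hypothetical types: the `index` struct and `hasUniqueIndexOn` are illustrations, not Ghostferry or go-mysql APIs. The sketch assumes "spans them in the right order" means a unique index whose column list equals the fallback columns exactly.

```go
package main

import "fmt"

// index is a hypothetical, simplified description of a table index.
type index struct {
	Name    string
	Unique  bool
	Columns []string // in index order
}

// hasUniqueIndexOn reports whether some unique index covers exactly cols,
// in the same order.
func hasUniqueIndexOn(indexes []index, cols []string) bool {
	for _, idx := range indexes {
		if !idx.Unique || len(idx.Columns) != len(cols) {
			continue
		}
		match := true
		for i := range cols {
			if idx.Columns[i] != cols[i] {
				match = false
				break
			}
		}
		if match {
			return true
		}
	}
	return false
}

func main() {
	indexes := []index{
		{Name: "idx_created", Unique: false, Columns: []string{"created_at"}},
		{Name: "uniq_tenant_id", Unique: true, Columns: []string{"tenant_id", "id"}},
	}
	fmt.Println(hasUniqueIndexOn(indexes, []string{"tenant_id", "id"}))
	fmt.Println(hasUniqueIndexOn(indexes, []string{"id", "tenant_id"}))
}
```

With a check like this, schema loading could return an error for fallback columns that fail it instead of paginating over a non-unique ordering.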

3 participants