@iambriccardo iambriccardo commented Nov 27, 2025

Summary

This PR introduces replication masks, a new mechanism for handling table schemas in ETL that decouples column-level filtering from schema loading.

Motivation

The key insight is that we can load the entire table schema independently of column-level filtering in replication, then rely on Relation messages to determine which columns to actually replicate.

Changes

Replication Masks

A replication mask is a bitmask that determines which columns of a TableSchema are actively replicated at any given time. Creating a mask requires:

  • A set of active column names (from the Relation message)
  • The latest TableSchema of the table (we are assuming that the last table schema stored is synced with the incoming Relation message, thus matching by column name is sufficient)

These are combined in ReplicatedTableSchema, a wrapper type that exposes only the actively replicated columns on top of a stable TableSchema. This allows columns to be added to or removed from a publication without breaking the pipeline (assuming the destination tolerates missing column data; BigQuery and Iceberg currently do not and will fail).
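As a rough sketch of the idea (all type and function names below are simplified assumptions for illustration, not the crate's actual API), the mask can be modeled as one boolean per column of the stable schema:

```rust
use std::collections::HashSet;

// Simplified stand-ins for the crate's types; the real definitions differ.
#[derive(Debug, Clone)]
struct ColumnSchema {
    name: String,
}

#[derive(Debug, Clone)]
struct TableSchema {
    columns: Vec<ColumnSchema>,
}

/// One bit per column of the stable schema: bit i is set iff column i
/// appears in the set of active columns from the latest Relation message.
fn build_replication_mask(schema: &TableSchema, active: &HashSet<&str>) -> Vec<bool> {
    schema
        .columns
        .iter()
        .map(|c| active.contains(c.name.as_str()))
        .collect()
}

/// Wrapper exposing only the actively replicated columns of a stable schema.
struct ReplicatedTableSchema {
    schema: TableSchema,
    mask: Vec<bool>,
}

impl ReplicatedTableSchema {
    fn active_columns(&self) -> impl Iterator<Item = &ColumnSchema> + '_ {
        self.schema
            .columns
            .iter()
            .zip(self.mask.iter())
            .filter(|(_, active)| **active)
            .map(|(col, _)| col)
    }
}

fn main() {
    let schema = TableSchema {
        columns: vec!["id", "email", "ssn"]
            .into_iter()
            .map(|n| ColumnSchema { name: n.to_string() })
            .collect(),
    };
    // Suppose the Relation message only carries `id` and `email`, because
    // `ssn` was dropped from the publication.
    let active: HashSet<&str> = ["id", "email"].into_iter().collect();
    let mask = build_replication_mask(&schema, &active);
    let replicated = ReplicatedTableSchema { schema, mask };
    let names: Vec<_> = replicated.active_columns().map(|c| c.name.as_str()).collect();
    assert_eq!(names, ["id", "email"]);
}
```

Matching by name (rather than position) is what allows the stable schema and the Relation message to drift apart without breaking the mask.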

Destination Schema Handling

Previously, schemas were loaded by passing the SchemaStore to the destination. This caused semantic issues; for example, truncate_table relied on assumptions about whether the schema was present or not.

The new design supplies a ReplicatedTableSchema with each event, eliminating schema loading in the destination and enforcing invariants at compile time via the type system. This also enables future support for multiple schema versions within a single batch of events, which will be critical for schema change support.

Consistent Schema Loading

To ensure schema consistency between initial table copy and DDL event triggers, we now define a Postgres function describe_table_schema that returns schema data in a consistent structure. Schema change messages are emitted in the replication stream within the same transaction that modifies the schema.

More Schema Information

With the new shared schema query, we also load the ordinal positions of primary key columns, which enables us to create composite primary keys in downstream destinations.
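To illustrate why the key's own ordinal position matters (the struct and function below are hypothetical; only the two field names come from this PR's diff), a destination could order key columns like this:

```rust
/// Hypothetical, simplified column description; `ordinal_position` and
/// `primary_key_ordinal_position` mirror the fields added in this PR.
#[derive(Debug, Clone)]
struct ColumnInfo {
    name: String,
    ordinal_position: i32,
    primary_key_ordinal_position: Option<i32>,
}

/// Returns the primary key column names in key order, so a destination can
/// emit a composite key definition such as `PRIMARY KEY (tenant_id, id)`.
fn composite_primary_key(columns: &[ColumnInfo]) -> Vec<&str> {
    let mut pk: Vec<_> = columns
        .iter()
        .filter_map(|c| c.primary_key_ordinal_position.map(|p| (p, c.name.as_str())))
        .collect();
    // Sort by the key's ordinal position, not the column's table position:
    // the two can differ, e.g. for PRIMARY KEY (b, a).
    pk.sort_by_key(|&(p, _)| p);
    pk.into_iter().map(|(_, n)| n).collect()
}

fn main() {
    let cols = vec![
        ColumnInfo { name: "id".into(), ordinal_position: 1, primary_key_ordinal_position: Some(2) },
        ColumnInfo { name: "tenant_id".into(), ordinal_position: 2, primary_key_ordinal_position: Some(1) },
        ColumnInfo { name: "payload".into(), ordinal_position: 3, primary_key_ordinal_position: None },
    ];
    // The key order is (tenant_id, id), even though `id` comes first in the table.
    assert_eq!(composite_primary_key(&cols), ["tenant_id", "id"]);
}
```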

DDL Event Trigger

We also have a new DDL event trigger, which will be used to dispatch schema change events (ALTER TABLE statements) in a transactionally consistent way. This works because Postgres runs event triggers within the transaction that fired them, and they are blocking: when an ALTER TABLE is executed, the trigger's SQL function runs and emits the logical replication message in the same transaction that modifies the table. The ALTER TABLE statement does not complete until the event trigger has executed successfully.

This will be the foundational element needed for supporting schema changes.

Future Work

Follow-up PRs will leverage the DDL message for full schema change support. For now, it's included here to validate consistency.

@iambriccardo iambriccardo changed the title Improve feat(experimental): Add DDL trigger for data changes Nov 27, 2025
pg_escape = { version = "0.1.1", default-features = false }
pin-project-lite = { version = "0.2.16", default-features = false }
postgres-replication = { git = "https://github.com/MaterializeInc/rust-postgres", default-features = false, rev = "c4b473b478b3adfbf8667d2fbe895d8423f1290b" }
postgres-replication = { git = "https://github.com/iambriccardo/rust-postgres", default-features = false, rev = "31acf55c7e5c2244e5bb3a36e7afa2a01bf52c38" }

Used my fork, which supports Message logical replication messages.


coveralls commented Dec 1, 2025

Pull Request Test Coverage Report for Build 20229266390

Details

  • 1045 of 1180 (88.56%) changed or added relevant lines in 31 files are covered.
  • 28 unchanged lines in 7 files lost coverage.
  • Overall coverage increased (+0.2%) to 82.136%

Changes Missing Coverage:

| File | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| etl-destinations/src/iceberg/core.rs | 101 | 102 | 99.02% |
| etl-destinations/src/bigquery/client.rs | 111 | 113 | 98.23% |
| etl-replicator/src/core.rs | 0 | 3 | 0.0% |
| etl/src/test_utils/materialize.rs | 0 | 3 | 0.0% |
| etl/src/types/event.rs | 1 | 4 | 25.0% |
| etl/src/destination/memory.rs | 8 | 12 | 66.67% |
| etl/src/pipeline.rs | 5 | 9 | 55.56% |
| etl/src/store/both/memory.rs | 0 | 4 | 0.0% |
| etl/src/test_utils/test_destination_wrapper.rs | 10 | 14 | 71.43% |
| etl/src/replication/client.rs | 103 | 108 | 95.37% |
Files with Coverage Reduction:

| File | New Missed Lines | % |
|---|---|---|
| etl-replicator/src/core.rs | 1 | 0.0% |
| etl/src/test_utils/materialize.rs | 1 | 57.14% |
| etl/src/test_utils/test_schema.rs | 1 | 90.88% |
| etl/src/types/event.rs | 1 | 40.0% |
| etl/src/replication/apply.rs | 3 | 87.9% |
| etl/src/state/table.rs | 4 | 61.11% |
| etl/src/workers/table_sync.rs | 17 | 76.49% |
Totals:

  • Change from base Build 19957625863: 0.2%
  • Covered Lines: 16851
  • Relevant Lines: 20516

💛 - Coveralls

// Run replicator migrations to create the state store tables.
sqlx::migrate!("../etl-replicator/migrations")
// Run migrations to create the etl tables.
sqlx::migrate!("../etl/migrations")

Decided to move the migrations into etl itself, since they are now required for ETL to work, independently of which store implementation is used.

/// The 1-based ordinal position of the column in the table.
pub ordinal_position: i32,
/// The 1-based ordinal position of this column in the primary key, or None if not a primary key.
pub primary_key_ordinal_position: Option<i32>,

This is used to properly create a composite primary key definition on the destination.

@iambriccardo iambriccardo changed the title feat(experimental): Add DDL trigger for data changes feat(experimental): Rework schema handling Dec 2, 2025
@iambriccardo iambriccardo changed the title feat(experimental): Rework schema handling feat(experimental): Rework schema handling with replication masks Dec 2, 2025
@abhiaagarwal

Hey @iambriccardo, how stable is this? I'm willing to give this a whirl in one of my dev environments to see how it plays, since schema replication support is becoming increasingly important for my use case

@iambriccardo

> Hey @iambriccardo, how stable is this? I'm willing to give this a whirl in one of my dev environments to see how it plays, since schema replication support is becoming increasingly important for my use case

Hi! This is just the base PR for the system; I have two other branches, ddl-support-2 and ddl-support-3. The former adds the actual schema change support in the engine itself (not in the destinations, so for now it's silent); the latter adds it to BigQuery.

If you want, you can try out ddl-support-3, but it only covers BigQuery and I still have to improve it a bit. I hope to have something out by next week at the latest.

I am being overly cautious with this, since handling schema changes is really tricky to get right and to make fault tolerant.

@abhiaagarwal

> Hey @iambriccardo, how stable is this? I'm willing to give this a whirl in one of my dev environments to see how it plays, since schema replication support is becoming increasingly important for my use case
>
> Hi! This is just a base PR for the system, if you see I have 2 other branches ddl-support-2 and ddl-support-3. 2 is adding the actual schema change support in the engine itself (not in the destinations, so it's for now silent), 3 is adding it to BigQuery.
>
> If you want you can try out ddl-support-3 but it's only BigQuery and I have still to improve it a bit. I hope by next week at most to have something out.
>
> I am overly cautious with this since handling schema changes is really tricky to get right and also make it fault tolerant.

Yep, makes sense. I'll give it a whirl, thanks! I know there are maybe 3 or 4 different approaches you've taken to solving the schema problem; just wondering if this is the approach you're committing to.

@iambriccardo

The approach I seem to be most happy with is the use of a custom DDL event trigger, which consistently emits a detailed schema change message into the WAL. The system then keeps track of these special messages to build new schema versions (identified by the start_lsn of the custom logical message). After each DDL change, a Relation message is used to compute a replication_mask, which represents which columns of the schema are actually replicated (for column-level filtering).
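A minimal sketch of that versioning idea, using made-up names (Lsn, SchemaVersions) rather than the crate's actual types: versions are keyed by the DDL message's start_lsn, and the schema in effect for any event is the latest version at or before that event's LSN.

```rust
use std::collections::BTreeMap;

// Illustrative stand-ins; the real crate's types and names differ.
type Lsn = u64;

#[derive(Debug, Clone, PartialEq)]
struct TableSchema {
    columns: Vec<String>,
}

#[derive(Default)]
struct SchemaVersions {
    // BTreeMap keeps versions ordered by LSN, so we can look up the
    // schema in effect at any point in the replication stream.
    versions: BTreeMap<Lsn, TableSchema>,
}

impl SchemaVersions {
    /// Record a new schema version at the start_lsn of its DDL message.
    fn record(&mut self, start_lsn: Lsn, schema: TableSchema) {
        self.versions.insert(start_lsn, schema);
    }

    /// The schema version in effect at `lsn`: the latest version whose
    /// start LSN is <= the event's LSN.
    fn at(&self, lsn: Lsn) -> Option<&TableSchema> {
        self.versions.range(..=lsn).next_back().map(|(_, s)| s)
    }
}

fn main() {
    let mut versions = SchemaVersions::default();
    versions.record(100, TableSchema { columns: vec!["id".into()] });
    versions.record(200, TableSchema { columns: vec!["id".into(), "email".into()] });

    // Events between the two DDL messages see the old schema.
    assert_eq!(versions.at(150).unwrap().columns, ["id"]);
    // Events after the second DDL message see the new schema.
    assert_eq!(versions.at(250).unwrap().columns, ["id", "email"]);
    // Events before any recorded version have no schema.
    assert!(versions.at(50).is_none());
}
```

Because the DDL message is emitted in the same transaction as the ALTER TABLE, its start_lsn gives an unambiguous cut point between the two versions in the stream.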

@pgnickb pgnickb requested review from Copilot and pgnickb and removed request for Copilot December 15, 2025 19:13