
Conversation

Contributor

@Nospamas Nospamas commented Jan 14, 2026

Having chatted with Rod about this, we concluded that the best way to handle renaming networks while still being able to refer to them in a human-readable way is to add a new immutable, unique column, network_key, to the meta_network table. This PR makes the changes required to add that column and resolves various issues that arise from adding columns to a history-enabled PyCDS table. Main changes:

  • A migration that adds the new column to the base table
  • A migration that preserves matching column order by copying data into a new history table that includes the network_key column
  • ORM table versioning that lets us run tests against an older version of the database where the column may not exist yet
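
The first change can be sketched roughly as follows. This is a minimal illustration using SQLite as a stand-in for PostgreSQL (and raw SQL in place of the PR's Alembic migration); the table and column names follow the PR, but the data and index name are illustrative:

```python
import sqlite3

# Sketch of the base-table migration step: add a nullable network_key
# column to meta_network, backfill it from network_name, and enforce
# uniqueness with a unique index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta_network (network_id INTEGER PRIMARY KEY, network_name TEXT)"
)
conn.executemany(
    "INSERT INTO meta_network (network_name) VALUES (?)",
    [("EC_raw",), ("FLNRO-WMB",)],
)

# New columns must start out nullable (or have a simple default).
conn.execute("ALTER TABLE meta_network ADD COLUMN network_key TEXT")
# Backfill from the human-readable name, then make the column unique.
conn.execute("UPDATE meta_network SET network_key = network_name")
conn.execute(
    "CREATE UNIQUE INDEX uq_meta_network_key ON meta_network (network_key)"
)

keys = [
    row[0]
    for row in conn.execute(
        "SELECT network_key FROM meta_network ORDER BY network_id"
    )
]
print(keys)  # ['EC_raw', 'FLNRO-WMB']
```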

Info on the second point:

Rod's history-tracking setup uses a set of triggers that execute any time modifications are made to the base tables. These triggers are shared by all history-tracked tables and rely on column order to insert the right information from the base table into the history table. A new column is appended at the end; we can append it to both tables, but because the history table includes two extra columns (deleted and <table>_hx_id), the columns no longer line up when the triggers run. There are three main potential fixes:

    1. Use a copy-on-write process: rename the existing history table, create a new one with the new column, copy the existing data over (including its FK references), and delete the old table. Pros: self-contained, still uses the existing history logic. Cons: doesn't really fix the problem long term, and is prone to data loss if there are logical errors in the copying code.
    2. Update the triggers so that they use column names instead of relying on order. Pros: fixes the issue permanently. Cons: triggers would have to be created per table, and the trigger-generation functions in Python would need significant rework.
    3. Rework the triggers and table column orders so that history-only columns come first. Triggers could then rely on order when new columns are added, since the extra history columns would be handled up front. Pros: fixes the problem permanently and keeps the trigger code simple. Cons: would require copying and regenerating every table in the database. That wouldn't be too bad for the metadata tables, but would take several hours for obs_raw.

We've opted for option 1 for now: it's rare to need to add new columns to an existing table, and this is the pragmatic approach to the problem.
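
The copy-on-write process can be sketched like this, again using SQLite in place of PostgreSQL; the table shape mirrors the description above (a history table with the extra <table>_hx_id and deleted columns), but all names and data are illustrative rather than the exact PyCDS identifiers:

```python
import sqlite3

# Sketch of option 1 (copy-on-write): build a replacement history table
# that includes network_key in the desired position, copy the old rows
# across by explicit column name, then drop the old table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta_network_hx ("
    "meta_network_hx_id INTEGER PRIMARY KEY, network_id INTEGER, "
    "network_name TEXT, deleted INTEGER)"
)
conn.execute("INSERT INTO meta_network_hx VALUES (1, 10, 'EC_raw', 0)")

# 1. Rename the old history table out of the way.
conn.execute("ALTER TABLE meta_network_hx RENAME TO meta_network_hx_old")
# 2. Create the replacement with network_key in its final position,
#    keeping the history-only columns aligned with the trigger's order.
conn.execute(
    "CREATE TABLE meta_network_hx ("
    "meta_network_hx_id INTEGER PRIMARY KEY, network_id INTEGER, "
    "network_name TEXT, network_key TEXT, deleted INTEGER)"
)
# 3. Copy the data over by explicit column name; pre-existing history
#    rows simply get NULL for network_key.
conn.execute(
    "INSERT INTO meta_network_hx "
    "(meta_network_hx_id, network_id, network_name, deleted) "
    "SELECT meta_network_hx_id, network_id, network_name, deleted "
    "FROM meta_network_hx_old"
)
# 4. Drop the old table once the copy is verified.
conn.execute("DROP TABLE meta_network_hx_old")

row = conn.execute("SELECT * FROM meta_network_hx").fetchone()
print(row)  # (1, 10, 'EC_raw', None, 0)
```

As the PR notes, the risk sits in step 3: any mismatch between the copy statement and the real column list silently loses data, which is why this only makes sense as a rarely-used escape hatch.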

Contributor

@QSparks QSparks left a comment


Looks good, just a few minor comments.
I am not familiar with the database internals, but the migration logic looks sound.

One possible edge case: network_name is nullable in the schema. If any rows have NULL network_name, they'll get NULL network_key values (which PostgreSQL's unique constraint allows). Do we ever have NULL network names in practice? If so, should we validate that all network names are populated?
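
The edge case is easy to demonstrate. In both PostgreSQL (by default) and SQLite, a UNIQUE constraint treats NULLs as distinct from each other, so multiple NULL keys can coexist; the schema here is illustrative:

```python
import sqlite3

# A UNIQUE constraint does not prevent multiple NULLs: NULL is never
# equal to NULL, so each NULL network_key passes the uniqueness check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta_network ("
    "network_id INTEGER PRIMARY KEY, network_key TEXT UNIQUE)"
)
conn.execute("INSERT INTO meta_network (network_key) VALUES (NULL)")
conn.execute("INSERT INTO meta_network (network_key) VALUES (NULL)")  # allowed

n = conn.execute(
    "SELECT COUNT(*) FROM meta_network WHERE network_key IS NULL"
).fetchone()[0]
print(n)  # 2
```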

Contributor

@jameshiebert jameshiebert left a comment


Just a couple small changes to consider. I'll let you be the judge of their practicality.

)
)

# Drop existing triggers before modifying table structure so that we don't accidentally track
Contributor

Near as I can tell, you got the sequence right for dropping/recreating all of the triggers/constraints/FKs. Not sure if this is possible, but could we disable or defer the triggers so we don't have to recreate them from their definitions? I feel like it would make the code less verbose, and give us the assurance that we're not changing the definition (unless that's what we want).

Contributor Author

I'm not sure it changes verbosity much, but I've added some functions to enable and disable this trigger rather than removing it.

I'm fairly certain the code in its current state is safe, but if someone were to change the underlying functions that create and remove these triggers, there is the potential for an unexpected definition change.

sa.Column(
"network_key",
sa.String(),
nullable=True,
Contributor

To Quintin's point, network_name, though currently NULLable, shouldn't be (and in practice, never is), and I think network_key should not be nullable either. What good is a key that's NULL :)

Contributor Author

I agree, but I think I'm stuck in a catch-22 here. I can't add a new column without it being either a) nullable or b) given a default value, and default values in Postgres need to be fairly simple: they can't refer to other column data, even via functions.

To work around this, the code as currently applied creates the column as nullable, but a trigger populates it on every insert, acting as a default.
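
The trigger-as-default pattern looks roughly like this. This sketch uses a SQLite trigger in place of the PR's PostgreSQL trigger, and backfills network_key from network_name purely for illustration; the actual population logic and names in the PR may differ:

```python
import sqlite3

# network_key stays nullable in the DDL, but an AFTER INSERT trigger
# fills it in whenever a row arrives without one, acting as a default
# that can reference other column data (which a plain DEFAULT cannot).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta_network ("
    "network_id INTEGER PRIMARY KEY, network_name TEXT, network_key TEXT)"
)
conn.execute(
    """
    CREATE TRIGGER meta_network_key_default
    AFTER INSERT ON meta_network
    FOR EACH ROW WHEN NEW.network_key IS NULL
    BEGIN
        UPDATE meta_network
        SET network_key = NEW.network_name
        WHERE network_id = NEW.network_id;
    END
    """
)

# Insert without a key: the trigger supplies one from network_name.
conn.execute("INSERT INTO meta_network (network_name) VALUES ('EC_raw')")
key = conn.execute("SELECT network_key FROM meta_network").fetchone()[0]
print(key)  # 'EC_raw'
```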
