
Postgres Migration #1190

Draft

dancoates wants to merge 41 commits into dev from postgres-migration

Conversation

@dancoates
Collaborator

No description provided.

@codecov

codecov bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 74.70912% with 413 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.40%. Comparing base (782cab8) to head (9932472).
⚠️ Report is 5 commits behind head on dev.

Files with missing lines Patch % Lines
api/graphql/schema.py 48.96% 74 Missing ⚠️
api/graphql/loaders.py 61.25% 62 Missing ⚠️
db/python/tables/sample.py 78.29% 28 Missing ⚠️
db/python/tables/analysis.py 64.47% 27 Missing ⚠️
db/python/tables/participant.py 85.90% 21 Missing ⚠️
api/graphql/mutations/project.py 35.48% 20 Missing ⚠️
api/graphql/mutations/sample.py 20.83% 19 Missing ⚠️
api/graphql/mutations/cohort.py 56.75% 16 Missing ⚠️
api/graphql/mutations/analysis.py 6.25% 15 Missing ⚠️
db/python/tables/sequencing_group.py 87.87% 12 Missing ⚠️
... and 25 more
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1190      +/-   ##
==========================================
- Coverage   83.20%   78.40%   -4.80%     
==========================================
  Files         216      167      -49     
  Lines       18740    14402    -4338     
==========================================
- Hits        15592    11292    -4300     
+ Misses       3148     3110      -38     

☔ View full report in Codecov by Sentry.

@github-actions

github-actions bot commented Feb 6, 2026

PR Checklist ✅

Have you checked the following?

  • Changes do not contain the names of any sensitive CPG projects
  • Tests cover any newly added code
  • There aren't any unnecessary autoformatting changes
  • Pull request description is appropriate for use as a commit message for squash commit

Check all boxes before merging.

…abase (#1183)

As the first step of switching to postgres we need a working postgres
database with the correct schema. This change removes the old liquibase
changelog and replaces it with a folder of dbmate migrations. The
migrations are split into a few steps:

1. The base tables
2. Addition of sys_period columns to allow for temporal tables
3. Addition of history tables based on main tables
4. Addition of triggers to automatically write changes to history tables
5. Addition of all indexes and foreign key constraints - this is done as
a separate step to the table creation to make it easier to bulk import
the initial data without needing to do it in the correct order.

This postgres schema fixes a few longstanding issues with the mariadb
schema:

- Removed `_external_id_unused` columns from the family, participant, and
sample tables
- Added missing audit log id columns to the analysis_outputs and output_file
tables
- Fixed the type of the output_file meta column to be json rather than varchar
- Renamed nullIfInactive to null_if_archived in the
sequencing_group_external_id table, for consistency with other column
names and the `archived` column in the sequencing_group table
- Added the missing primary key to the sequencing_group_external_id table
- Removed the full-text index on the comment table, as it wasn't being used

The Dockerfile in the `/db` directory is based on postgres 18, but
additionally installs dbmate and the postgres temporal tables extension.
The docker-compose file only has a single service at the moment but
allows easy management of the necessary volumes.

To run database migrations, run:

```bash
docker compose exec postgres dbmate up
```

and to run the database, run `docker compose up`

The `/db/schema.sql` file is automatically generated by dbmate, and can
be used to keep track of the effect of new migrations on the overall
schema.

This updates the existing `connect.py` to add an implementation of
postgres connections using psycopg and an async connection pool. The
choice to use psycopg is largely due to its postgres-specific features,
like supporting row_cursors, supporting COPY, and better parameterization
of queries. The `databases` library that we have been using for mariadb
is also now archived, so it is a good time to move to something else.

This also migrates the queries needed to get a list of the user's
projects, as an example of the postgres connection in use. The rest of
the API is broken.

This still uses testcontainers but takes a different approach to the
previous test setup. The tests have been very slow for a while due to
heavy set up and tear down requirements for each test. This update
utilizes the ability of postgres to very quickly create a database based
on another template database. This allows us to run the database
migrations once, and then clone the template database for each test.

In addition, this disables a bunch of durability settings for postgres
and uses a ram disk for data storage, which should help make the
database creation process as quick as possible.

This also commences a move to a more idiomatic pytest test configuration
using fixtures for the database setup. This allows tests to easily
request what resources they need, including what permissions they want
the test user to have.

See the test README.md for more information and examples of tests.

As most tests haven't been ported yet, there is an include list of
working tests specified in the `pytest.toml` file to stop pytest from
trying to run the broken ones. We can extend this as we port tests
across to the new setup, and eventually remove the include list once all
tests are ported.

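The clone step itself is cheap because postgres copies the template at
the storage level. A minimal sketch of building the clone statement
(function and database names here are illustrative, not the actual
fixture code):

```python
def clone_database_sql(template: str, name: str) -> str:
    """Build a statement that clones a template database.

    Postgres copies the template's schema and data directly, which is
    much faster than re-running the migrations for every test.
    """
    def quote_ident(ident: str) -> str:
        # Identifiers can't be bound as query parameters, so
        # double-quote them and escape any embedded quotes.
        return '"' + ident.replace('"', '""') + '"'

    return f'CREATE DATABASE {quote_ident(name)} TEMPLATE {quote_ident(template)}'
```

One constraint worth noting: the template database can't have active
connections while it is being cloned, so the fixture runs the migrations
once, disconnects, and then issues one of these statements per test.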
We use mariadb's JSON_MERGE_PATCH function in several places to merge
entity meta fields. Postgres doesn't have a directly comparable function
that we can use. We could switch to postgres' json concat operator `||`,
which does merge jsonb documents, but it lacks the more advanced
functionality of JSON_MERGE_PATCH, such as updating nested objects.

While we could probably work around not having the ability to update
nested objects, the main feature of JSON_MERGE_PATCH that we use
frequently is the ability to delete top level keys by passing in a null
value as the patch. Postgres' json concat doesn't have a way to do this.

So, to avoid breaking changes, it is best to implement a spec-compliant
version of JSON_MERGE_PATCH that we can use.

The implementation is in the migration, and there are tests of all the
examples from the RFC, as well as some extra edge cases.
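For reference, the RFC 7386 merge-patch semantics the SQL function
follows can be expressed in a few lines of Python (illustrative only;
the real implementation lives in the migration):

```python
def json_merge_patch(target, patch):
    """RFC 7386 merge patch: recursively merge objects, and treat a
    null (None) patch value as deletion of that key."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null deletes the key
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result
```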

There is a significant performance loss in going down this route: the
new json_merge_patch function is about 10 times slower than the built-in
json concat with `||`. However, we only use this method on entity
updates and never do very large batch updates, so this shouldn't be too
much of a problem.


__Benchmark__

```sql
CREATE TEMP TABLE benchmark_json_data AS
SELECT 
    id,
    -- Target: nested JSON with various depths
    jsonb_build_object(
        'id', id,
        'name', 'item_' || id,
        'count', id * 10,
        'active', (id % 2 = 0),
        'metadata', jsonb_build_object(
            'created', now()::text,
            'version', 1,
            'tags', jsonb_build_array('tag1', 'tag2', 'tag3'),
            'nested', jsonb_build_object(
                'level2', jsonb_build_object(
                    'level3', jsonb_build_object(
                        'deep_value', id * 100
                    )
                )
            )
        ),
        'settings', jsonb_build_object(
            'option_a', true,
            'option_b', false,
            'option_c', 'value_' || id
        )
    ) as target,
    -- Patch: updates some values, adds new ones, removes others (null)
    jsonb_build_object(
        'name', 'updated_item_' || id,
        'new_field', 'new_value_' || id,
        'metadata', jsonb_build_object(
            'version', 2,
            'updated', now()::text,
            'nested', jsonb_build_object(
                'level2', jsonb_build_object(
                    'level3', jsonb_build_object(
                        'new_deep', 'added'
                    )
                )
            )
        ),
        'settings', jsonb_build_object(
            'option_a', false,
            'option_d', 'new_option'
        )
    ) as patch
FROM generate_series(1, 100000) as id;
```

```sql

SELECT id, target || patch as merged 
FROM benchmark_json_data;
-- ~350ms
```

```sql
SELECT id, json_merge_patch(target, patch) as merged 
FROM benchmark_json_data;
-- ~3000ms
```

Notable changes:

- The project deletion query is updated to run as multiple queries, as
psycopg doesn't support multiple statements in the same request
- Merging project meta is now done using the custom json_merge_patch
function introduced in the previous commit, to maintain functional
compatibility with the mariadb implementation
- Tests no longer need to do manual manipulation of groups; they can use
the test fixtures introduced recently

dancoates and others added 10 commits February 6, 2026 15:17

This will allow us to use template strings to build sql queries. It
also updates all other python dependencies to improve compatibility with
python 3.14. This was necessary for some dependencies, particularly
those written in rust that use PyO3 for bindings, as older versions of
PyO3 didn't support 3.14 and attempting to build them failed with the
error:

```
the configured Python interpreter version (3.14) is newer than PyO3's maximum supported version (3.13)
```

Other packages are updated as we'll need to do comprehensive testing of
this anyway so now seemed like as good a time as any.

After updating to 3.14 I immediately noticed that the tests were taking
about 10x as long as previously. Investigation led down a bit of a
rabbit hole but ended up at the credential helpers specified in
~/.docker/config.json for gcloud container authentication. For some
reason during tests these credential helpers were being invoked many
times and they are quite slow so were greatly impacting test times. This
slowdown wasn't present when we were using 3.11, I think because the
credential helpers used the active python version rather than their own
one. Something about spinning up a process with a different python
version was very slow.

To work around this, there is a new fixture introduced in the tests that
creates an empty docker config and sets that as the config to use. The
test doesn't require the credential helpers as the container it uses is
not in artifact registry or gcr. This problem wouldn't have existed in
CI, but this fixture won't break anything there either.

This also updates some settings and CI workflows that were pointing to 3.11.

This adds a second ruff config file for the python package, which
specifies 3.10 compatibility. This allows the rest of the codebase to
use 3.14 features.

Storing the active postgres connection on the `Connection` object
simplifies instances where a transaction is initiated in the `layer/`
function but executed in the `table/` function.
Prior to these changes, the connection pool was stored on `Connection`,
which resulted in ambiguity in the `table/` functions about which
connection to use.

---------

Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>

This also rewrites the tests to actually run the filter queries against
some data in the database.

Some bits of the query building have been removed to reduce complexity;
checks like whether a query would logically result in an overall falsey
WHERE statement aren't necessary, as the query planner is smart enough
to optimize that away.

Filtering by querying json blobs in the db is left out of this change,
as the correct approach is yet to be established.

Adds a migration to add default values to the following tables:
```
assay_type
sample_type
sequencing_type
sequencing_technology
sequencing_platform
analysis_type
```

This takes the existing assay table methods and replaces them with
postgres compatible ones. Also updates the tests to the new pytest
structure.

notable changes:

- Removes the `open_transaction` param from assay methods; this isn't
  needed, as we can check whether we're in a transaction and
  conditionally avoid creating a nested one (nested transactions are
  supported in postgres via savepoints, but there's a performance impact
  and we don't need them, so best to avoid)
- Some dead/unused code has been removed

not implemented:

- Some tests that relate to sequencing groups are skipped
- Some bits of tests that relate to querying meta are commented out
- There's some temporary fixtures added to create the necessary samples
  for assays to be attached to, these should be replaced with ones that
  use python methods once samples/sgs are migrated

This helper on the connection class allows us to conditionally enter a
transaction if not already in one. Psycopg supports nested transactions
via postgres savepoints, but partial rollbacks are not a feature that we
actually need, so in most cases it is fine to just have a single parent
transaction. There's a bit of code involved in checking the current
transaction status, so this wraps it in a method on the connection
model to save having to write it each time.
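The shape of the helper, sketched with a toy connection class (names and
bookkeeping here are hypothetical; the real method checks psycopg's
transaction status rather than a boolean flag):

```python
import asyncio
from contextlib import asynccontextmanager


class Connection:
    """Toy stand-in for the real connection model."""

    def __init__(self):
        self.in_transaction = False
        self.statements: list[str] = []

    @asynccontextmanager
    async def ensure_transaction(self):
        if self.in_transaction:
            # Already inside a transaction: reuse it rather than
            # creating a savepoint-based nested transaction.
            yield self
            return
        self.in_transaction = True
        self.statements.append('BEGIN')
        try:
            yield self
            self.statements.append('COMMIT')
        except Exception:
            self.statements.append('ROLLBACK')
            raise
        finally:
            self.in_transaction = False
```

A `layer/` function and the `table/` function it calls can then both
wrap their work in `ensure_transaction()`, and only one BEGIN/COMMIT
pair is ever issued.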

This PR migrates `family` and `family_participant` entities to
**PostgreSQL**. It also updates the related test classes through which
the `Family` layer is tested.

---------

Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
Ignore PLR2004 in tests, as we often need to assert that the length of a
list equals a certain number; in these cases there's no point defining a
constant to use. Also fixes a few other misc errors.

dancoates and others added 10 commits February 18, 2026 11:30

This will enable us to enforce that other linting checks pass while
working on the migration.

This probably isn't as performant as it could be as it is converting the
jsonb columns to text, only to then parse them again in duckdb. We could
probably get them as dicts and then create an arrow table from that, but
would need to manually define the schema for the table which would be a
bit tricky. There's a rough plan to change how these meta tables work in
the future, so we don't want to do much of a refactor for now, as long
as performance isn't too bad when we test in production.

All tests are skipped for now, but should work in theory. I've manually
tested the endpoints and they seem to work fine.

Migrate the participant and participant phenotype queries to use the
postgres connection.

In the process of this migration, an issue was uncovered in the test
`test_graphql_upsert_participant_with_phenotypes`. The project passed
through graphql as a string was not being used in the upsert. The
appropriate data access checks were made, but the value was then not
being used in the upsert_participants method itself.

This PR includes a fix where in `upsert_participants` in
`api/graphql/mutations/participant.py` the project id is fetched when
the project access is checked and that value is passed down all the way
through to the table where `create_participant` is called.

All the tests should be passing excluding `test_query_by_sample` since
the Sample tables have yet to be migrated.

---------

Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
Co-authored-by: Piyumi Rameshka <piyumi.amarasinghebaragamage@populationgenomics.org.au>

This PR migrates the audit log table-related SQL queries in `audit_log.py` and `base.py` to PostgreSQL.

---------

Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>

Pretty straightforward migration. Most tests are skipped at the moment,
but they should in theory work when re-enabled as more things are
migrated.

Migrate the sample queries to postgres

Also adds in a fix to the `to_sql` function in the generic filter to
return `None` if the filters are empty.

---------

Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
Co-authored-by: Piyumi Rameshka <piyumi.amarasinghebaragamage@populationgenomics.org.au>

These postgres migration queries were written prior to the update to
Python 3.14. This uplifts them to use the template string feature we are
now relying on in 3.14.

This PR migrates the Enum table queries to PostgreSQL and adds a new test class to verify those queries.

This migrates sequencing group table methods to be compatible with
postgres, as well as converts the corresponding test code to use pytest.
Additionally, unused sequencing group table methods have been removed.

Not implemented:
- Table functions that are not covered by sequencing group tests have
not had tests written. This includes the following:
    - `get_sequencing_groups_by_analysis_ids`
    - `get_samples_create_date_from_sgs`
    - `get_all_sequencing_group_ids_by_sample_ids_by_type`
    - `get_type_numbers_for_project`
    - `recreate_sequencing_group_with_new_assays`
- Sequencing group tests functions that relate to Analyses have been
skipped
- Sequencing group tests that rely on the use of samples across multiple
projects have been skipped
    - This is only until migration of samples is complete

Adds the ability to set the name of the test project that the
`connection_with_project` fixture uses. This can be done using the
`@pytest.mark.project_name(<name>)` decorator.

hrrsheen and others added 16 commits February 24, 2026 15:25

Adds a check for None values to the `SequencingGroupAssayFilter` when
querying graphql for sequencing groups.

This PR migrates the analysis_runner queries.

This PR migrates the `cohort` table to PostgreSQL. Along with the query
changes, `test_cohort.py` and `test_cohort_status.py` files are also updated.

Additionally, `test_mutations.py` also covers parts of the cohort layer.
Since this test class depends on several other entities, it would be
more appropriate to update it once those related migrations are
completed.

---------

Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>

Migrate the analysis queries to use postgres, and migrate all the tests
to pytest.

Discovered and fixed two minor bugs in `participant.py` and
`sequencing_group.py` also.

---------

Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
Co-authored-by: Piyumi Rameshka <piyumi.amarasinghebaragamage@populationgenomics.org.au>

This PR migrates the project insights queries to use PostgreSQL syntax.
There aren't many test cases for this work, but the existing two test
cases invoke almost all the DB functions. In addition, all the new
queries were manually tested.

Migrates the web queries to postgres and migrates the web tests to pytest.
Additionally, this identified and fixed some small query bugs in `tables/assay.py`, `tables/participant.py`, and `tables/sample.py`.

This PR unskips the test cases in:

- test/test_external_id.py
- test/test_graphql.py
- test/test_pedigree.py
- test/test_search.py
- test/test_update_participant_family.py

The test cases in these files were skipped until all the dependent
entity queries were migrated to PostgreSQL syntax. While unskipping
them, some test cases were failing and required minor fixes.

**There are still some test cases skipped until _querying JSON keys_ and
_case-insensitive string matching_ work.**

In addition, the test cases in the test/test_cohort_status.py file are
updated to reuse the `connection/db` created in the test class `set_up`
fixture, instead of creating a new test DB both in the set_up and again
in the test case.

Create a filter for json columns in the new land of postgres

There are some notes and a new function called `infer_type_from_filter`.
I'm open to alternatives for inferring the generic `T` type but for
right now we pull it from the type of values in the filters.

There was also a [discussion on
slack](https://centrepopgen.slack.com/archives/C0ADQ18TAES/p1771550184864699)
about whether the exclusion filters `neq` and `nin` should include rows
that either do not have the meta key being filtered on, or have an
empty meta entirely.

Originally, ALL rows with the key missing would be excluded if you
filter on the meta columns. This PR modifies that behaviour and will
include those rows on any `neq` or `nin` filter unless the
`isnull=False` filter is also specified.

The tests have been modified accordingly to test these new behaviours
and the `neq` operation has been changed from `!=` to `IS DISTINCT FROM`
to enable null values to pass through.
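The switch matters because `!=` follows SQL's three-valued logic:
comparing against NULL yields NULL, so rows with a missing key silently
drop out of the result. A sketch of the rendered clause (the helper name
here is hypothetical):

```python
def neq_clause(column: str, placeholder: str) -> str:
    """Render a negated equality that treats NULL as an ordinary value.

    `a != b` evaluates to NULL (and the row is filtered out) when either
    side is NULL; `a IS DISTINCT FROM b` evaluates to TRUE instead, so
    rows missing the meta key are kept by `neq`/`nin` filters.
    """
    return f'{column} IS DISTINCT FROM {placeholder}'
```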

In order to test the functionality as we currently use it, the tests
that cover the new logic are added to `test_sequencing_groups.py` since
this filtering is predominantly used to filter sgs based off of assay
meta data.

---------

Co-authored-by: hrrsheen <harrison.sheen@populationgenomics.org.au>
Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>

Reorders the migration files so that rolling back the default data
migration doesn't violate foreign key constraints.

This PR migrates the `comment` query-related test cases.

Fixes `sequencing_group` tests that were previously skipped due to
missing functionality. Also replaces a fixture that relied on manually
inserting into the database.

Migrates the analysis output file queries to be compatible with postgres.
Migrates the corresponding tests to pytest.

---------

Co-authored-by: Yash <nevoodoo@users.noreply.github.com>

As described in the ticket for SET-1003, Mariadb was automatically doing
very flexible, case-insensitive string comparisons.

This update to the GenericFilter makes all string comparisons use
`ILIKE` or `CASEFOLD(value)` to replicate that pre-existing
functionality after the move to Postgres.

The tests have been updated accordingly.

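A sketch of what the rendering might look like (the helper name is
illustrative, not the actual GenericFilter code; `CASEFOLD()` is
available from postgres 18, which the Dockerfile uses):

```python
def string_compare_clause(column: str, placeholder: str, pattern: bool = False) -> str:
    """Render a case-insensitive string comparison.

    Pattern matches use ILIKE; exact matches casefold both sides,
    mirroring mariadb's default case-insensitive collation behaviour.
    """
    if pattern:
        return f'{column} ILIKE {placeholder}'
    return f'CASEFOLD({column}) = CASEFOLD({placeholder})'
```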
…r call (#1238)

The get_connection() function returned by dependable_get_connection_getter
is called repeatedly for related graphql resolvers. Caching the project maps
in dependable_get_connection_getter avoids repeated calls to get the same
information from refresh_projects() and speeds up this hot path.

But don't cache them for other refresh_projects() call sites for which
it can be important to get a fresh list of projects.
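The shape of the fix, sketched as a once-per-getter memoiser (names are
hypothetical; the real cache lives inside dependable_get_connection_getter):

```python
def once(fn):
    """Call fn at most once; later calls return the cached result.

    Resolvers served by the same getter instance then share one set of
    project maps instead of re-fetching them per resolver.
    """
    sentinel = object()
    cached = sentinel

    def wrapper():
        nonlocal cached
        if cached is sentinel:
            cached = fn()
        return cached

    return wrapper
```

Because the wrapper is created per request, a fresh request still sees a
fresh project list, while other refresh_projects() call sites bypass the
cache entirely.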

Migrate billing-related test cases from unittest to pytest.
Remove `unittest.mock.MagicMock` usages and introduce dummy mock classes
(in `test/testbqbase.py`) and pytest `monkeypatch`.