Codecov Report

❌ Patch coverage decreased. Coverage diff:

```
##              dev    #1190      +/-   ##
==========================================
- Coverage   83.20%   78.40%   -4.80%
==========================================
  Files         216      167      -49
  Lines       18740    14402    -4338
==========================================
- Hits        15592    11292    -4300
+ Misses       3148     3110      -38
```
…abase (#1183)

As the first step of switching to postgres we need a working postgres database with the correct schema. This change removes the old liquibase changelog and replaces it with a folder of dbmate migrations. The migrations are split into a few steps:

1. The base tables
2. Addition of sys_period columns to allow for temporal tables
3. Addition of history tables based on the main tables
4. Addition of triggers to automatically write changes to the history tables
5. Addition of all indexes and foreign key constraints. This is done as a separate step to the table creation to make it easier to bulk import the initial data without needing to do it in the correct order.

This postgres schema fixes a few longstanding issues with the mariadb schema:

- Removed _external_id_unused columns from the family, participant and sample tables
- Added missing audit log id columns to the analysis_outputs and output_file tables
- Fixed the type of the output file meta column to be json rather than varchar
- Renamed nullIfInactive to null_if_archived in the sequencing_group_external_id table, to be consistent with other column names and the `archived` column in the sequencing_group table
- Added a missing primary key to the sequencing_group_external_id table
- Removed the full text index on the comment table; it wasn't being used

The Dockerfile in the `/db` directory is based on postgres 18, but additionally installs dbmate and the postgres temporal tables extension. The docker-compose file only has a single service at the moment but allows easy management of the necessary volumes.

To run database migrations, run:

```bash
docker compose exec postgres dbmate up
```

and to run the database, run `docker compose up`.

The `/db/schema.sql` file is automatically generated by dbmate, and can be used to keep track of the effect of new migrations on the overall schema.
This updates the existing `connect.py` to add an implementation of postgres connections using psycopg and an async connection pool. The choice to use psycopg is largely due to its postgres-specific features, like supporting row cursors and COPY, and its better parameterization of queries. The `databases` library that we have been using for mariadb is also now archived, so it is a good time to move to something else. This also migrates the queries necessary to get a list of the user's projects, as an example of the postgres connection in use. The rest of the API is still broken at this point.
This still uses testcontainers but takes a different approach to the previous test setup. The tests have been very slow for a while due to heavy set up and tear down requirements for each test. This update utilizes the ability of postgres to very quickly create a database based on another template database. This allows us to run the database migrations once, and then clone the template database for each test. In addition, this disables a bunch of durability settings for postgres and uses a ram disk for the data storage, which should help to make the database creation process as quick as possible.

This also commences a move to a more idiomatic pytest test configuration using fixtures for the database setup. This allows tests to easily request what resources they need, including what permissions they want the test user to have. See the test README.md for more information and examples of tests.

As most tests haven't been ported yet, to stop pytest from trying to run the broken tests, there is an include list of working tests specified in the `pytest.toml` file. We can extend this as we port tests across to use this new setup, and eventually remove the include list once all tests are ported.
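The clone step can be sketched as follows. This is an illustration, not the PR's code: the template name and helper function are hypothetical; the only postgres behaviour relied on is that `CREATE DATABASE ... TEMPLATE ...` copies an existing database, which is far cheaper than re-running migrations.

```python
# Sketch of cloning a migrated template database per test.
# TEMPLATE_DB and clone_database_sql are illustrative assumptions.
import re

TEMPLATE_DB = "metamist_template"  # migrated once per test session

def clone_database_sql(test_db_name: str) -> str:
    """Build the statement that clones the template for one test.

    Identifiers can't be sent as query parameters, so validate the
    name instead of interpolating it blindly.
    """
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", test_db_name):
        raise ValueError(f"unsafe database name: {test_db_name!r}")
    return f'CREATE DATABASE "{test_db_name}" TEMPLATE "{TEMPLATE_DB}"'
```

A session-scoped fixture would run the migrations against the template once, and a per-test fixture would execute this statement and hand the test a connection to the fresh clone.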
We use mariadb's JSON_MERGE_PATCH function in several places to merge
entity meta fields. Postgres doesn't have a directly comparable function
that we can use. We could switch to postgres' json concat operator `||`,
which does merge jsonb documents, but it doesn't have the more advanced
functionality that JSON_MERGE_PATCH offers, such as updating nested
objects.
While we could probably work around not having the ability to update
nested objects, the main feature of JSON_MERGE_PATCH that we use
frequently is the ability to delete top level keys by passing in a null
value as the patch. Postgres' json concat doesn't have a way to do this.
So to avoid breaking changes, it is best to implement a spec-compliant
version of JSON_MERGE_PATCH that we can use.
The implementation is in the migration, and there are tests of all the
examples from the RFC, as well as some extra edge cases.
There is a significant performance loss in going down this route: the
new json_merge_patch function is about 10 times slower than the built-in
json concat with `||`. We only use this method on entity updates and
never do very large batch updates, so this shouldn't be too much of a
problem.
__Benchmark__
```sql
CREATE TEMP TABLE benchmark_json_data AS
SELECT
id,
-- Target: nested JSON with various depths
jsonb_build_object(
'id', id,
'name', 'item_' || id,
'count', id * 10,
'active', (id % 2 = 0),
'metadata', jsonb_build_object(
'created', now()::text,
'version', 1,
'tags', jsonb_build_array('tag1', 'tag2', 'tag3'),
'nested', jsonb_build_object(
'level2', jsonb_build_object(
'level3', jsonb_build_object(
'deep_value', id * 100
)
)
)
),
'settings', jsonb_build_object(
'option_a', true,
'option_b', false,
'option_c', 'value_' || id
)
) as target,
-- Patch: updates some values, adds new ones, removes others (null)
jsonb_build_object(
'name', 'updated_item_' || id,
'new_field', 'new_value_' || id,
'metadata', jsonb_build_object(
'version', 2,
'updated', now()::text,
'nested', jsonb_build_object(
'level2', jsonb_build_object(
'level3', jsonb_build_object(
'new_deep', 'added'
)
)
)
),
'settings', jsonb_build_object(
'option_a', false,
'option_d', 'new_option'
)
) as patch
FROM generate_series(1, 100000) as id;
```
```sql
SELECT id, target || patch as merged
FROM benchmark_json_data;
-- ~350ms
```
```sql
SELECT id, json_merge_patch(target, patch) as merged
FROM benchmark_json_data;
-- ~3000ms
```
Notable changes:
- The project deletion query is updated to run as multiple queries, as psycopg doesn't support multiple queries in the same request
- Merging project meta is now done using the custom json_merge_patch function introduced in the previous commit, to maintain functional compatibility with the mariadb implementation
- Tests no longer need to do manual manipulation of groups; they can use the test fixtures introduced recently
This will allow us to use template strings to build sql queries.

This also updates all other python dependencies to improve compatibility with python 3.14. This was necessary for some dependencies, particularly those that are written in rust and use PyO3 for bindings, as older versions of PyO3 didn't support 3.14 and attempting to build failed with the error:

```
the configured Python interpreter version (3.14) is newer than PyO3's maximum supported version (3.13)
```

Other packages are updated too; we'll need to do comprehensive testing of this anyway, so now seemed like as good a time as any.

After updating to 3.14 I immediately noticed that the tests were taking about 10x as long as previously. Investigation led down a bit of a rabbit hole but ended up at the credential helpers specified in ~/.docker/config.json for gcloud container authentication. For some reason, during tests these credential helpers were being invoked many times, and they are quite slow, so they were greatly impacting test times. This slowdown wasn't present when we were using 3.11, I think because the credential helpers used the active python version rather than their own one. Something about spinning up a process with a different python version was very slow.

To work around this, there is a new fixture introduced in the tests that creates an empty docker config and sets that as the config to use. The tests don't require the credential helpers, as the container they use is not in artifact registry or gcr. This problem wouldn't have existed in CI, but this fixture won't break anything there either.

This also updates some settings and CI that were pointing to 3.11.
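A minimal sketch of the workaround, relying only on docker's standard `DOCKER_CONFIG` environment variable; the function name is illustrative, and the real change is a pytest fixture rather than a plain function:

```python
# Point docker clients at an empty config so the slow gcloud credential
# helpers are never invoked. Only DOCKER_CONFIG is standard docker
# behaviour here; the helper name is an assumption for illustration.
import json
import os
import tempfile

def use_empty_docker_config() -> str:
    """Create an empty docker config dir and select it via DOCKER_CONFIG."""
    config_dir = tempfile.mkdtemp(prefix="docker-config-")
    with open(os.path.join(config_dir, "config.json"), "w") as f:
        json.dump({}, f)  # no credHelpers => no helper subprocesses spawned
    os.environ["DOCKER_CONFIG"] = config_dir
    return config_dir
```

In the pytest version, `monkeypatch.setenv` would set the variable so it is restored after each test.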
This adds a second ruff config file for the python package which specifies 3.10 compatibility. This allows the rest of the code to use 3.14 features.
Storing the active postgres connection on the `Connection` object simplifies instances where a transaction is initiated in a `layer/` function but executed in a `table/` function. Prior to these changes, the connection pool was stored on `Connection`, which resulted in ambiguity in the `table/` function about which connection to use. --------- Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
This also rewrites the tests to actually run the filter queries against some data in the database. Some bits of the query building have been removed to reduce complexity; things like checking whether a query would logically result in an overall falsey WHERE statement aren't necessary, as the query planner is smart enough to optimize for that. The implementation of filtering by querying json blobs in the db is left out of this, as the correct approach is yet to be established.
Adds a migration to add default values to the following tables: ``` assay_type sample_type sequencing_type sequencing_technology sequencing_platform analysis_type ```
This takes the existing assay table methods and replaces them with postgres-compatible ones. Also updates the tests to the new pytest structure.

Notable changes:
- Removes the `open_transaction` param from assay methods; this isn't needed as we can check if we're in a transaction and conditionally avoid creating a nested transaction (nested transactions are supported in postgres with savepoints, but there's a performance impact and we don't need them, so best to avoid)
- Some dead/unused code has been removed

Not implemented:
- Some tests that relate to sequencing groups are skipped
- Some bits of tests that relate to querying meta are commented out
- There are some temporary fixtures added to create the necessary samples for assays to be attached to; these should be replaced with ones that use python methods once samples/sgs are migrated
This helper on the connection class allows us to conditionally enter a transaction if not already in one. Psycopg supports nested transactions with postgres savepoints, but partial rollback is not a feature that we actually need, so in most cases it is fine to just have a single parent transaction. There's a bit of code involved in checking the current transaction status, so this wraps that in a method on the connection model to save having to write it each time.
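The idea can be sketched with a stand-in connection class. All names here are illustrative; the real helper inspects psycopg's transaction status rather than a boolean flag:

```python
# Sketch of a "transaction unless already in one" helper.
from contextlib import contextmanager, nullcontext

class Connection:
    """Stand-in for the real connection model (illustrative only)."""

    def __init__(self):
        self.in_transaction = False

    @contextmanager
    def _begin(self):
        # In the real code this would be psycopg's transaction context.
        self.in_transaction = True
        try:
            yield self
        finally:
            self.in_transaction = False

    def ensure_transaction(self):
        # Reuse the surrounding transaction instead of opening a
        # savepoint; nested calls become no-ops.
        if self.in_transaction:
            return nullcontext(self)
        return self._begin()
```

A `table/` method can then wrap its statements in `with conn.ensure_transaction():` without caring whether a `layer/` caller already opened one.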
This PR migrates the `family` and `family_participant` entities to **PostgreSQL**. It also updates the related test classes through which the `Family` layer is tested. --------- Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
Ignore PLR2004 in tests, as we often need to assert that the length of a list equals a certain number; in these cases there's no point defining a constant to use. Also fixes a few other misc errors.
This'll enable us to enforce that other linting checks pass while working on the migration.
This probably isn't as performant as it could be, as it is converting the jsonb columns to text only to then parse them again in duckdb. We could probably get them as dicts and then create an arrow table from that, but we would need to manually define the schema for the table, which would be a bit tricky. There's a rough plan to change how these meta tables work in the future, so I don't want to do much of a refactor for now, as long as performance isn't too bad when we test in production. All tests are skipped for now, but should work in theory. I've manually tested the endpoints and they seem to work fine.
Migrate the participant and participant phenotype queries to use the postgres connection. In the process of this migration an issue was uncovered in the test `test_graphql_upsert_participant_with_phenotypes`: the project, passed through graphql as a string, was not being used in the upsert. The appropriate data access checks were made, but then the value was not being utilised in the upsert_participants method itself. This PR includes a fix where, in `upsert_participants` in `api/graphql/mutations/participant.py`, the project id is fetched when the project access is checked, and that value is passed down all the way through to the table where `create_participant` is called. All the tests should be passing excluding `test_query_by_sample`, since the Sample tables have yet to be migrated. --------- Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au> Co-authored-by: Piyumi Rameshka <piyumi.amarasinghebaragamage@populationgenomics.org.au>
This PR migrates the audit log table-related SQL queries in `audit_log.py` and `base.py` to PostgreSQL. --------- Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
Pretty straightforward migration. Most tests skipped at the moment but should in theory work when re-enabled as more things are migrated.
Migrate the sample queries to postgres. Also adds a fix to the `to_sql` function in the generic filter to return `None` if the filters are empty. --------- Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au> Co-authored-by: Piyumi Rameshka <piyumi.amarasinghebaragamage@populationgenomics.org.au>
These postgres migration queries were done prior to the update to Python 3.14. This uplifts them to use the Template string feature we are now relying on in 3.14.
This PR migrates the Enum table queries to PostgreSQL and adds a new test class to verify those queries.
This migrates sequencing group table methods to be compatible with
postgres, as well as converts the corresponding test code to use pytest.
Additionally, unused sequencing group table methods have been removed.
Not implemented:
- Table functions that are not covered by sequencing group tests have
not had tests written. This includes the following:
  - `get_sequencing_groups_by_analysis_ids`
  - `get_samples_create_date_from_sgs`
  - `get_all_sequencing_group_ids_by_sample_ids_by_type`
  - `get_type_numbers_for_project`
  - `recreate_sequencing_group_with_new_assays`
- Sequencing group test functions that relate to analyses have been
skipped
- Sequencing group tests that rely on the use of samples across multiple
projects have been skipped
  - This is only until migration of samples is complete
Adds the ability to set the name of the test project that the `connection_with_project` fixture uses. This can be done using the `@pytest.mark.project_name(<name>)` decorator.
Adds a check for None values to the `SequencingGroupAssayFilter` when querying graphql for sequencing groups.
This PR migrates the analysis_runner queries.
This PR migrates the `cohort` table to PostgreSQL. Along with the query changes, `test_cohort.py` and `test_cohort_status.py` files are also updated. Additionally, `test_mutations.py` also covers parts of the cohort layer. Since this test class depends on several other entities, it would be more appropriate to update it once those related migrations are completed. --------- Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
Migrate the analysis queries to use postgres, and migrate all the tests to pytest. Discovered and fixed two minor bugs in `participant.py` and `sequencing_group.py` also. --------- Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au> Co-authored-by: Piyumi Rameshka <piyumi.amarasinghebaragamage@populationgenomics.org.au>
This PR migrates the project insights queries to use PostgreSQL syntax. There aren't many test cases for this work, but the existing two test cases invoke almost all the DB functions. In addition, all the new queries were manually tested.
Migrates the web queries to postgres and migrates the web tests to pytest. Additionally, identified and fixed some small query bugs in `tables/assay.py`, `tables/participant.py` and `tables/sample.py`.
This PR unskips the test cases in: - test/test_external_id.py - test/test_graphql.py - test/test_pedigree.py - test/test_search.py - test/test_update_participant_family.py The test cases in these files were skipped until all the dependent entity queries are migrated to PostgreSQL syntax. While unskipping them, some test cases were failing and required some minor fixes. **There are still some test cases skipped until _querying JSON keys_ and _case insensitive string matching_ work.** In addition, the test cases in test/test_cohort_status.py file are updated to reuse the `connection/db` created in the test class `set_up` fixture instead of creating a new test DB both in the set_up and again in the testcase.
Create a filter for json columns in the new land of postgres There are some notes and a new function called `infer_type_from_filter`. I'm open to alternatives for inferring the generic `T` type but for right now we pull it from the type of values in the filters. There was also a [discussion on slack](https://centrepopgen.slack.com/archives/C0ADQ18TAES/p1771550184864699) around the behaviour of exclusion filters `neq` and `nin` around whether they should include rows that either do not have the meta key it's filtering on or even an empty meta. Originally, ALL rows with the key missing would be excluded if you filter on the meta columns. This PR modifies that behaviour and will include those rows on any `neq` or `nin` filter unless the `isnull=False` filter is also specified. The tests have been modified accordingly to test these new behaviours and the `neq` operation has been changed from `!=` to `IS DISTINCT FROM` to enable null values to pass through. In order to test the functionality as we currently use it, the tests that cover the new logic are added to `test_sequencing_groups.py` since this filtering is predominantly used to filter sgs based off of assay meta data. --------- Co-authored-by: hrrsheen <harrison.sheen@populationgenomics.org.au> Co-authored-by: Dan Coates <dan.coates@populationgenomics.org.au>
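The new exclusion semantics can be illustrated in plain Python. This is a hypothetical predicate, not the real GenericFilter code: a row whose meta lacks the key (SQL NULL) now passes a `neq` filter, mirroring `IS DISTINCT FROM`, unless `isnull=False` is also specified:

```python
# Illustration of the new neq semantics for json/meta filters.
# A missing key behaves like SQL NULL: it is "distinct from" any value.
_MISSING = object()

def passes_neq(meta: dict, key: str, excluded, isnull=None) -> bool:
    value = meta.get(key, _MISSING)
    if isnull is False and value is _MISSING:
        return False  # caller explicitly requires the key to exist
    if value is _MISSING:
        return True   # NULL IS DISTINCT FROM excluded -> row is kept
    return value != excluded
```

Under the old behaviour the first branch was effectively the default, so rows without the key were always excluded.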
Reorders the migration files so that rolling back the default data migration doesn't violate foreign key constraints.
This PR migrates the `comment` query-related test cases.
Fixes `sequencing_group` tests that were previously skipped due to missing functionality. Also replaces fixture that relied on manually inserting into the database.
Migrates the analysis output file queries to be compatible with postgres. Migrates the corresponding tests to pytest. --------- Co-authored-by: Yash <nevoodoo@users.noreply.github.com>
As described in the ticket for SET-1003, mariadb was automatically doing very flexible, case-ignoring string comparisons. This update to the GenericFilter makes all string comparisons use `ILIKE` or `CASEFOLD(value)` to replicate that pre-existing functionality after the move to Postgres. The tests have been updated accordingly.
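A sketch of the kind of SQL this could emit, using the operator and function named above; the helper itself and the set of supported ops are assumptions for illustration, not the GenericFilter's actual API:

```python
# Illustrative emission of case-insensitive string comparisons for
# postgres, mimicking mariadb's default case-ignoring collation.
def ci_condition(column: str, op: str) -> str:
    """Return a parameterized (%s placeholder) condition for `column`."""
    if op == "eq":
        # casefold() normalizes both sides before comparing.
        return f"casefold({column}) = casefold(%s)"
    if op == "contains":
        # ILIKE is postgres' case-insensitive LIKE; %% escapes a literal %.
        return f"{column} ILIKE '%%' || %s || '%%'"
    raise ValueError(f"unsupported op: {op}")
```

The `%s` placeholders would be filled by psycopg's parameter binding at execution time.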
…r call (#1238) The get_connection() function returned by dependable_get_connection_getter is called repeatedly for related graphql resolvers. Caching the project maps in dependable_get_connection_getter avoids repeated calls to get the same information from refresh_projects() and speeds up this hot path. But don't cache them for other refresh_projects() call sites for which it can be important to get a fresh list of projects.
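The caching idea can be sketched as follows, shown synchronously for brevity; `make_connection_getter` and the return shape are illustrative, not the real dependable API. The point is that the cache lives in the closure, so it is scoped to one request's getter rather than shared across call sites that need fresh data:

```python
# Sketch: cache the project maps for the lifetime of one request's
# connection getter, so repeated resolver calls don't re-query.
def make_connection_getter(refresh_projects):
    cached = None

    def get_connection():
        nonlocal cached
        if cached is None:
            cached = refresh_projects()  # only the first call hits the DB
        return cached

    return get_connection
```

Other callers of `refresh_projects()` bypass this wrapper entirely, so they still see a fresh project list.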
Migrate billing-related test cases from unittest to pytest. Remove `unittest.mock.MagicMock` usages and introduce dummy mock classes (in `test/testbqbase.py`) and pytest `monkeypatch`.