Add method to get QC web reports from db with web api by EddieLF · Pull Request #1074 · populationgenomics/metamist

EddieLF · 2025-06-17T02:53:16Z

This PR

- Introduces a new web API endpoint to get QC report entries for a given project, filtered by sequencing type and analysis.meta stage.
- Adds a new method to the WebLayer to query the database for the QC entries.
- Adds a new method to the WebDb that queries the analysis and analysis_outputs tables for QC records with the given sequencing types and stages.
- Adds new models ProjectQcWebReport and ProjectQcWebReportInternal:
  - These models capture the analysis information (id, timestamp, sequencing type, stage, output path, sequencing groups).
  - They contain a method that transforms analysis.output into a browser-viewable HTML path.
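For illustration, here is a minimal sketch of the row-to-model step. It assumes the query returns the columns discussed in the review threads and that `GROUP_CONCAT` delivers sequencing group IDs as one comma-separated string (or NULL). `QcWebReportRow` and `from_db` are hypothetical names standing in for ProjectQcWebReportInternal, and the real models may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class QcWebReportRow:
    """Hypothetical stand-in for ProjectQcWebReportInternal (field names assumed)."""

    id: int
    timestamp_completed: datetime | None
    sequencing_type: str | None
    stage: str | None
    output: str | None
    sequencing_groups: list[int]

    @classmethod
    def from_db(cls, row: dict) -> QcWebReportRow:
        """Build a report from one query row.

        GROUP_CONCAT returns the sequencing group IDs as a single
        comma-separated string (or NULL when there are none), so split it here.
        """
        raw_sgs = row.get('sequencing_groups')
        sg_ids = [int(sg) for sg in raw_sgs.split(',')] if raw_sgs else []
        return cls(
            id=row['id'],
            timestamp_completed=row.get('timestamp_completed'),
            sequencing_type=row.get('sequencing_type'),
            stage=row.get('stage'),
            output=row.get('output'),
            sequencing_groups=sg_ids,
        )
```

Keeping the string parsing in a classmethod means the database layer can hand over raw rows unchanged.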

Still TODO:

- Update the frontend code to pull in the QC web reports for a dataset and display the URLs of the most recent reports.
- Possibly add logic to display multiple (or all) report links, and let the user see which sequencing groups are in each report.

At present, the Metamist Project page shows users links to MultiQC reports.

These links are hard-coded URLs based only on the dataset name (see the frontend code). This is a problem because they point to historic QC reports rather than the most recent ones, and analysts who don't realise this can draw wrong conclusions from outdated reports.

To fix this, we need to query the database for the QC analysis records and the file paths of their outputs, then use some logic to pass the HTML path of the latest report into the frontend.
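One possible shape for the "latest report" logic is sketched below. It assumes each report already carries its sequencing type, stage, completion time and a resolved HTML URL (or None when no link could be built). `QcReport` and `latest_reports` are hypothetical names, not the PR's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QcReport:
    """Hypothetical minimal report record; field names are assumptions."""

    sequencing_type: str
    stage: str
    timestamp_completed: datetime
    html_url: str | None


def latest_reports(reports: list[QcReport]) -> dict[tuple[str, str], QcReport]:
    """Keep only the most recently completed linkable report per (type, stage)."""
    latest: dict[tuple[str, str], QcReport] = {}
    for report in reports:
        if report.html_url is None:
            continue  # nothing the frontend could link to
        key = (report.sequencing_type, report.stage)
        current = latest.get(key)
        if current is None or report.timestamp_completed > current.timestamp_completed:
            latest[key] = report
    return latest
```

Keying on (sequencing_type, stage) keeps, for example, genome and exome reports separate; showing all reports (the second TODO item) would simply skip this reduction.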

codecov · 2025-06-17T04:17:33Z

Codecov Report

Attention: Patch coverage is 56.41026% with 17 lines in your changes missing coverage. Please review.

Project coverage is 82.48%. Comparing base (1777145) to head (4b3ce3c).

Files with missing lines	Patch %	Lines
models/models/web.py	70.83%	7 Missing ⚠️
api/routes/web.py	28.57%	5 Missing ⚠️
db/python/layers/web.py	37.50%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1074      +/-   ##
==========================================
- Coverage   82.54%   82.48%   -0.07%     
==========================================
  Files         189      189              
  Lines       16535    16575      +40     
==========================================
+ Hits        13649    13672      +23     
- Misses       2886     2903      +17


dancoates

This looks good, just a few tweaks needed.

dancoates · 2025-06-18T00:33:09Z

db/python/layers/web.py

+            JSON_UNQUOTE(JSON_EXTRACT(a.meta, '$.stage')) as stage,
+            GROUP_CONCAT(DISTINCT asg.sequencing_group_id) as sequencing_groups
+        FROM analysis a
+        LEFT JOIN analysis_output ao ON ao.analysis_id = a.id


The table name is analysis_outputs (plural).

dancoates · 2025-06-18T00:37:55Z

db/python/layers/web.py

+        AND a.status = 'COMPLETED'
+        AND JSON_UNQUOTE(JSON_EXTRACT(a.meta, '$.sequencing_type')) IN :sequencing_types
+        AND JSON_UNQUOTE(JSON_EXTRACT(a.meta, '$.stage')) in :stages
+        GROUP BY a.id, ao.output, a.timestamp_completed, sequencing_type, stage


No need to group by a.timestamp_completed, sequencing_type and stage: these are all derived from analysis table columns, and you're already grouping by its primary key, a.id.
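To illustrate the point with something runnable, the sketch below uses Python's built-in `sqlite3` as a stand-in for the project's MariaDB database; SQLite's `json_extract` (which returns unquoted text for string values) stands in for the MariaDB JSON functions. The toy schema only mirrors the `analysis` and `analysis_sequencing_group` names seen in the diff, and `qc_rows` is a hypothetical helper. Grouping by `a.id` alone still yields exactly one row per analysis.

```python
from __future__ import annotations

import sqlite3


def qc_rows(conn: sqlite3.Connection) -> list[tuple]:
    """Return one row per analysis, grouping only by the primary key a.id.

    timestamp_completed and the values pulled out of a.meta depend on a.id
    alone, so every group has a single value for them and they need not be
    listed in GROUP BY.
    """
    return conn.execute(
        """
        SELECT
            a.id,
            a.timestamp_completed,
            json_extract(a.meta, '$.stage') AS stage,
            GROUP_CONCAT(DISTINCT asg.sequencing_group_id) AS sequencing_groups
        FROM analysis a
        LEFT JOIN analysis_sequencing_group asg ON asg.analysis_id = a.id
        GROUP BY a.id
        ORDER BY a.id
        """
    ).fetchall()


# Toy in-memory data: one analysis linked to two sequencing groups
# (one link duplicated to show DISTINCT at work).
conn = sqlite3.connect(':memory:')
conn.executescript(
    """
    CREATE TABLE analysis (id INTEGER PRIMARY KEY, timestamp_completed TEXT, meta TEXT);
    CREATE TABLE analysis_sequencing_group (analysis_id INTEGER, sequencing_group_id INTEGER);
    INSERT INTO analysis VALUES (1, '2025-06-17', '{"stage": "MultiQC"}');
    INSERT INTO analysis_sequencing_group VALUES (1, 10), (1, 11), (1, 11);
    """
)
rows = qc_rows(conn)
```

The order of values inside `GROUP_CONCAT` is not guaranteed, so consumers should not rely on it.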

dancoates · 2025-06-18T00:39:46Z

db/python/layers/web.py

+            a.id,
+            a.timestamp_completed,
+            ao.output,
+            JSON_UNQUOTE(JSON_EXTRACT(a.meta, '$.sequencing_type')) as sequencing_type,


You should be able to use JSON_VALUE(...) rather than JSON_UNQUOTE(JSON_EXTRACT(...)) here; I think it'll be a little faster.

Same for the WHERE clause below.

dancoates · 2025-06-18T00:44:11Z

models/models/web.py

+        if not self.output or not self.output.startswith('gs://'):
+            return None
+        bucket_name, blob_name = self.output.removeprefix('gs://').split('/', 1)
+        blob_name = blob_name.replace('multiqc_data.json', 'multiqc.html')


It would be good to have a check somewhere here that the output contains multiqc. There are other types of QC report that could be passed through this, and it'd be better for them to return None than a broken link.

Added this to the check above

Honestly, the way the MultiQC reports have been written as analysis records is not ideal. Instead of the HTML report path being recorded on the analysis, only the multiqc_data.json path has been recorded, hence the need for this transform. Really, we should be using multiple analysis outputs for this kind of record, but for now this should work.
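A minimal sketch of the validated transform follows. It assumes the HTML report sits next to multiqc_data.json under the standard name multiqc.html, as the diff's `replace()` call implies. The function name `multiqc_html_url` is hypothetical, and the `storage.cloud.google.com` URL form is an assumption: the PR's method may build a different web URL from `bucket_name` and `blob_name`.

```python
from __future__ import annotations


def multiqc_html_url(output: str | None) -> str | None:
    """Map a recorded multiqc_data.json path to a browsable HTML URL.

    Returns None for anything that is not a gs:// MultiQC output, so other QC
    report types never produce a broken link.
    """
    if not output or not output.startswith('gs://') or 'multiqc' not in output:
        return None
    path = output.removeprefix('gs://')
    if '/' not in path:
        return None  # bucket only, no object to link to
    bucket_name, blob_name = path.split('/', 1)
    # Only the JSON data file is recorded; assume the HTML report sits
    # alongside it with the standard MultiQC file name.
    blob_name = blob_name.replace('multiqc_data.json', 'multiqc.html')
    if not blob_name.endswith('.html'):
        return None  # some other MultiQC artefact, not a viewable report
    # Assumption: authenticated Cloud Storage browser URL; the real endpoint
    # may serve reports from a different web host.
    return f'https://storage.cloud.google.com/{bucket_name}/{blob_name}'
```

For example, `gs://cpg-test/qc/multiqc_data.json` maps to `https://storage.cloud.google.com/cpg-test/qc/multiqc.html`, while a non-MultiQC output returns None.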

Yeah agreed, it would be good if this could be more generic and support linking to any type of analysis output which identifies itself as a HTML report.

dancoates · 2025-06-18T00:44:43Z

db/python/layers/web.py

+    ):
+        """Get qc web report analyses for a project filtered by sequencing type and stage."""
+        _query = """
+        SELECT


This query only looks in the analysis_outputs table, but it will also need to look in the output_file table for QC reports stored there.

Thanks. The query now joins output_file on analysis_outputs.file_id and returns output_file.path as the output path.
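As a rough sketch only (the PR's final query isn't reproduced in this thread), here is how the query might look with the review feedback folded in: the correct analysis_outputs table name, JSON_VALUE in place of JSON_UNQUOTE(JSON_EXTRACT(...)), a GROUP BY without the analysis-derived columns, and the join to output_file. It is written as a Python string constant. The `a.project` column and `:project` parameter are assumptions, as is the COALESCE fallback to `ao.output` for outputs recorded only as a string.

```python
# Hypothetical reconstruction of the revised MariaDB query.
# Assumed (not shown in the PR): a.project / :project, and COALESCE falling
# back to the legacy ao.output string when no output_file row is linked.
# List parameters (:sequencing_types, :stages) are assumed to be expanded by
# the data-access layer, as in the original query.
QC_WEB_REPORTS_QUERY = """
SELECT
    a.id,
    a.timestamp_completed,
    COALESCE(ofile.path, ao.output) AS output,
    JSON_VALUE(a.meta, '$.sequencing_type') AS sequencing_type,
    JSON_VALUE(a.meta, '$.stage') AS stage,
    GROUP_CONCAT(DISTINCT asg.sequencing_group_id) AS sequencing_groups
FROM analysis a
LEFT JOIN analysis_outputs ao ON ao.analysis_id = a.id
LEFT JOIN output_file ofile ON ofile.id = ao.file_id
LEFT JOIN analysis_sequencing_group asg ON asg.analysis_id = a.id
WHERE a.project = :project
    AND a.status = 'COMPLETED'
    AND JSON_VALUE(a.meta, '$.sequencing_type') IN :sequencing_types
    AND JSON_VALUE(a.meta, '$.stage') IN :stages
GROUP BY a.id, ao.output, ofile.path
"""
```

The alias `ofile` avoids `OF`, which is a reserved word in MySQL. Grouping by both output columns keeps one row per distinct output of each analysis.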

dancoates · 2025-06-18T01:43:50Z

models/models/web.py

+        if not self.output or not self.output.startswith('gs://'):
+            return None
+        bucket_name, blob_name = self.output.removeprefix('gs://').split('/', 1)
+        blob_name = blob_name.replace('multiqc_data.json', 'multiqc.html')


Yeah agreed, it would be good if this could be more generic and support linking to any type of analysis output which identifies itself as a HTML report.

EddieLF added 2 commits June 17, 2025 12:35

Add method to get QC web reports from db with web api

3a48801

Rename to QcWebReports for clarity

b48f1e0

dancoates self-requested a review June 17, 2025 04:00

dancoates requested changes Jun 18, 2025


Update SQL query with JSON_VALUE, validate multiqc in output

4b3ce3c

EddieLF requested a review from dancoates June 18, 2025 01:25

dancoates approved these changes Jun 18, 2025


dancoates force-pushed the dev branch from c53af7f to c10d8fd on October 20, 2025 05:34

EddieLF wants to merge 3 commits into dev from update_qc_report_links
