Add method to get QC web reports from db with web api #1074
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1074      +/-   ##
==========================================
- Coverage   82.54%   82.48%   -0.07%
==========================================
  Files         189      189
  Lines       16535    16575      +40
==========================================
+ Hits        13649    13672      +23
- Misses       2886     2903      +17
dancoates left a comment:
This looks good, just a few tweaks needed.
db/python/layers/web.py (outdated)
    JSON_UNQUOTE(JSON_EXTRACT(a.meta, '$.stage')) as stage,
    GROUP_CONCAT(DISTINCT asg.sequencing_group_id) as sequencing_groups
FROM analysis a
LEFT JOIN analysis_output ao ON ao.analysis_id = a.id
The table name is analysis_outputs, not analysis_output.
db/python/layers/web.py (outdated)
AND a.status = 'COMPLETED'
AND JSON_UNQUOTE(JSON_EXTRACT(a.meta, '$.sequencing_type')) IN :sequencing_types
AND JSON_UNQUOTE(JSON_EXTRACT(a.meta, '$.stage')) in :stages
GROUP BY a.id, ao.output, a.timestamp_completed, sequencing_type, stage
No need to group by a.timestamp_completed, sequencing_type and stage: these all come from analysis table fields, and you're already grouping by that table's primary key, id.
db/python/layers/web.py (outdated)
    a.id,
    a.timestamp_completed,
    ao.output,
    JSON_UNQUOTE(JSON_EXTRACT(a.meta, '$.sequencing_type')) as sequencing_type,
You should be able to use JSON_VALUE rather than JSON_UNQUOTE(JSON_EXTRACT(...)) here; I think it'll be a little faster.
Same for the WHERE clause below.
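Putting the suggestions from this review together (the analysis_outputs table name, grouping only by the primary key plus the output, and JSON_VALUE in both the SELECT and WHERE clauses), the query might look roughly like the sketch below. It is a hypothetical revision, not the code that was merged: the :project filter and the analysis_sequencing_group table name are assumptions, because those parts of the query are not visible in the review excerpts. Selecting other analysis columns while grouping only by a.id relies on the server's sql_mode permitting it.

```python
# Hypothetical revision of the QC report query, applying the review suggestions.
# Assumptions: the :project filter and the analysis_sequencing_group table name
# are not visible in the review excerpts.
QC_REPORT_QUERY = """
SELECT
    a.id,
    a.timestamp_completed,
    ao.output,
    -- JSON_VALUE extracts and unquotes the scalar in a single call
    JSON_VALUE(a.meta, '$.sequencing_type') AS sequencing_type,
    JSON_VALUE(a.meta, '$.stage') AS stage,
    GROUP_CONCAT(DISTINCT asg.sequencing_group_id) AS sequencing_groups
FROM analysis a
LEFT JOIN analysis_outputs ao ON ao.analysis_id = a.id
LEFT JOIN analysis_sequencing_group asg ON asg.analysis_id = a.id
WHERE a.project = :project
    AND a.status = 'COMPLETED'
    AND JSON_VALUE(a.meta, '$.sequencing_type') IN :sequencing_types
    AND JSON_VALUE(a.meta, '$.stage') IN :stages
-- the other selected analysis columns are determined by the primary key a.id
GROUP BY a.id, ao.output
"""
```

Keeping ao.output in the GROUP BY still matters, since one analysis can have several outputs.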
if not self.output or not self.output.startswith('gs://'):
    return None
bucket_name, blob_name = self.output.removeprefix('gs://').split('/', 1)
blob_name = blob_name.replace('multiqc_data.json', 'multiqc.html')
It would be good to have a check somewhere here that the output contains multiqc. Other types of QC report could be passed through this, and it'd be better for them to return None rather than a broken link.
Added this to the check above.
Honestly, the way the MultiQC reports have been written as analysis records is far from ideal. Instead of the HTML path, only the multiqc_data.json path has been recorded on the analysis record, hence the need for this transform. Really, we should be using multiple analysis outputs for this kind of record, but for now this should work.
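A minimal sketch of the transform with the MultiQC check folded in, assuming (as the thread describes) that only the multiqc_data.json path is recorded and that multiqc.html sits in the same folder. The function name multiqc_html_location is hypothetical, and because the excerpt does not show how the browser URL is built from the bucket and blob, the sketch stops at returning that pair. It needs Python 3.9+ for removeprefix/removesuffix, which the original code already uses.

```python
from typing import Optional, Tuple


def multiqc_html_location(output: Optional[str]) -> Optional[Tuple[str, str]]:
    """Map a recorded gs://.../multiqc_data.json path to its multiqc.html sibling.

    Returns (bucket_name, blob_name), or None for anything that is not a
    MultiQC data file in GCS, so other QC report types never yield a broken link.
    """
    if not output or not output.startswith('gs://'):
        return None
    # Only MultiQC data files can be transformed this way.
    if not output.endswith('multiqc_data.json'):
        return None
    bucket_name, sep, blob_name = output.removeprefix('gs://').partition('/')
    # Need both a bucket name and an object path after 'gs://'.
    if not bucket_name or not sep or not blob_name:
        return None
    # Swap only the trailing file name, leaving the rest of the path untouched.
    blob_name = blob_name.removesuffix('multiqc_data.json') + 'multiqc.html'
    return bucket_name, blob_name


print(multiqc_html_location('gs://my-bucket/qc/genome/multiqc_data.json'))
# ('my-bucket', 'qc/genome/multiqc.html')
print(multiqc_html_location('gs://my-bucket/qc/other_report.json'))
# None
```

Two small differences from the excerpt: the check matches the file name suffix rather than any occurrence of multiqc, and partition replaces split('/', 1), so a path with no object part returns None instead of raising ValueError during unpacking.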
Yeah, agreed. It would be good if this could be more generic and support linking to any type of analysis output which identifies itself as an HTML report.
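One way the more generic approach could look, purely as an illustration: instead of rewriting a known file name, pick whichever of an analysis's outputs is itself an HTML file. The AnalysisOutput class and the use of the file extension as the "identifies itself as HTML" signal are assumptions; a real implementation might rely on an explicit field in the output's metadata instead.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AnalysisOutput:
    """Hypothetical stand-in for one analysis output (e.g. an output_file row)."""
    path: str


def find_html_report(outputs: Iterable[AnalysisOutput]) -> Optional[str]:
    """Return the path of the first output that is an HTML file, or None."""
    for output in outputs:
        # Treat the file extension as the signal that this output is an HTML report.
        if output.path.lower().endswith(('.html', '.htm')):
            return output.path
    return None


outputs = [
    AnalysisOutput('gs://my-bucket/qc/multiqc_data.json'),
    AnalysisOutput('gs://my-bucket/qc/multiqc.html'),
]
print(find_html_report(outputs))  # gs://my-bucket/qc/multiqc.html
```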
):
    """Get qc web report analyses for a project filtered by sequencing type and stage."""
    _query = """
    SELECT
This query only looks in the analysis_outputs table, but it will also need to look in the output_file table for QC reports stored there.
Thanks. Now joining to output_file on the analysis_outputs.file_id field, and returning the output_file.path field as the query's output path.
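The join described above can be tried out locally. This sketch uses SQLite through Python's sqlite3 module as a stand-in for MariaDB, so json_extract plays the role of JSON_VALUE and the list parameters are expanded into placeholders by hand. The schema is a minimal hypothetical one, not metamist's, and falling back to analysis_outputs.output with COALESCE when no output_file row is linked is an assumption about how the two sources are combined. It needs an SQLite build with the JSON functions, which are built in from SQLite 3.38.0.

```python
import json
import sqlite3

# Minimal hypothetical schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE analysis (id INTEGER PRIMARY KEY, status TEXT, meta TEXT, timestamp_completed TEXT);
CREATE TABLE output_file (id INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE analysis_outputs (analysis_id INTEGER, file_id INTEGER, output TEXT);
CREATE TABLE analysis_sequencing_group (analysis_id INTEGER, sequencing_group_id INTEGER);
"""


def get_qc_rows(conn, sequencing_types, stages):
    """Return completed analyses whose meta matches the given types and stages."""
    # sqlite3 cannot bind a list to one IN parameter, so build one "?" per value.
    type_marks = ", ".join("?" * len(sequencing_types))
    stage_marks = ", ".join("?" * len(stages))
    query = f"""
        SELECT
            a.id,
            a.timestamp_completed,
            -- Prefer the linked output_file path, else the plain output column.
            COALESCE(f.path, ao.output) AS output,
            json_extract(a.meta, '$.sequencing_type') AS sequencing_type,
            json_extract(a.meta, '$.stage') AS stage,
            GROUP_CONCAT(DISTINCT asg.sequencing_group_id) AS sequencing_groups
        FROM analysis a
        LEFT JOIN analysis_outputs ao ON ao.analysis_id = a.id
        LEFT JOIN output_file f ON f.id = ao.file_id
        LEFT JOIN analysis_sequencing_group asg ON asg.analysis_id = a.id
        WHERE a.status = 'COMPLETED'
          AND json_extract(a.meta, '$.sequencing_type') IN ({type_marks})
          AND json_extract(a.meta, '$.stage') IN ({stage_marks})
        GROUP BY a.id, COALESCE(f.path, ao.output)
    """
    return conn.execute(query, [*sequencing_types, *stages]).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up example data: one genome QC analysis and one exome QC analysis.
conn.executemany(
    "INSERT INTO analysis VALUES (?, 'COMPLETED', ?, ?)",
    [
        (1, json.dumps({"sequencing_type": "genome", "stage": "ExampleQc"}), "2024-05-01"),
        (2, json.dumps({"sequencing_type": "exome", "stage": "ExampleQc"}), "2024-05-02"),
    ],
)
conn.execute("INSERT INTO output_file VALUES (10, 'gs://my-bucket/qc/multiqc_data.json')")
conn.executemany(
    "INSERT INTO analysis_outputs VALUES (?, ?, ?)",
    [(1, 10, None), (2, None, "gs://my-bucket/exome/multiqc_data.json")],
)
conn.executemany("INSERT INTO analysis_sequencing_group VALUES (1, ?)", [(100,), (101,)])

rows = get_qc_rows(conn, ["genome"], ["ExampleQc"])
print(rows)
```

The GROUP_CONCAT order is not guaranteed, so callers that need a stable list of sequencing groups should split and sort it.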
This PR:
- Adds a Web API endpoint to get QC report entries for a given project, filtered by sequencing types and analysis.meta stages.
- Adds a method to the WebLayer to query the database for the QC entries.
- Adds a method to WebDb which queries the analysis and analysis_outputs tables for QC records with the given sequencing types and stages.
- Adds the ProjectQcWebReport and ProjectQcWebReportInternal models, and a method to transform analysis.output into the pure HTML path that can be visited in the browser.

Still TODO:
At present, when visiting the Metamist Project page, the user is presented with links to MultiQC reports. These links are hard-coded URLs built from the dataset name and nothing else (see the frontend code). This is a problem because the links point to historic QC reports, not recent ones, and analysts who are unaware of this may view an old report by mistake, leading to errors and misunderstandings.
To fix this, we need to query the database for the analysis records and the file paths of their QC outputs, then use some logic to feed the HTML path of the latest report into the front end.
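The "latest report" logic could be as simple as keeping, for each sequencing type and stage, the completed analysis with the newest timestamp_completed. This is a sketch under assumptions: the QcReport shape is hypothetical, and the PR leaves open whether this selection happens in Python, in SQL, or in the frontend.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Tuple


class QcReport(NamedTuple):
    """Hypothetical shape of one QC report row after the HTML path is resolved."""
    sequencing_type: str
    stage: str
    timestamp_completed: datetime
    html_path: str


def latest_reports(reports: Iterable[QcReport]) -> Dict[Tuple[str, str], QcReport]:
    """Keep only the most recently completed report per (sequencing_type, stage)."""
    latest: Dict[Tuple[str, str], QcReport] = {}
    for report in reports:
        key = (report.sequencing_type, report.stage)
        current = latest.get(key)
        if current is None or report.timestamp_completed > current.timestamp_completed:
            latest[key] = report
    return latest


reports = [
    QcReport('genome', 'ExampleQc', datetime(2023, 1, 5), 'gs://my-bucket/old/multiqc.html'),
    QcReport('genome', 'ExampleQc', datetime(2024, 6, 1), 'gs://my-bucket/new/multiqc.html'),
    QcReport('exome', 'ExampleQc', datetime(2024, 2, 9), 'gs://my-bucket/exome/multiqc.html'),
]
for (seq_type, stage), report in latest_reports(reports).items():
    print(seq_type, stage, report.html_path)
```

Because the comparison is strict, a tie on timestamp_completed keeps the first report seen; a real implementation might break ties on analysis id instead.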