Description
Currently, in the external_metrics.txt file, only a single value can be specified for the multiplier and the not_fragmentary_min_value of each metric_name_prefix.
However, each prefix translates into multiple metrics, e.g.:
for metric classes aln_tran and aln_prot we have *_aln_nF1, *_aln_jF1, *_aln_eF1 and *aln_aF1
for metric class seq_prot we have *_qCov and *_tCov
This means that the user has to use the same value for each of these metrics. In practice I have been specifying different values, in particular for *_qCov and *_tCov, which requires me to manually update the gmc_run.scoring.yaml file after running gmc configure.
Some suggestions about how to improve this:
- The not_fragmentary_min_value is used in the mikado not_fragmentary expression. Currently we make use of both *_qCov and *_tCov; in practice I have been setting *_qCov to 1, which effectively turns it off because {operator: gt, value: 1} is never true. I would simply stop using *_qCov in the not_fragmentary section, as a high query coverage does not indicate that the model is not a fragment. We already use only *aln_aF1 in this section (i.e. we ignore *_aln_nF1, *_aln_jF1 and *_aln_eF1). With this change we would still need only one not_fragmentary_min_value per metric_name_prefix.
- For the multipliers it’s more useful to be able to specify multiple values. One “solution” would be to allow a comma-separated list of multiplier values that are applied in a set order, i.e. metric classes aln_tran and aln_prot allow up to 4 values, applied to *aln_aF1, *_aln_nF1, *_aln_jF1 and *_aln_eF1, and metric class seq_prot allows 2 values, applied to *_qCov and *_tCov.
This is a bit messy, as it means the multiplier column accepts 1, 2 or 4 values depending on the metric class. For convenience you would still want logic where, if only 1 value is given, it is applied to all the resulting metrics, but you would also need to deal with cases where someone specifies, say, 3 values for metric class aln_tran or aln_prot.
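To make the second suggestion concrete, here is a minimal Python sketch of how the multiplier column could be expanded. It is not taken from the gmc code base: the function name `expand_multipliers` and the `METRIC_PATTERNS` table are hypothetical, and the metric patterns and their order are copied verbatim from the description above. It assumes a single value is broadcast to every metric of the class and that any other count that does not match the class is rejected.

```python
# Hypothetical mapping from metric class to the metric patterns it expands to,
# in the order the multiplier values would be applied (as listed in this issue).
METRIC_PATTERNS = {
    "aln_tran": ("*aln_aF1", "*_aln_nF1", "*_aln_jF1", "*_aln_eF1"),
    "aln_prot": ("*aln_aF1", "*_aln_nF1", "*_aln_jF1", "*_aln_eF1"),
    "seq_prot": ("*_qCov", "*_tCov"),
}


def expand_multipliers(metric_class, field):
    """Turn a multiplier column such as "1" or "1,3" into one value per metric.

    A single value is applied to every metric of the class; otherwise the
    number of values must match the number of metrics exactly.
    """
    patterns = METRIC_PATTERNS[metric_class]
    # float() tolerates surrounding whitespace and raises ValueError on
    # empty or non-numeric entries such as "1,,3".
    values = [float(v) for v in field.split(",")]
    if len(values) == 1:
        values = values * len(patterns)
    if len(values) != len(patterns):
        raise ValueError(
            f"metric class {metric_class!r} expects 1 or {len(patterns)} "
            f"multiplier values, got {len(values)}"
        )
    return dict(zip(patterns, values))
```

With this logic, `expand_multipliers("seq_prot", "1,3")` gives separate multipliers for *_qCov and *_tCov, while three values for aln_tran or aln_prot raise a `ValueError` instead of being silently padded or truncated. Failing early seems safer than guessing which metric the user meant to leave out, but that is a design choice open to discussion.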
I’m open to other ways of doing this.
At the moment it does mean that every time I run gmc I have to make this manual change, which is not ideal, even if I do expect users to sometimes adjust the gmc_run.scoring.yaml file prior to running gmc run.