Update configuration around external_metrics.txt  #26

@swarbred

@cschu

Currently, the external_metrics.txt file allows only a single value to be specified for the multiplier and the not_fragmentary_min_value of each metric_name_prefix.

However, each metric_name_prefix translates into multiple metrics,

e.g.
for metric classes aln_tran and aln_prot we have *_aln_nF1, *_aln_jF1, *_aln_eF1 and *_aln_aF1
for metric class seq_prot we have *_qCov and *_tCov

This means that the user has to use the same value for each of these metrics. In practice I have been specifying different values, in particular for *_qCov and *_tCov, which requires me to manually update the gmc_run.scoring.yaml file after running gmc configure.

Some suggestions on how to improve this:

  1. The not_fragmentary_min_value is used in the mikado not_fragmentary expression, which currently uses both *_qCov and *_tCov. In practice I have been setting *_qCov to 1, which effectively turns it off because {operator: gt, value: 1} is never true. I would simply stop using *_qCov in the not_fragmentary section, since high query coverage does not indicate that the model is not a fragment. We already use only *_aln_aF1 in this section (i.e. we ignore *_aln_nF1, *_aln_jF1 and *_aln_eF1). With this change we would still need only one not_fragmentary_min_value for each metric_name_prefix.

  2. For the multipliers it’s more useful to be able to specify multiple values. One “solution” would be to allow a comma-separated list of multiplier values that are applied in a fixed order, i.e. metric classes aln_tran and aln_prot allow up to 4 values, applied to *_aln_aF1, *_aln_nF1, *_aln_jF1 and *_aln_eF1, and metric class seq_prot allows 2 values, applied to *_qCov and *_tCov.

This is a bit messy, as it means the multiplier column accepts 1, 2 or 4 values depending on the metric class. For convenience I think you would still want logic where, if only 1 value is given, it is applied to all the resulting metrics, but you would also need to deal with cases where someone specifies, say, 3 values for metric class aln_tran or aln_prot.
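To make the rules in suggestion 2 concrete, here is a minimal Python sketch of how a multiplier cell could be expanded. It is not gmc's actual code: the function name `expand_multipliers`, the `METRIC_SUFFIXES` table, the `<prefix>_<suffix>` naming and the example prefix `uniprot` are all assumptions made for illustration.

```python
# Hypothetical sketch; names and suffix order are assumptions, not gmc internals.
METRIC_SUFFIXES = {
    "aln_tran": ("aln_aF1", "aln_nF1", "aln_jF1", "aln_eF1"),
    "aln_prot": ("aln_aF1", "aln_nF1", "aln_jF1", "aln_eF1"),
    "seq_prot": ("qCov", "tCov"),
}


def expand_multipliers(metric_class, prefix, raw):
    """Map a multiplier cell such as "1" or "0.5,2" onto per-metric values."""
    suffixes = METRIC_SUFFIXES[metric_class]
    # float() raises ValueError on empty or non-numeric entries, e.g. "1,,2".
    values = [float(v) for v in str(raw).split(",")]
    if len(values) == 1:
        # A single value is applied to every metric of the class.
        values = values * len(suffixes)
    elif len(values) != len(suffixes):
        # Reject ambiguous input, e.g. 3 values for aln_tran.
        raise ValueError(
            f"{prefix}: expected 1 or {len(suffixes)} multipliers "
            f"for metric class {metric_class}, got {len(values)}"
        )
    return {f"{prefix}_{s}": v for s, v in zip(suffixes, values)}


# Example with a made-up prefix: different multipliers for qCov and tCov.
print(expand_multipliers("seq_prot", "uniprot", "0.5,2"))
```

Rejecting any count other than 1 or the full number of metrics avoids guessing which metrics a partial list was meant for.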

I’m open to other ways of doing this.

At the moment it means that every time I run gmc I have to make this manual change, which is not ideal, even though I do expect users to sometimes adjust the gmc_run.scoring.yaml file before running gmc run.

Labels

enhancement
