Clean up classfit and classconfidence in comminterpretation #419

@regetz

Description

The comminterpretation table has fields classfit and classconfidence that are meant to conform to a controlled list, but this is not enforced anywhere, and the database contains various unexpected values. It would be nice to translate these into allowed values (and then decide whether to implement constraints in the API or DB to keep them from drifting again in the future).

  • classfit: The data dictionary (here) says we should have only 5 string values, but the actual data has 6 ("High" doesn't belong), plus numeric values 1-5. The numeric values likely run from worst to best, but occasionally 1 may be best? We may need to manually review plots with these values to verify the ordering. "High" is likely meant to be a confidence indicator (see below).
select classfit, count(*) from comminterpretation group by classfit order by classfit;
/*
            classfit             | count
---------------------------------+-------
 1                               |   109
 2                               |  1147
 3                               |  4384
 4                               |  5048
 5                               |  4081
 Absolutely correct              | 24086
 Absolutely wrong                |     2
 Good answer                     |  3660
 High                            |     2
 Reasonable or acceptable answer |   518
 Understandable but wrong        |  2718
                                 | 57196
*/
  • classconfidence: The data dictionary (here) says we should have only 3 string values (High, Medium, Low), but the actual data has various abbreviations and case variants of these, plus four records with the number 1 (perhaps related to the "High" values that show up in classfit?).
select classconfidence, count(*) from comminterpretation group by classconfidence order by classconfidence;
/*
 classconfidence | count
-----------------+-------
 1               |     4
 H               |  3249
 High            |  5385
 L               |   170
 Low             |   327
 M               |  1539
 Med             |   132
 Medium          |   845
 h               |     1
 high            |   658
 low             |   333
 med             |   381
 medium          |    14
                 | 89913
*/
