
Compare DB Diff Calculations #1321

@opotowsky

Description


The DB difference calculations are all done like so: 2 * (X - Y) / (X + Y)
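For reference, here is a minimal Python sketch of that formula, assuming `src` plays the role of X and `ref` the role of Y. The name `mean_normalized_diff` and the handling of the two zero-denominator cases are illustrative assumptions, not necessarily what the comparison code does today.

```python
import math


def mean_normalized_diff(src: float, ref: float) -> float:
    """Difference normalized by the mean of src and ref: 2 * (src - ref) / (src + ref)."""
    if src == ref:
        # Identical values (including both zero) differ by nothing; this avoids 0/0.
        return 0.0
    total = src + ref
    if total == 0.0:
        # src == -ref (nonzero): the mean is zero, so the ratio is unbounded.
        return math.copysign(math.inf, src - ref)
    return 2.0 * (src - ref) / total
```

For example, `mean_normalized_diff(10.0, 0.0)` returns exactly 2.0, and so does any other nonzero `src` against a zero `ref`.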

This calculation has some pros and cons, some of which are discussed in the DiffResults docstring:

All differences are based on a weird type of relative difference, which uses the
mean of the reference and source data elements as the normalization value:
2*(C-E)/(C+E). This is somewhat strange, in that if the two are very different, the
reported relative difference will be smaller than expected. It does have the useful

I want to start a discussion on this calculation so we can handle the following scenarios better.

1. Difference is large but shows as a small fraction

This is noted as an issue in the docstring. It is risky because a large change is easy to overlook: someone could read the small-looking number and conclude that something isn't affecting a parameter.
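To make this scenario concrete: if the source is written as src = ref * (1 + r), where r is the conventional relative change, the mean-normalized diff works out to d = 2r / (2 + r), which creeps toward 2 as r grows. The Python sketch below (all function names are hypothetical) shows a 100x increase reported as roughly 1.96, then inverts the relation, r = 2d / (2 - d), to recover the conventional relative change.

```python
def mean_normalized_diff(src: float, ref: float) -> float:
    """2 * (src - ref) / (src + ref); assumes src + ref != 0."""
    return 2.0 * (src - ref) / (src + ref)


def relative_to_ref(src: float, ref: float) -> float:
    """Conventional relative difference (src - ref) / ref; assumes ref != 0."""
    return (src - ref) / ref


def mean_normalized_to_relative(d: float) -> float:
    """Invert d = 2r / (2 + r) to recover r = 2d / (2 - d); assumes d != 2."""
    return 2.0 * d / (2.0 - d)


# A 100x increase: conventionally a relative change of 99 (+9900%).
src, ref = 100.0, 1.0
d = mean_normalized_diff(src, ref)
print(f"mean-normalized diff: {d:.3f}")                              # 1.960
print(f"relative to ref:      {relative_to_ref(src, ref):.1f}")      # 99.0
print(f"recovered from d:     {mean_normalized_to_relative(d):.1f}")  # 99.0
```

If the current formula is kept, a conversion like `mean_normalized_to_relative` is one low-cost way to present familiar fractions without changing what is stored.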

2. Bounds SHOULD be +/- 2 but some diff values are much larger

If a negative value becomes positive or vice versa, the diff is larger than 2 in magnitude, and it grows without bound as X + Y approaches zero. This is an edge case we should filter for, since we sometimes allow negative values even though they are non-physical.
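The ±2 bound, and why it fails under a sign change, both follow from the triangle inequality. Writing d = 2(X - Y) / (X + Y) with X + Y ≠ 0:

```latex
\begin{aligned}
\text{same sign } (XY > 0):&\quad |X - Y| < |X| + |Y| = |X + Y| \;\Rightarrow\; |d| < 2,\\
\text{one value zero } (XY = 0,\ X \neq Y):&\quad |X - Y| = |X + Y| \;\Rightarrow\; |d| = 2,\\
\text{opposite signs } (XY < 0):&\quad |X - Y| = |X| + |Y| > |X + Y| \;\Rightarrow\; |d| > 2.
\end{aligned}
```

In the opposite-sign case the excess can be small or huge: X = 1.0, Y = -0.9 gives d = 2(1.9)/(0.1) = 38, and d diverges as Y approaches -X.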

3. There is a lot of confusion around what a difference value represents.

Ultimately, I would like all "normal" diffs to show as a fraction or a percent, and all "edge cases" to be clearly indicated as such (e.g., a src of 10 and a ref of 0 gives a diff of 2, as does any nonzero src against a zero ref, but that isn't obvious from seeing the number 2).
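One possible direction, sketched in Python: classify each src/ref pair first, report a plain fraction (src - ref) / ref only for the "normal" same-sign case, and tag each edge case explicitly. `DiffKind`, `classify_diff`, and the choice of `ref` as the normalization value are hypothetical, not existing code in the project.

```python
from enum import Enum
from typing import Tuple


class DiffKind(Enum):
    """Hypothetical categories for a compared src/ref pair."""
    EQUAL = "equal"              # src == ref, including both zero
    NORMAL = "normal"            # same sign, both nonzero
    ZERO_REF = "zero_ref"        # ref is 0, src is not
    ZERO_SRC = "zero_src"        # src is 0, ref is not
    SIGN_CHANGE = "sign_change"  # src and ref have opposite signs


def classify_diff(src: float, ref: float) -> Tuple[DiffKind, float]:
    """Return a category and the value to report for it.

    NORMAL pairs report the fractional change relative to ref,
    (src - ref) / ref. Edge cases report the raw difference src - ref,
    so nobody has to decode a magic value like 2. NaN inputs are not handled.
    """
    if src == ref:
        return DiffKind.EQUAL, 0.0
    if ref == 0.0:
        return DiffKind.ZERO_REF, src - ref
    if src == 0.0:
        return DiffKind.ZERO_SRC, src - ref
    if (src > 0) != (ref > 0):
        return DiffKind.SIGN_CHANGE, src - ref
    return DiffKind.NORMAL, (src - ref) / ref


print(classify_diff(10.0, 0.0))   # (<DiffKind.ZERO_REF: 'zero_ref'>, 10.0)
print(classify_diff(-1.0, 1.0))   # (<DiffKind.SIGN_CHANGE: 'sign_change'>, -2.0)
```

The design choice here is to report the raw difference for every edge case rather than a ratio, since no ratio is meaningful when ref is zero or the sign flips; whether that suits the comparison tooling is an open question.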

Maybe we decide to change the calculation, or maybe we just institute better filtering. I don't know. I am looking for opinions!
