ppv and ppv_upper, ppv_lower mismatch #2

@onishchenkod

When I plot ppv, ppv_lower, and ppv_upper as generated by zedstat, the ppv curve does not match its bounds (see the attached plot).

ax.plot(PRC['tpr'],
        PRC['ppv'],
        label="%s: (%s)" % (LABEL, PREVAL_LABEL) if LABEL else PREVAL_LABEL,
        color=COLOR)
plt.fill_between(
    x=PRC['tpr'],
    y1=PRC['ppv_upper'],
    y2=PRC['ppv_lower'],
    color=COLOR,
    alpha=CB_ALPHA
)

[Attached plot: PRC_Q]

The code used to generate the plotted curve is:

    fpr, tpr, thresholds = roc_curve(DF[target_col], DF[risk_col])
    ROC = pd.DataFrame({
        'fpr': fpr,
        'tpr': tpr,
        'threshold': thresholds,
    })
    zt=zedstat.processRoc(df=ROC,
                          order=3,
                          total_samples=DF.shape[0],
                          positive_samples=DF[target_col].value_counts()[1],
                          alpha=0.05,
                          prevalence=DF[target_col].value_counts(normalize = True)[1] if not preval else preval)
    zt.smooth(STEP=0.001)
    zt.allmeasures(interpolate=True)
    zt.usample(precision=3)
    zt.getBounds()
    if bounds:
        df_=zt.get()
        df_u=zt.df_lim['U']
        df_l=zt.df_lim['L']
        df_=df_.join(df_u,rsuffix='_upper').join(df_l,rsuffix='_lower')
        CURVE = df_.reset_index()
        zt.df = CURVE
    return zt
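
One thing that may be worth ruling out in the snippet above: `DataFrame.join` aligns rows on the index, so if `zt.get()` and the two `zt.df_lim` frames are not indexed identically, the joined bounds can end up shifted or filled with NaN. Whether that actually happens inside zedstat is an assumption here, not something confirmed from its source. The sketch below uses a hypothetical helper, `index_mismatch`, and toy frames to show the check.

```python
import pandas as pd


def index_mismatch(center: pd.DataFrame,
                   upper: pd.DataFrame,
                   lower: pd.DataFrame) -> dict:
    """Hypothetical helper: list index labels not shared between the frames."""
    return {
        "missing_in_upper": center.index.difference(upper.index).tolist(),
        "missing_in_lower": center.index.difference(lower.index).tolist(),
        "extra_in_upper": upper.index.difference(center.index).tolist(),
        "extra_in_lower": lower.index.difference(center.index).tolist(),
    }


# Toy frames standing in for zt.get(), zt.df_lim['U'] and zt.df_lim['L'].
# The values are illustrative only and do not come from zedstat.
center = pd.DataFrame({"ppv": [0.9, 0.8, 0.7]}, index=[0.0, 0.001, 0.002])
upper = pd.DataFrame({"ppv": [0.95, 0.85]}, index=[0.0, 0.001])
lower = pd.DataFrame({"ppv": [0.85, 0.75, 0.65]}, index=[0.0, 0.001, 0.002])

report = index_mismatch(center, upper, lower)
print(report)
```

If every list in the report is empty for the real frames, index alignment in the join can be excluded as the cause.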

This is what the plot of the same curve looks like when I recompute ppv as the midpoint of its bounds:

PRC['ppv'] = (PRC['ppv_upper'] + PRC['ppv_lower'])/2

[Attached plot: PRC_Q_fixne]

Either the ppv computation or the computation of its bounds appears to be incorrect here.
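
To narrow down which of the two is off, a minimal sketch of two checks follows: listing the rows where ppv falls outside [ppv_lower, ppv_upper], and recomputing ppv from tpr, fpr and prevalence with the standard Bayes relation ppv = tpr·p / (tpr·p + fpr·(1 − p)). The helper names, the toy numbers, and the assumption that the frame carries `fpr`, `tpr`, `ppv`, `ppv_lower` and `ppv_upper` columns are illustrative; it is also an assumption that zedstat derives ppv from this same relation.

```python
import pandas as pd


def ppv_from_rates(tpr, fpr, prevalence):
    """Standard PPV from TPR, FPR and prevalence (NaN where tpr = fpr = 0)."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def rows_outside_bounds(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows whose ppv is not within its own bounds."""
    outside = (df["ppv"] < df["ppv_lower"]) | (df["ppv"] > df["ppv_upper"])
    return df[outside]


# Toy data, not taken from the issue: rows 0 and 2 violate the bounds.
toy = pd.DataFrame({
    "fpr": [0.1, 0.2, 0.5],
    "tpr": [0.5, 0.8, 1.0],
    "ppv": [0.50, 0.30, 0.20],
    "ppv_lower": [0.30, 0.25, 0.25],
    "ppv_upper": [0.40, 0.35, 0.30],
})

bad = rows_outside_bounds(toy)
toy["ppv_check"] = ppv_from_rates(toy["tpr"], toy["fpr"], prevalence=0.1)
print(bad.index.tolist())
print(toy[["ppv", "ppv_check", "ppv_lower", "ppv_upper"]])
```

If the recomputed `ppv_check` falls inside the bounds while `ppv` does not, the ppv column is the likelier culprit; if both fall outside, the bounds deserve a closer look.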
