Some severe issues with the MIMIC-IV preprocessing #7

@randolf-scholz

While reproducing the preprocessing, I noticed a few severe issues with the provided notebooks.

datamerging.ipynb - Prescriptions are accidentally dropped completely

presc_df = presc_df.drop((presc_df['valuenum']=='3-10').index)

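Afterwards, the table is empty: .index of the boolean mask is the full index of presc_df, so every row is dropped, not only the rows where the comparison is True.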
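A minimal sketch of the difference, using a toy stand-in for presc_df (the values are made up for illustration). Selecting with the negated mask is one way to drop only the matching rows; whether this is the filter the authors intended is an assumption.

```python
import pandas as pd

# Toy stand-in for presc_df; the real table is built in datamerging.ipynb.
presc_df = pd.DataFrame({"valuenum": ["5", "3-10", "2.5"]})

# Pattern from the notebook: the mask's index covers every row,
# so drop() removes the whole table.
dropped_all = presc_df.drop((presc_df["valuenum"] == "3-10").index)

# Drop only the rows where the mask is True.
mask = presc_df["valuenum"] == "3-10"
filtered = presc_df[~mask]

print(len(dropped_all), len(filtered))
```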

outputs.ipynb - Wrong labels!

outputs_label_list contains the entries "Chest Tube" and "Jackson Pratt", but these never appear as labels. The correct labels are "Chest Tube #1" and "Jackson Pratt #1".
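A mismatch like this can be caught early by checking the label list against the labels that actually occur in the data. The sketch below assumes the labels live in a `label` column of an item dictionary table; the `d_items` frame here is a hypothetical three-row excerpt, not the real table.

```python
import pandas as pd

# Hypothetical excerpt of the item dictionary; only the "label" column matters.
d_items = pd.DataFrame({"label": ["Chest Tube #1", "Jackson Pratt #1", "Foley"]})

# Shortened label list containing the two entries named above.
outputs_label_list = ["Chest Tube", "Jackson Pratt", "Foley"]

# Any label that never occurs in the data would silently select zero rows.
known = set(d_items["label"])
missing = [label for label in outputs_label_list if label not in known]
print(missing)
```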

prescriptions.ipynb - missing required filtering

  • rows with non-float dose_val_rx are not dropped
  • rows with NaT entries in starttime are not dropped
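One possible way to apply both filters, sketched on made-up rows: coerce the columns with `errors="coerce"` so that unparseable entries become NaN or NaT, then drop them together. Only the column names are taken from the notebook; the data is illustrative.

```python
import pandas as pd

# Made-up rows: one dose range ("3-10") and one missing start time.
presc_df = pd.DataFrame({
    "dose_val_rx": ["10", "3-10", "0.5", "7"],
    "starttime": ["2150-01-01 08:00", "2150-01-01 09:00", None, "2150-01-02 10:00"],
})

# Non-numeric doses become NaN, unparseable or missing times become NaT.
presc_df["dose_val_rx"] = pd.to_numeric(presc_df["dose_val_rx"], errors="coerce")
presc_df["starttime"] = pd.to_datetime(presc_df["starttime"], errors="coerce")

# Drop every row that failed either conversion.
presc_df = presc_df.dropna(subset=["dose_val_rx", "starttime"])
```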

inputevents.ipynb

In some cases, the code for adding repeats does not add enough repeats due to a rounding issue. This can be tested via

min_diff = (pd.to_datetime(df_new1["endtime"])-df_new1["charttime"]).groupby(level=0).min()
assert all(min_diff < pd.Timedelta("30min")), "Did not add enough steps!"
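For illustration, here is a sketch of a repeat count that satisfies the check above. It is not the notebook's code: the helper `expand_to_steps` is hypothetical, and it assumes each row has `starttime` and `endtime` columns and a unique index. Using floor(duration / 30min) + 1 steps guarantees that the last generated charttime lies less than 30 minutes before endtime, also when the duration is an exact multiple of the step.

```python
import pandas as pd

STEP = pd.Timedelta("30min")

def expand_to_steps(df: pd.DataFrame, step: pd.Timedelta = STEP) -> pd.DataFrame:
    """Repeat each row once per step between starttime and endtime (hypothetical helper)."""
    duration = df["endtime"] - df["starttime"]
    # floor + 1 so that the final step is always within one step of endtime.
    n_steps = (duration // step).astype(int) + 1
    out = df.loc[df.index.repeat(n_steps)].copy()
    # Position of each repeated row within its original row: 0, 1, 2, ...
    offset = out.groupby(level=0).cumcount()
    out["charttime"] = out["starttime"] + offset * step
    return out

# Made-up rows: a 60 minute and a 45 minute infusion.
df = pd.DataFrame({
    "starttime": pd.to_datetime(["2150-01-01 00:00", "2150-01-01 00:00"]),
    "endtime": pd.to_datetime(["2150-01-01 01:00", "2150-01-01 00:45"]),
})
df_new1 = expand_to_steps(df)

min_diff = (df_new1["endtime"] - df_new1["charttime"]).groupby(level=0).min()
assert (min_diff < STEP).all(), "Did not add enough steps!"
```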

labevents.ipynb

  • rows with NaN-valued valuenum are not dropped

admissions.ipynb

We filter for patients with a single admission. Later, however, the other dataframes are filtered by hadm_id instead of subject_id. The problem is that at least one table appears to contain corrupted data, which gives rise to hadm_id values with multiple subject_id values associated with them. This can be tested in datamerging via

assert all(merged_df.groupby("subject_id")["hadm_id"].nunique() == 1)
assert all(merged_df.groupby("hadm_id")["subject_id"].nunique() == 1)
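To see which rows are affected rather than only failing the assertion, the offending hadm_id values can be listed first. The sketch below uses a made-up merged_df; dropping the ambiguous admissions is just one option and an assumption here, filtering on subject_id throughout would be another.

```python
import pandas as pd

# Made-up merged table: hadm_id 20 is shared by subjects 2 and 3.
merged_df = pd.DataFrame({
    "subject_id": [1, 1, 2, 3],
    "hadm_id": [10, 10, 20, 20],
})

# Admissions that are associated with more than one subject.
subjects_per_hadm = merged_df.groupby("hadm_id")["subject_id"].nunique()
bad_hadm = subjects_per_hadm[subjects_per_hadm > 1].index

# One option: discard the ambiguous admissions entirely.
clean_df = merged_df[~merged_df["hadm_id"].isin(bad_hadm)]

assert all(clean_df.groupby("subject_id")["hadm_id"].nunique() == 1)
assert all(clean_df.groupby("hadm_id")["subject_id"].nunique() == 1)
```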

Further, the data is limited to patients with a hospital stay of 2-29 days. However, charttime does not agree with this. Sometimes charttime starts before admittime, and the longest charttime span is over 52 years.
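A check along these lines could look as follows. This is a sketch on made-up rows and assumes that merged_df carries admittime and dischtime next to charttime, which may not hold for the merged table as built in the notebook.

```python
import pandas as pd

# Made-up rows: the second measurement is charted before admission.
merged_df = pd.DataFrame({
    "hadm_id": [10, 10, 20],
    "admittime": pd.to_datetime(["2150-01-01", "2150-01-01", "2150-03-01"]),
    "dischtime": pd.to_datetime(["2150-01-05", "2150-01-05", "2150-03-10"]),
    "charttime": pd.to_datetime(["2150-01-02", "2149-12-30", "2150-03-05"]),
})

# Rows whose charttime lies outside the recorded stay.
in_stay = merged_df["charttime"].between(merged_df["admittime"], merged_df["dischtime"])
outside = merged_df[~in_stay]

# Span covered by charttime per admission, to compare against the 2-29 day limit.
grouped = merged_df.groupby("hadm_id")["charttime"]
span = grouped.max() - grouped.min()
print(outside, span, sep="\n")
```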
