How to interpret the samples? #3

@ghost

Hi Ed,

thank you very much for adding the process_mimic.py script :)

It all worked fairly painlessly, following your clear instructions (I used "counts"), and now I'm the very proud owner of 10000 synthetic EHRs - woohoo !!!

So I loaded samples, but I'm not sure how to interpret them?

>>> import numpy as np
>>> X = np.load('/home/ajay/PythonProjects/medgan-master/samples/samples.npy')
>>> X
array([[ 0.42479137,  0.38992843,  0.3843686 , ...,  0.48570082,
         0.44278869,  0.4656629 ],
       [ 0.28643027,  0.45749718,  0.23394403, ...,  0.47090551,
         0.41072363,  0.43643555],
       [ 0.29359645,  0.46955556,  0.22549649, ...,  0.48150307,
         0.41780272,  0.45492986],
       ..., 
       [ 0.56480783,  0.66771448,  0.54325938, ...,  0.47483209,
         0.43128845,  0.45304856],
       [ 0.68514657,  0.79574692,  0.73424697, ...,  0.47857872,
         0.43853614,  0.44970644],
       [ 0.17376943,  0.19806506,  0.27509841, ...,  0.47925362,
         0.44123808,  0.46058744]], dtype=float32)
>>> X.shape
(10000, 1071)
>>> synthetic_ehr = X[0,:]
>>> synthetic_ehr
array([ 0.42479137,  0.38992843,  0.3843686 , ...,  0.48570082,
        0.44278869,  0.4656629 ], dtype=float32)

I just realized I'm not sure what synthetic_ehr is? Does it look right to you?

I thought it would be like a row of a table where the columns are the 1071 ICD-9 codes, and each value is the number of times that code appears in the patient's EHR? So the counts should be whole numbers, and would give some idea of co-morbidities? For example, cardiovascular and metabolic disorders would frequently co-occur?
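To make the "whole numbers" point concrete, here is a rough check I could run on the samples. It is only a sketch: `summarize` is a helper name I made up, and the row below just reuses the six values visible in the printout above, not the full 1071 columns.

```python
import numpy as np

def summarize(X):
    """Report the value range and whether every entry is a whole number.

    Hypothetical helper for illustration; not part of medgan.
    """
    X = np.asarray(X)
    return {
        "min": float(X.min()),
        "max": float(X.max()),
        "all_whole": bool(np.all(X == np.round(X))),
    }

# Visible values from the first sample printed above (the rest is elided).
row = np.array(
    [0.42479137, 0.38992843, 0.3843686, 0.48570082, 0.44278869, 0.4656629],
    dtype=np.float32,
)

print(summarize(row))
```

For these values everything sits strictly between 0 and 1 and nothing is a whole number, which is why the output does not look like counts to me.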

So would one way of analysis be a correlation matrix?
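Something like the following is what I have in mind. It is a minimal sketch on a tiny made-up count matrix (6 patients by 4 codes), not on the real samples, and the column meanings are invented purely for illustration.

```python
import numpy as np

# Toy stand-in for the samples: rows are patients, columns are codes.
# These counts are made up; they are not medGAN output.
counts = np.array(
    [
        [2, 1, 0, 0],
        [3, 2, 0, 1],
        [0, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 2, 1],
        [4, 3, 0, 0],
    ],
    dtype=np.float32,
)

# rowvar=False tells NumPy that each column (code) is a variable
# and each row (patient) is an observation.
corr = np.corrcoef(counts, rowvar=False)

print(corr.shape)        # one row and one column per code
print(np.round(corr, 2))
```

In this toy data the first two codes rise and fall together, so their entry in `corr` is strongly positive, which is the kind of co-occurrence signal I would hope to see for related disorders. On the real samples, any column that is constant across all patients would produce NaN entries, so those columns would need to be dropped first.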

Thanks very much 👍
