Start with DV0, the unaggregated data.
Regenerate the aggregated data using Jiayu’s method and save that version (DV1, the aggregated data).
Link to running Notebook: Running of step1_engineering_features
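This is not Jiayu's actual method (that lives in the step1 notebook above); the snippet below is only a generic placeholder showing the usual shape of this step. The column names `pid` and `charttime`, the file paths, and the hourly frequency are all assumptions.

```python
import pandas as pd

# Placeholder for Jiayu's aggregation -- see the step1 notebook for the
# real method. Assumed schema: one row per raw measurement, with a
# patient-id column `pid` and a timestamp column `charttime`.
dv0 = pd.read_pickle("dv0_unaggregated.pickle")   # hypothetical DV0 path

dv1 = (
    dv0.set_index("charttime")
       .groupby("pid")
       .resample("1h")                            # hourly bins (assumed)
       .mean(numeric_only=True)                   # one aggregate per bin
       .reset_index()
)
dv1.to_pickle("dv1_aggregated.pickle")            # hypothetical DV1 path
```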
Regenerate engineered features and save that version (DV2, engineered data).
Impute the data again and save that version (DV3, imputed data).
Remarks: We don't have ppo2 or o2.
Output: imputed_data_engineered_hypotensive_reordered_action_for_model.pickle, a file containing the engineered features of interest, forward-fill imputed, with four action labels.
Link to running Notebook: Running of step2+Step 3_engineering_features.ipynb
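The output description above says the data is forward-fill imputed. A minimal sketch of that step, assuming DV2 is a DataFrame with a `pid` patient-id column (the column name and the input path are assumptions):

```python
import pandas as pd

dv2 = pd.read_pickle("dv2_engineered.pickle")     # hypothetical DV2 path

# Forward-fill within each patient so values never leak across patients.
feature_cols = dv2.columns.difference(["pid"])
dv3 = dv2.copy()
dv3[feature_cols] = dv3.groupby("pid")[feature_cols].ffill()

dv3.to_pickle(
    "imputed_data_engineered_hypotensive_reordered_action_for_model.pickle"
)
```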
Train the models and save them.
Running of step4:
Link to running Notebook: Testing of step4_just_kernel.ipynb
Link to running Notebook: Testing of step4_rnn_kernel.ipynb
Evaluate the models.
Link to running Notebook: Testing of step4.5_model_eval.ipynb
Compute the decision points and save them with Uncertainty Labels (UL) and Decision Point (DP) status (DV4, imputed data + DP + UL).
- Prerequisite: a kernel trained on the non-time-series data, an RNN trained on the time-series data, and a kernel trained on the RNN embedding.
For every patient:
- States 0 through the second-to-last state form S1 (“all_states”).
- States 1 through the last state form S2 (“cand_states”).
- Run the decision point logic using just the kernel, giving you P, a matrix with one row per candidate state. If a row of P sums to more than one, that state is a decision point (see the sketch after this step's output files).
- Pass decision points through the uncertainty label mapping.
- Match DP and UL with the original time series for each patient and save.
Link to running Notebook: Running of step5_kernel_dp.ipynb
Renamed final output csv file (just decision points): kernel_computed_decision_points_justdp_uncertainty_label_withPid.csv
Renamed final output csv file (all points, including non-decision points): kernel_computed_decision_points_all_withPid.csv
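The real logic lives in step5_kernel_dp.ipynb; the sketch below only restates the rule from the bullets above. The `kernel` and `uncertainty_label` callables are stand-ins for the trained kernel and the UL mapping (assumed interfaces):

```python
import numpy as np

def kernel_decision_points(kernel, states, uncertainty_label, threshold=1.0):
    """Per-patient decision-point rule as described above (sketch).

    `states`: (T, d) trajectory of one patient.
    `kernel(S1, S2)`: assumed to return P with one row per candidate state.
    A candidate is a decision point when its row of P sums to > `threshold`.
    """
    S1 = states[:-1]   # states 0 .. second-to-last ("all_states")
    S2 = states[1:]    # states 1 .. last ("cand_states")

    P = np.asarray(kernel(S1, S2))        # assumed kernel interface
    is_dp = P.sum(axis=1) > threshold     # row-sum rule from the text

    # Uncertainty labels only where a decision point was flagged.
    uls = [uncertainty_label(row) if dp else None
           for row, dp in zip(P, is_dp)]
    return is_dp, uls
```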
For every patient:
- Get S1, S2 the same way.
- “Window” S1 and S2 separately.
- Pass windowed S1 and S2 through the RNN embedder.
- Run the decision point logic using the RNN kernel, giving you P.
- Pass the decision points through the uncertainty label mapping.
- Match DP and UL with the original time series for each patient and save.
Note: One weakness of this process is that we can't calculate decision points for the first 7 hourly time stamps of a trajectory. We window (chunk the dataset with a sliding window) with size 8, so the first valid data point in each trajectory is the 8th, because it is the first one with a full window before it.
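A minimal sketch of the size-8 windowing the note describes (the window size comes from the text; the (T, d) array layout is an assumption):

```python
import numpy as np

WINDOW = 8  # sliding-window size from the note above

def window_trajectory(states, window=WINDOW):
    """Chunk a (T, d) trajectory into overlapping windows of length `window`.

    Returns shape (T - window + 1, window, d): the first valid anchor is
    the `window`-th time stamp, which is why the first 7 hourly stamps of
    each trajectory get no decision point.
    """
    T = states.shape[0]
    if T < window:
        return np.empty((0, window) + states.shape[1:])
    return np.stack([states[t - window + 1 : t + 1]
                     for t in range(window - 1, T)])
```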
Link to testing Notebook: Testing of step4.5_model_eval.ipynb
Renamed final output csv file (just decision points): rnn_computed_decision_points_justdp_uncertainty_label_withPid.csv
Renamed final output csv file (all points, including non-decision points): rnn_computed_decision_points_all_withPid.csv
Renamed final output csv file (just decision points): binary_kernel_computed_decision_points_justdp_uncertainty_label_withPid.csv
Renamed final output csv file (all points, including non-decision points): binary_kernel_computed_decision_points_all_withPid.csv
Link to Notebook: Binary kernel tsne/umap source plots cluster.ipynb
Renamed final output csv file (just decision points): binary_rnn_computed_decision_points_justdp_uncertainty_label_withPid.csv
Renamed final output csv file (all points, including non-decision points): binary_rnn_computed_decision_points_all_withPid.csv
Link to Notebook: Binary RNN tsne/umap source plots cluster.ipynb
Run UMAP and t-SNE and save those 2D representations (DV5a/b, reduced data); a sketch follows the source-plots link below.
- Plot connected components in DV4 and DV5a/b.
- Plot the normalized average of features (DV5) in each manually selected cluster (see step 7). This is what the doctors (Leo) want to see.
- Plot the average percentage of the top 20 nearest neighbors (DV5a/b).
Link to Source Plots
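A sketch of the DV5a/b reduction and the two plot computations listed above, assuming umap-learn and scikit-learn, a DV4 feature matrix `X`, and a NumPy array of per-point cluster labels from the manual selection in step 7. The min-max normalization and the nearest-neighbor reading are assumptions, not the notebook's confirmed choices:

```python
import numpy as np
import umap                                  # umap-learn package
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def reduce_2d(X, seed=0):
    """DV5a (UMAP) and DV5b (t-SNE) 2D representations of the DV4 features."""
    dv5a = umap.UMAP(n_components=2, random_state=seed).fit_transform(X)
    dv5b = TSNE(n_components=2, random_state=seed).fit_transform(X)
    return dv5a, dv5b

def cluster_feature_means(X, labels):
    """Normalized average of features within each manually selected cluster.

    Min-max normalization over all rows is an assumption; the notebook may
    normalize differently.
    """
    rng = X.max(axis=0) - X.min(axis=0)
    Xn = (X - X.min(axis=0)) / (rng + 1e-9)
    return {c: Xn[labels == c].mean(axis=0) for c in np.unique(labels)}

def neighbor_label_percentage(X2d, labels, k=20):
    """One reading of the 'top 20 nearest neighbors' plot: for each point,
    the percentage of its k nearest neighbors (in the 2D space) that share
    its label, averaged per cluster. This interpretation is an assumption.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X2d)
    _, idx = nn.kneighbors(X2d)              # column 0 is the point itself
    same = (labels[idx[:, 1:]] == labels[:, None]).mean(axis=1) * 100
    return {c: same[labels == c].mean() for c in np.unique(labels)}
```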