Compute persistence images for the physicians' network data, then use K-fold cross-validation (CV) to select the best algorithm and its parameters, along with the parameters of the persistence image weight function.
- Generate persistence diagrams (PDs) from the graph data (not included here), using a file of HSA IDs.
- Generate Test and Fold indices for the PDs.
- Run cross validation.
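
As a rough illustration of the Test and Fold index step, the sketch below holds out a test set and deals the remaining PD indices into k disjoint folds. It is not the project's code: the function name `make_test_and_folds` and its parameters (`test_frac`, `k`, `seed`) are hypothetical, and it uses only the Python standard library.

```python
import random


def make_test_and_folds(n_items, test_frac=0.2, k=5, seed=0):
    """Return (test_idx, folds): a held-out test set and k disjoint CV folds.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    idx = list(range(n_items))
    rng.shuffle(idx)
    n_test = round(n_items * test_frac)
    test_idx = sorted(idx[:n_test])
    rest = idx[n_test:]
    # Deal the remaining indices round-robin so fold sizes differ by at most one.
    folds = [sorted(rest[i::k]) for i in range(k)]
    return test_idx, folds


test_idx, folds = make_test_and_folds(n_items=100, test_frac=0.2, k=5, seed=42)
for i, val_idx in enumerate(folds):
    # Train on every fold except the one currently used for validation.
    train_idx = [j for f in folds if f is not val_idx for j in f]
    print(f"fold {i}: {len(train_idx)} train, {len(val_idx)} validation")
```

Saving the indices once and reusing them for every algorithm keeps the CV scores comparable across runs.
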
- Module: paths, parameter values, functions, and other useful information
- Code/modules/cv_prep_vars.py
- Most of the editing and generic (algorithm agnostic) information is here
- Command line script: Runs CV for a given year, outcome, pixel resolution, H dimension, scoring metric, and k/test percent information
- Run this script through a job scheduler, because some algorithms can take days. See the "submit_scritps" directory, jobs_submit.sh (to easily submit and name jobs), and the eg_submit_cmd example file.
- Module: paths, parameter values, functions, and other useful information
- Edit graph script:
- Option to run before generating PDs
- Generate PDs:
- Fixes to this script, including making it a command line script
- Generate Test and Fold indices
- Generate CV Data:
- Make more reproducible and faster
- Finish testing modifications and replace original version
- Add option to select an arbitrary pixel size
- Add option to output data in a different directory
- Select Best Parameters/Model(s)
- Determine how to select the best models
- Fit and Explore Best Model(s)
- Fit best models and explore the results
- Sensitivity Tests
- Re-do steps 4-6 with different outcome definitions and different pixel resolutions
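
The pixel resolution varied in the sensitivity tests, and the weight function tuned during CV, both enter the persistence image computation. The sketch below shows one simple way they can fit together; it is an assumption-laden illustration in pure Python, not the implementation used in this repository. The function `persistence_image`, its parameters, and the default linear weight are all hypothetical, and each pixel is approximated by the Gaussian value at its center times the pixel area.

```python
import math


def persistence_image(diagram, resolution=10, sigma=0.1,
                      weight=lambda birth, pers: pers, bounds=(0.0, 1.0)):
    """Rasterize (birth, death) pairs into a resolution x resolution grid.

    Hypothetical sketch: points are mapped to (birth, persistence) coordinates,
    and each contributes a weighted 2-D Gaussian sampled at pixel centers.
    """
    lo, hi = bounds
    step = (hi - lo) / resolution
    centers = [lo + (i + 0.5) * step for i in range(resolution)]
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        pers = death - birth
        w = weight(birth, pers)
        for r, y in enumerate(centers):      # persistence axis
            for c, x in enumerate(centers):  # birth axis
                d2 = (x - birth) ** 2 + (y - pers) ** 2
                density = norm * math.exp(-d2 / (2.0 * sigma ** 2))
                image[r][c] += w * density * step * step
    return image


# Same diagram at two pixel resolutions, and with an alternative weight.
diagram = [(0.5, 1.0), (0.3, 0.3)]
coarse = persistence_image(diagram, resolution=5)
fine = persistence_image(diagram, resolution=20)
squared = persistence_image(diagram, resolution=20,
                            weight=lambda birth, pers: pers ** 2)
```

With the linear weight, points on the diagonal (zero persistence) contribute nothing, so changing the weight function changes how strongly short-lived features influence the resulting feature vector.
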