Predictive modeling in Python.
The code in this repository attempts to build a predictive model on a subset of the data collected by Steinmetz et al. for their 2019 publication. This is the final project for STA 141 at UC Davis.
The contents of this repository are described below.
scripts/ contains the files used to load the .rds session files into Python and store them.
scripts/open_rds.py contains a script that opens the .rds files and stores them as ListVector objects. See the module docstring for details.
scripts/mouse.py defines a class that holds the information for a given session. The data are transformed to have uniform data types across sessions (mainly numpy.ndarray and pandas.DataFrame objects). The class also transforms some of the raw data into features used by the predictive models. These features are not present in the original .rds files and are used extensively in the report.
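As an illustration of the kind of conversion scripts/mouse.py performs, here is a minimal sketch. It is not the repository's actual class: the Session name, the raw field names (contrast_left, contrast_right, feedback_type, mouse_name, brain_area, spks) and the two derived features are assumptions chosen for the example, and the input is taken to be a plain Python dict rather than an rpy2 ListVector. Each raw field is coerced to a fixed NumPy/pandas type, and per-trial features are then computed from the spike matrices.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class Session:
    """Hypothetical container for one session; field names are assumptions."""
    mouse_name: str
    trials: pd.DataFrame          # one row per trial
    spikes: list[np.ndarray]      # one (n_neurons, n_bins) array per trial
    brain_area: np.ndarray        # brain-area label for each neuron

    @classmethod
    def from_raw(cls, raw: dict) -> "Session":
        # Coerce every raw field to a uniform type, whatever form it arrived in.
        trials = pd.DataFrame({
            "contrast_left": np.asarray(raw["contrast_left"], dtype=float).ravel(),
            "contrast_right": np.asarray(raw["contrast_right"], dtype=float).ravel(),
            "feedback_type": np.asarray(raw["feedback_type"], dtype=int).ravel(),
        })
        spikes = [np.asarray(s, dtype=float) for s in raw["spks"]]
        # R character scalars usually arrive as length-1 vectors; take the first element.
        name = str(np.asarray(raw["mouse_name"]).ravel()[0])
        return cls(
            mouse_name=name,
            trials=trials,
            spikes=spikes,
            brain_area=np.asarray(raw["brain_area"], dtype=str),
        )

    def features(self) -> pd.DataFrame:
        """Derive hypothetical per-trial features that are not stored in the raw data."""
        out = self.trials.copy()
        out["contrast_diff"] = out["contrast_left"] - out["contrast_right"]
        # Mean spike count per neuron per time bin within each trial.
        out["mean_spike_rate"] = [s.mean() for s in self.spikes]
        return out
```

Keeping the trial-level data in a single DataFrame means each derived feature is just another column, which can be passed directly to a scikit-learn estimator.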
data/ contains the original .rds files.
test_data/ contains the test .rds files used to evaluate the model.
figures/ contains all of the plots used in the report. The report does not pull from this folder; the plots are created dynamically. This folder serves as a separate repository of images.
data_structure.ipynb contains scripts that test the pipeline between R and Python. It serves as a validation effort to ensure that open_rds.py works correctly and that the data transformations applied by mouse.py give accurate results when compared with the raw data.
exploratory_data_analysis.ipynb contains the basic exploratory analysis of the data conducted for this report.
model_training.ipynb conducts the actual training of the models and the selection of a "best" model.
model_test.ipynb contains the scripts that test the full model on the test sets in test_data/, along with an analysis of that model's effectiveness.
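The selection step in model_training.ipynb can be pictured as comparing candidate classifiers on the training data, keeping the one with the highest cross-validated score, and then checking it once on held-out data, as model_test.ipynb does with test_data/. The sketch below is an illustration under stated assumptions, not the notebook's code: the candidate list, 5-fold cross-validation, accuracy as the metric, the helper name select_best_model, and the synthetic make_classification data standing in for the real session features are all choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_best_model(X, y, candidates, cv=5):
    """Hypothetical helper: return (name, fitted model, CV scores) for the best candidate."""
    # cross_val_score clones each estimator, so the candidates stay unfitted here.
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    # Refit the winning candidate on all of the training data.
    best_model = candidates[best_name].fit(X, y)
    return best_name, best_model, scores


# Synthetic stand-in for the per-trial feature table (assumption, not the real data).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
best_name, best_model, cv_scores = select_best_model(X_train, y_train, candidates)

# The held-out split plays the role of test_data/: it is scored only once, at the end.
test_accuracy = best_model.score(X_test, y_test)
```

Scoring the test split only after selection is finished keeps the reported test accuracy from being biased by the choice of model.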
random_forest_model.pkl is the final model selected in model_training.ipynb. Use joblib.load('random_forest_model.pkl') to load the model.
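A minimal save-and-load round trip is sketched below. Only the joblib.load call comes from this README. The stand-in random forest, the synthetic data, and the assumption that the file was written with joblib.dump and holds a scikit-learn estimator exposing predict are illustrative.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model and data; the real file holds the model chosen in model_training.ipynb.
X, y = make_classification(n_samples=200, n_features=6, random_state=1)
model = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "random_forest_model.pkl"
    joblib.dump(model, path)      # assumed way the .pkl file was produced
    loaded = joblib.load(path)    # the loading call documented above

# The restored model expects features in the same column order used during training.
predictions = loaded.predict(X)
```

A pickled scikit-learn model is generally only reliable when loaded with the same scikit-learn version that saved it, and pickle files should only be loaded from trusted sources.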
report.ipynb contains the written report.
report.html contains the compiled HTML report, generated via jupyter nbconvert report.ipynb --to html --no-input to omit the code cells. This is the file submitted for the assignment.