You are already familiar with the complete ML pipeline from last week's supervised and unsupervised learning labs. However, every dataset has its own characteristics, so it is important to practice the workflow on different types of datasets and learn which solutions fit which contexts. In this lab, we present a supervised learning problem for which linear regression analysis is not appropriate. We will show you why, and how to analyze this kind of dataset with other ML algorithms such as Random Forest.
Open the main.ipynb file in the your-code directory. Follow the instructions and add your code and explanations as necessary. In addition to completing the cells, save your RF model as a pickle file at the end.
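As a rough illustration of the final step, here is a minimal sketch of fitting a Random Forest on categorical features and saving it with pickle. The toy rows, the column meanings, and the hyperparameters are assumptions made up for this example, not the lab's data or expected solution. The sketch writes to a temporary directory; in the lab you would write mushroom.sav next to your notebook instead.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows (odor, cap color); NOT the real mushroom dataset.
X_raw = [
    ["foul", "brown"],
    ["none", "white"],
    ["foul", "white"],
    ["none", "brown"],
    ["almond", "yellow"],
    ["foul", "yellow"],
]
# Hypothetical labels: "p" = poisonous, "e" = edible.
y = ["p", "e", "p", "e", "e", "p"]

# Categorical strings must be encoded as numbers before fitting the forest.
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(X_raw)

# Assumed hyperparameters; tune them in the notebook as the lab requires.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Save the fitted model. A temp directory is used here only to keep the
# sketch side-effect free; the lab deliverable is named mushroom.sav.
model_path = os.path.join(tempfile.gettempdir(), "mushroom.sav")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload the file and check that it predicts the same as the original model.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same_predictions = list(restored.predict(X)) == list(model.predict(X))
```

Note that the encoder is a separate object from the model. If you one-hot encode inside the notebook, whoever loads mushroom.sav needs the same encoding step to prepare new rows.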
- main.ipynb with your responses.
- mushroom.sav file of your RF model.
Upon completion, add your deliverables to git. Then commit and push your branch to the remote.
Mushroom Classification @Kaggle
Consequences of multicollinearity
Chi-Square Test of Independence
sklearn.model_selection.train_test_split
sklearn.ensemble.RandomForestClassifier
sklearn.metrics.confusion_matrix
sklearn.ensemble.GradientBoostingClassifier
pickle - Python object serialization
