
Ironhack logo

Lab | Machine Learning Pipelines

Introduction

You are already familiar with complete ML pipelines from the supervised and unsupervised learning labs of the past week. However, every dataset has its own characteristics, so it is important to practice the workflow on different types of datasets in order to learn which solutions fit which contexts. In this lab, we present a supervised learning problem for which linear regression analysis is not appropriate. We will show you why, and how to analyze this kind of dataset with other ML algorithms such as Random Forest.
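A minimal sketch of the classification workflow that the tools listed under Resources point to (pandas.get_dummies, train_test_split, RandomForestClassifier, confusion_matrix). It runs on a small hypothetical DataFrame whose column names (odor, cap-color, and a class column with e/p labels) only imitate the mushroom data; in main.ipynb you would load the real dataset instead. Because the target is a label rather than a continuous quantity, the sketch fits a classifier and evaluates it with a confusion matrix rather than a regression metric.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy data with mushroom-like categorical columns (NOT the UCI dataset).
df = pd.DataFrame({
    "odor":      ["n", "f", "n", "a", "f", "n", "l", "f"] * 10,
    "cap-color": ["w", "g", "n", "y", "g", "w", "y", "n"] * 10,
    "class":     ["e", "p", "e", "e", "p", "e", "e", "p"] * 10,
})

# One-hot encode the categorical features; the target stays a class label.
X = pd.get_dummies(df.drop(columns="class"))
y = df["class"]

# Hold out 25% of the rows, keeping the e/p proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Illustrative hyperparameters, not values prescribed by the lab.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Rows are true classes and columns are predicted classes, in the order given by `labels`.
cm = confusion_matrix(y_test, rf.predict(X_test), labels=["e", "p"])
print(cm)
```

The off-diagonal cells of the matrix count misclassified mushrooms, which matters here because labelling a poisonous mushroom as edible is the costlier mistake.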

Getting Started

Open the main.ipynb file in the your-code directory. Follow the instructions and add your code and explanations as necessary. In addition to completing the cells, save your Random Forest (RF) model as a pickle file at the end.
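A minimal sketch of saving a fitted model with Python's pickle module under the deliverable name mushroom.sav, then reloading it to confirm the round trip. The training data comes from scikit-learn's make_classification purely as a stand-in; in the notebook you would pickle the RF model fitted on the mushroom features.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in training data; replace with the mushroom features and labels in the lab.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Serialize the fitted model to the deliverable file (binary mode is required).
with open("mushroom.sav", "wb") as f:
    pickle.dump(rf, f)

# Load it back and check that it predicts exactly like the original object.
with open("mushroom.sav", "rb") as f:
    loaded = pickle.load(f)

print((loaded.predict(X) == rf.predict(X)).all())
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. A pickled model is also tied to the scikit-learn version that created it, so reload it with the same version where possible.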

Deliverables

  • main.ipynb with your responses.
  • mushroom.sav, the pickle file of your RF model.

Submission

Upon completion, add your deliverables to git, commit your changes, and push your branch to the remote.

Resources

Mushroom Data Set @UCI MLR

Mushroom Classification @Kaggle

Consequences of multicollinearity

Chi-Square Test of Independence

pandas.crosstab

scipy.stats.chi2_contingency

pandas.get_dummies

sklearn.model_selection.train_test_split

Random Forest

Bagging and Random Forests

Support Vector Machine

sklearn.ensemble.RandomForestClassifier

Confusion Matrix

sklearn.metrics.confusion_matrix

Gradient Boosting

sklearn.ensemble.GradientBoostingClassifier

pickle - Python object serialization

Analysis and Classification of Mushrooms

The Search for Categorical Correlation
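The chi-square test of independence listed in the Resources can be sketched with pandas.crosstab and scipy.stats.chi2_contingency. The two columns below are hypothetical toy data, not taken from the UCI mushroom file; in the lab you would pass pairs of real categorical columns.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy columns (NOT taken from the UCI mushroom data).
df = pd.DataFrame({
    "odor":  ["f", "f", "n", "n", "f", "n", "a", "n", "f", "a"] * 5,
    "class": ["p", "p", "e", "e", "p", "e", "e", "e", "p", "e"] * 5,
})

# Contingency table: how often each (odor, class) combination occurs.
table = pd.crosstab(df["odor"], df["class"])

# Returns the test statistic, the p-value, the degrees of freedom,
# and the expected counts under the hypothesis of independence.
chi2, p, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.2f}, p={p:.3g}, dof={dof}")
```

A small p-value is evidence that the two categorical columns are not independent. The test assumes reasonably large expected counts in every cell, so rare category combinations can make the result unreliable.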