
IronKaggle

One-day competition in Ironhack's Data Analytics bootcamp. The goal was to build a predictive model for sales that would later be verified. Cleaned a raw dataset of sales from different stores, used feature engineering for feature selection, then trained two different models (XGBoost and a Random Forest Regressor) and compared their scores. Weighed the bias/variance trade-off to decide between them and chose the Random Forest Regressor.

The model was later verified by the teacher on a new dataset and ended up being the winner.
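A minimal sketch of the kind of bias/variance comparison described above, not the original competition code. It assumes the data is already cleaned into a numeric feature matrix; synthetic data from `make_regression` stands in for the store sales dataset, and scikit-learn's `GradientBoostingRegressor` is swapped in as a stand-in for XGBoost so the example needs only scikit-learn. Hyperparameters are illustrative. A large gap between training and held-out R² suggests high variance; low scores on both suggest high bias.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned sales data (assumption: not the real dataset)
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# GradientBoostingRegressor is a hypothetical substitute for xgboost here
models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}


def compare(models, X_train, X_test, y_train, y_test):
    """Fit each model and return its train R², held-out R² and the gap between them."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train_r2 = r2_score(y_train, model.predict(X_train))
        test_r2 = r2_score(y_test, model.predict(X_test))
        # Large train-test gap -> variance (overfitting);
        # low scores on both -> bias (underfitting).
        results[name] = {
            "train_r2": train_r2,
            "test_r2": test_r2,
            "gap": train_r2 - test_r2,
        }
    return results


results = compare(models, X_train, X_test, y_train, y_test)
for name, scores in results.items():
    print(
        f"{name}: train R2={scores['train_r2']:.3f}  "
        f"test R2={scores['test_r2']:.3f}  gap={scores['gap']:.3f}"
    )
```

A single train/test split is noisy; `cross_val_score` from `sklearn.model_selection` gives a more stable estimate of held-out performance for the same comparison.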


Technical Requirements

  • Data Cleaning and Manipulation: checking and dropping null values / rows / columns, dealing with duplicates, formatting and filtering data;
  • Combining and Structuring Data;
  • Data Aggregation and Filtering;
  • Libraries imported:
    • Pandas: importing and exporting shark_attack.csv (the baseline dataset for the project) and manipulating data;
    • matplotlib: plotting histograms to verify hypotheses;
    • NumPy;
    • Seaborn;
    • sklearn: metrics, ensemble and model_selection.
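A short pandas sketch of the cleaning, combining and aggregation steps listed above. The tiny sales table, the store lookup table and the column names (`store_id`, `date`, `sales`, `open`, `region`) are hypothetical, chosen only to illustrate the operations; they are not the competition's actual schema.

```python
import pandas as pd

# Hypothetical raw sales records (assumed columns, not the real dataset)
raw = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3, 3],
    "date": ["2023-01-02", "2023-01-02", "2023-01-02",
             "2023-01-03", "2023-01-02", "2023-01-03"],
    "sales": [100.0, 100.0, 250.0, None, 80.0, 0.0],
    "open": [1, 1, 1, 1, 1, 0],
})
# Hypothetical store metadata to combine with the sales records
stores = pd.DataFrame({"store_id": [1, 2, 3], "region": ["north", "south", "north"]})

# Cleaning: drop exact duplicate rows, then rows with a missing target
clean = raw.drop_duplicates().dropna(subset=["sales"])
# Formatting: parse the date strings into a datetime column
clean = clean.assign(date=pd.to_datetime(clean["date"]))
# Filtering: keep only days on which the store was open
clean = clean[clean["open"] == 1]

# Combining: attach store attributes with a left join on store_id
merged = clean.merge(stores, on="store_id", how="left")

# Aggregation: total and mean sales per region
summary = merged.groupby("region")["sales"].agg(["sum", "mean"])
print(summary)
```

Dropping duplicates before missing values, and filtering before the join, keeps the merge working on the smallest table possible.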

Resources