cariscorentin/Hackathon-Eleven-Strategy

Hackathon-Eleven-Strategy

Introduction

This README explains the code for the hackathon project. The project predicts waiting times at a venue from historical waiting time data, weather data, and derived temporal features. The prediction model is an XGBoost regressor.

Project Structure

The project comprises several components:

  1. Python Code: The main code file contains the Python code for data preprocessing, feature engineering, model training, and prediction generation.
  2. Data Files: The project uses several CSV files:
    • waiting_times_train.csv: Contains historical waiting time data for training.
    • waiting_times_X_test_final.csv: Contains waiting time data for generating predictions.
    • weather_data.csv: Contains historical weather data.
  3. Output Files: After running the code, the following output files are generated:
    • predictions_final.csv: Contains the predictions generated by the model.
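As a minimal sketch of how the input files listed above might be read, assuming each file has a DATETIME column that should be parsed as a timestamp (the merge step described later relies on that column). The helper name load_datasets and its data_dir parameter are hypothetical and not taken from the project code.

```python
from pathlib import Path

import pandas as pd


def load_datasets(data_dir="."):
    """Read the three input CSV files, parsing DATETIME as a timestamp.

    Hypothetical helper: the project may load its files differently.
    """
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "waiting_times_train.csv", parse_dates=["DATETIME"])
    test = pd.read_csv(data_dir / "waiting_times_X_test_final.csv", parse_dates=["DATETIME"])
    weather = pd.read_csv(data_dir / "weather_data.csv", parse_dates=["DATETIME"])
    return train, test, weather
```

Parsing DATETIME at load time keeps the later feature engineering and merge steps free of repeated string-to-timestamp conversions.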

Code Explanation

Data Preprocessing and Feature Engineering

  1. Loading Data: The code loads the training waiting time data (waiting_times_train.csv), testing waiting time data (waiting_times_X_test_final.csv), and weather data (weather_data.csv) using Pandas.
  2. Feature Engineering: The add_time_features function adds temporal features to the datasets, such as day of week, month, hour, and season. This function is applied to all three datasets.
  3. Merging Data: The code merges the waiting time data with the weather data based on the DATETIME column.
  4. Column Cleanup: After merging, columns with the suffix _drop are removed from the datasets.
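The feature engineering, merge, and cleanup steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: the feature names (day_of_week, month, hour, season), the season encoding, the left join, and the helper merge_with_weather are all hypothetical. One plausible reason for the _drop cleanup is shown here: because the time features are added to both the waiting time data and the weather data, the merge produces duplicate columns, which are given the _drop suffix and then discarded.

```python
import pandas as pd


def add_time_features(df, datetime_col="DATETIME"):
    """Return a copy of df with calendar features derived from datetime_col."""
    out = df.copy()
    ts = pd.to_datetime(out[datetime_col])
    out[datetime_col] = ts
    out["day_of_week"] = ts.dt.dayofweek  # Monday = 0, Sunday = 6
    out["month"] = ts.dt.month
    out["hour"] = ts.dt.hour
    # Assumed encoding: 0 = winter (Dec-Feb), 1 = spring, 2 = summer, 3 = autumn
    out["season"] = (ts.dt.month % 12) // 3
    return out


def merge_with_weather(waiting, weather, on="DATETIME"):
    """Left-join weather onto waiting times, then drop duplicated columns."""
    # Columns present in both frames keep their name on the left side
    # and receive the "_drop" suffix on the right side.
    merged = waiting.merge(weather, on=on, how="left", suffixes=("", "_drop"))
    drop_cols = [c for c in merged.columns if c.endswith("_drop")]
    return merged.drop(columns=drop_cols)
```

A left join keeps every waiting time row even when no weather record exists for that timestamp; the resulting gaps are then left for the missing-value handling in the modeling stage.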

Model Training and Prediction

  1. Data Preparation: The code prepares the training and testing datasets by selecting relevant features and handling missing values.
  2. Model Definition: An XGBoost regressor is defined with hyperparameters tuned for this task.
  3. Pipeline Creation: A pipeline is created to streamline the data preprocessing and modeling steps.
  4. Model Training: The pipeline is fitted to the training data.
  5. Prediction Generation: The model generates predictions for the testing data.
  6. Output Generation: Predictions are saved to a CSV file (predictions_final.csv) along with relevant information such as DATETIME and ENTITY_DESCRIPTION_SHORT.

Dependencies

The project requires the following Python libraries:

  • Pandas
  • NumPy
  • XGBoost
  • Scikit-learn

Instructions for Running the Code

  1. Ensure all required data files (waiting_times_train.csv, waiting_times_X_test_final.csv, weather_data.csv) are placed in the same directory as the code file.
  2. Install the required Python dependencies (Pandas, NumPy, XGBoost, Scikit-learn).
  3. Run the script.
  4. After execution, check the predictions_final.csv file for the generated predictions.

Notes

  • The code is optimized for prediction accuracy and uses an XGBoost regressor with tuned hyperparameters.
  • Additional optimization or modification may be required based on specific requirements or new data.
  • For any inquiries or issues, please contact the project contributors.
