Course project for "INSE 6220 - Advanced Statistical Approaches to Quality", taken at Concordia University during the Winter 2024 term.
This project implements and compares simple machine learning algorithms and binary classification models on the occupancy estimation task for smart buildings. To this end, we use the dataset from here, which consists of records from various sensors in an office space at the Grenoble Institute of Technology. This dataset is available upon request.
This repository is organized as follows:
- `dataset/sql2csv.ipynb` loads the raw SQL dataset and computes a new dataset, which is saved in `data/processed`. This involves discarding irrelevant features, resampling the observations onto a common, coarser time grid (the sensors do not record at the same timestamps), and some additional simple pre-processing.
- `dataset/EDA.ipynb` provides an exploratory data analysis and generates the plots shown in the report.
- `main.py` runs PCA (`PCA.py`), trains the ML classifiers (defined in `models/`), performs hyperparameter optimization through grid search, evaluates the models on the test data, and saves the results to an MLflow server (started with `mlflow ui -p 8080`).
- `run_experiments.py` runs and logs multiple experiments for different models and parameters.
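The resampling step in `dataset/sql2csv.ipynb` aligns sensors that log at different timestamps. The notebook's code is not reproduced here, so the following is only a minimal pandas sketch of the idea, assuming each sensor is averaged over fixed one-minute bins and the bins are then joined column-wise. The sensor names, values, timestamps, the function `align_on_grid` and the `"1min"` period are all hypothetical.

```python
import pandas as pd

# Hypothetical raw readings: each sensor logs at its own timestamps,
# so the rows of the two series do not line up.
co2 = pd.Series(
    [410.0, 420.0, 430.0, 440.0],
    index=pd.to_datetime(["2024-01-01 08:00:10", "2024-01-01 08:00:40",
                          "2024-01-01 08:01:05", "2024-01-01 08:01:50"]),
    name="co2",
)
temperature = pd.Series(
    [20.0, 21.0, 22.0],
    index=pd.to_datetime(["2024-01-01 08:00:25", "2024-01-01 08:01:15",
                          "2024-01-01 08:01:45"]),
    name="temperature",
)


def align_on_grid(series_list, period="1min"):
    """Average each sensor over fixed-length bins and join them column-wise."""
    # Assumption: the mean is used to aggregate readings that fall in one bin.
    resampled = [s.resample(period).mean() for s in series_list]
    # Join on the shared bin labels; drop bins where any sensor has no reading.
    return pd.concat(resampled, axis=1).dropna()


aligned = align_on_grid([co2, temperature])
print(aligned)
```

A coarser period leaves fewer empty bins to drop but averages away short-lived changes, so the period is a trade-off between completeness and temporal resolution.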
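PCA is implemented in the repository's own `PCA.py`, whose code is not shown here. As a rough illustration only, the NumPy sketch below computes PCA from a singular value decomposition of the centered data. Standardizing the features first is our assumption, motivated by the sensors reporting in different units; the function `pca` and the toy data are hypothetical.

```python
import numpy as np


def pca(X, n_components, standardize=True):
    """Project X (n_samples x n_features) onto its leading principal components.

    Returns (scores, components, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    # PCA operates on mean-centered data.
    Z = X - X.mean(axis=0)
    if standardize:
        # Assumption: rescale to unit variance so features in different units
        # (e.g. ppm vs. degrees) contribute comparably.
        Z = Z / Z.std(axis=0, ddof=1)
    # Thin SVD: Z = U @ diag(S) @ Vt; the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # Eigenvalues of the sample covariance matrix of Z are S**2 / (n - 1).
    eigenvalues = S**2 / (Z.shape[0] - 1)
    ratio = eigenvalues / eigenvalues.sum()
    components = Vt[:n_components]
    scores = Z @ components.T
    return scores, components, ratio[:n_components]


# Hypothetical data: 3 features, the third being a noisy copy of the first,
# so two components should capture almost all of the variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + 0.01 * rng.normal(size=200)])

scores, components, ratio = pca(X, n_components=2)
print(scores.shape, ratio.round(3))
```

The returned explained-variance ratio is what a scree plot would display when choosing how many components to keep.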
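The exact pipeline in `main.py` is not shown here. The following minimal scikit-learn sketch wires the same sequence of steps (PCA, a classifier, grid-search hyperparameter optimization, test-set evaluation), with synthetic data from `make_classification` standing in for the processed sensor table. The choice of logistic regression, the parameter grid, and F1 scoring are illustrative assumptions, not the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the processed dataset (the real data is available on request).
X, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling, PCA and the classifier are refitted inside each CV fold,
# so statistics from the validation folds do not leak into training.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Hypothetical grid: number of kept components and inverse regularization strength.
param_grid = {
    "pca__n_components": [2, 3, 4],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out test split.
y_pred = search.predict(X_test)
results = {
    "best_params": search.best_params_,
    "test_accuracy": accuracy_score(y_test, y_pred),
    "test_f1": f1_score(y_test, y_pred),
}
print(results)
```

To record such results, one could point the MLflow client at the local server with `mlflow.set_tracking_uri("http://localhost:8080")` (assuming the server started with `mlflow ui -p 8080` is reachable there) and, inside `with mlflow.start_run():`, call `mlflow.log_params(results["best_params"])` and `mlflow.log_metrics({"test_accuracy": results["test_accuracy"], "test_f1": results["test_f1"]})`.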
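To replicate our results, run the following (the activation command below applies to Windows with Git Bash; on Linux or macOS, use `source venv/bin/activate` instead):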
python -m venv venv
source venv/Scripts/activate
pip install -r requirements.txt
python run_experiments.py
python -m mlflow ui -p 8080