
yaniv8360/Kernel-Density-Estimation


Tabular Data Science - Final Project - Kernel Density Estimation Bandwidth Optimization

Final project for the Tabular Data Science course, by Liel Gutman and Yaniv Rotics.

About the Project

The project aims to improve hyperparameter selection in Kernel Density Estimation (KDE), specifically the choice of the bandwidth parameter, which controls how smooth the estimated density is. The proposed solution compares several bandwidth-selection methods: the "scott" and "silverman" rules of thumb, and grid-search cross-validation (scikit-learn's GridSearchCV), which evaluates a grid of candidate bandwidths and keeps the one with the best cross-validated score. Performance was measured with two metrics: mean squared error (MSE) and total log-likelihood. The best-performing methods were then tested on known distributions as well as on four datasets from Kaggle.
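A minimal sketch of the three bandwidth choices on a known distribution, assuming a Gaussian kernel and 1-D data; it is an illustration, not the project's notebook code. The Scott and Silverman formulas are the 1-D forms of the factors used by scipy's `gaussian_kde`, scaled by the sample standard deviation. Grid search uses scikit-learn's `KernelDensity`, whose `score` method returns the total log-likelihood that `GridSearchCV` maximizes when no `scoring` is given. The helper names, the N(0, 1) sample and the bandwidth grid are hypothetical choices.

```python
import numpy as np
from scipy.stats import norm
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity


def scott_bandwidth(x):
    """Scott's rule for 1-D data: n**(-1/5) times the sample standard deviation."""
    return len(x) ** (-1 / 5) * np.std(x, ddof=1)


def silverman_bandwidth(x):
    """Silverman's rule (scipy's 'silverman' factor) for 1-D data:
    (3n/4)**(-1/5) times the sample standard deviation."""
    return (3 * len(x) / 4) ** (-1 / 5) * np.std(x, ddof=1)


def grid_search_bandwidth(x, grid, cv=5):
    """Pick the bandwidth with the best mean held-out log-likelihood.
    GridSearchCV falls back to KernelDensity.score (total log-likelihood)."""
    search = GridSearchCV(KernelDensity(kernel="gaussian"), {"bandwidth": grid}, cv=cv)
    search.fit(x.reshape(-1, 1))  # scikit-learn expects a 2-D (n_samples, n_features) array
    return search.best_params_["bandwidth"]


def fit_kde(x, bandwidth):
    return KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(x.reshape(-1, 1))


rng = np.random.default_rng(0)
train = rng.standard_normal(500)  # sample from a known distribution, N(0, 1)
test = rng.standard_normal(500)   # independent sample for the log-likelihood
xs = np.linspace(-4, 4, 401)      # evaluation grid for the MSE
true_pdf = norm.pdf(xs)

bandwidths = {
    "scott": scott_bandwidth(train),
    "silverman": silverman_bandwidth(train),
    "grid_search": grid_search_bandwidth(train, np.logspace(-1.5, 0.5, 30)),
}

results = {}
for name, h in bandwidths.items():
    kde = fit_kde(train, h)
    est_pdf = np.exp(kde.score_samples(xs.reshape(-1, 1)))  # score_samples returns log-density
    mse = np.mean((est_pdf - true_pdf) ** 2)                # error against the true PDF
    loglik = kde.score(test.reshape(-1, 1))                 # total log-likelihood on held-out data
    results[name] = (h, mse, loglik)
    print(f"{name:12s} h={h:.3f}  MSE={mse:.2e}  test log-lik={loglik:.1f}")
```

Because the true density is known here, MSE can be computed on an evaluation grid; the held-out log-likelihood needs no ground truth at all.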

The final report can be found here.

Datasets We Used

  1. Avocado prices (Kaggle)
  2. Spotify Songs (Kaggle)
  3. Action movie ratings, IMDb (Kaggle)
  4. NASA - Nearest Earth Objects (Kaggle)
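For real datasets there is no true density to compare against, so MSE against a known PDF cannot be computed. A sketch like the following compares the same three bandwidth choices by total log-likelihood on a held-out split of a single numeric column. The function name `heldout_loglik`, the 70/30 split and the lognormal stand-in sample are assumptions for illustration, not the project's data or code; in practice `column` would be one numeric column loaded from one of the datasets above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KernelDensity


def heldout_loglik(values, bandwidth_grid, seed=0):
    """Hypothetical helper: compare bandwidths on one 1-D numeric column
    by total log-likelihood on a held-out split (higher is better)."""
    x = np.asarray(values, dtype=float)
    x = x[np.isfinite(x)]  # drop NaN / inf entries
    train, test = train_test_split(x, test_size=0.3, random_state=seed)
    n, sd = len(train), np.std(train, ddof=1)
    candidates = {
        "scott": n ** (-1 / 5) * sd,
        "silverman": (3 * n / 4) ** (-1 / 5) * sd,
    }
    search = GridSearchCV(KernelDensity(kernel="gaussian"), {"bandwidth": bandwidth_grid}, cv=5)
    search.fit(train.reshape(-1, 1))  # cross-validation sees only the training split
    candidates["grid_search"] = search.best_params_["bandwidth"]

    scores = {}
    for name, h in candidates.items():
        kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(train.reshape(-1, 1))
        scores[name] = kde.score(test.reshape(-1, 1))
    return candidates, scores


# Stand-in for a real column: a skewed, price-like sample (NOT the project's data).
rng = np.random.default_rng(1)
column = rng.lognormal(mean=0.3, sigma=0.4, size=2000)
grid = np.std(column) * np.logspace(-2, 0, 25)  # candidate grid scaled to the data's spread
bandwidths, scores = heldout_loglik(column, grid)
for name in scores:
    print(f"{name:12s} h={bandwidths[name]:.4f}  held-out log-lik={scores[name]:.1f}")
```

Scaling the grid by the column's standard deviation keeps the same relative search range across columns measured in very different units, such as prices and distances.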

To run the project, install the requirements and then run the Jupyter notebook.
