Tabular Data Science - Final Project - Kernel Density Estimation Bandwidth Optimization

Final project for the Tabular Data Science course, by Liel Gutman and Yaniv Rotics.

About the Project

The project aims to improve hyperparameter selection in Kernel Density Estimation (KDE), specifically the bandwidth parameter, which controls the smoothness of the estimate. The proposed solution compares several bandwidth choices: the "scott" and "silverman" rules of thumb, and Grid Search Cross-Validation (GridSearchCV in scikit-learn), an automated search that picks the best-scoring hyperparameters from a candidate grid. Performance was evaluated using mean squared error (MSE) and total log-likelihood. The best-performing methods were tested on known distributions and on four real datasets.
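The comparison described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the synthetic standard-normal sample, the sample sizes, the candidate bandwidth grid, and the helper names `rule_of_thumb_bandwidth` and `evaluate` are all assumptions made for the example. The rule-of-thumb bandwidths are taken from SciPy's `gaussian_kde`, and the cross-validated one from scikit-learn's `GridSearchCV`, which by default scores `KernelDensity` by total log-likelihood on the held-out folds.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Assumption: a standard normal sample stands in for a "known distribution",
# so the true density is available for the MSE computation.
sample = rng.normal(loc=0.0, scale=1.0, size=300)
holdout = rng.normal(loc=0.0, scale=1.0, size=300)
grid = np.linspace(-4.0, 4.0, 200)
true_pdf = norm.pdf(grid)


def rule_of_thumb_bandwidth(data, rule):
    """Absolute bandwidth implied by SciPy's 'scott' or 'silverman' rule."""
    kde = gaussian_kde(data, bw_method=rule)
    # gaussian_kde.factor multiplies the sample standard deviation (ddof=1).
    return float(kde.factor * data.std(ddof=1))


# Cross-validated bandwidth: KernelDensity.score is the total log-likelihood,
# which GridSearchCV maximises over the (hypothetical) candidate grid.
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": np.logspace(-1, 0.5, 20)},
    cv=5,
)
search.fit(sample[:, None])

bandwidths = {
    "scott": rule_of_thumb_bandwidth(sample, "scott"),
    "silverman": rule_of_thumb_bandwidth(sample, "silverman"),
    "grid_search_cv": float(search.best_params_["bandwidth"]),
}


def evaluate(bandwidth):
    """Return (MSE against the true density, total log-likelihood on holdout)."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(sample[:, None])
    est_pdf = np.exp(kde.score_samples(grid[:, None]))
    mse = float(np.mean((est_pdf - true_pdf) ** 2))
    total_loglik = float(kde.score(holdout[:, None]))
    return mse, total_loglik


results = {name: evaluate(bw) for name, bw in bandwidths.items()}

for name, (mse, loglik) in results.items():
    print(f"{name:>15}: bandwidth={bandwidths[name]:.3f} "
          f"MSE={mse:.2e} log-likelihood={loglik:.1f}")
```

On real datasets the true density is unknown, so only the log-likelihood on held-out data can be computed; the MSE column applies to the known-distribution experiments.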

The final report can be found in report.pdf.

Datasets We Used

Avocado Prices (Kaggle)
Spotify Songs (Kaggle)
Action Movie Ratings, IMDb (Kaggle)
NASA - Nearest Earth Objects (Kaggle)

To run the project, install the dependencies listed in requirements.txt and run the Jupyter notebook KDE.ipynb.

