This repository contains the source code and documentation for the final project of INDENG 242A: Machine Learning and Data Analytics I. The project focuses on predicting song popularity on Spotify using audio features and supervised machine learning techniques.
🔗 Live Demo: Launch Dashboard
The goal of this project is to determine whether a song will be "successful" (defined as having a popularity score above the median) based on its acoustic characteristics such as danceability, energy, and valence.
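As a minimal sketch of the labeling rule described above, the snippet below marks a song as successful when its popularity is strictly above the median. The rows, the `popularity` key, and the `label_success` helper are hypothetical illustrations, not the project's actual data or code.

```python
from statistics import median

# Hypothetical rows; the real dataset's column names and values may differ.
songs = [
    {"track": "A", "popularity": 10},
    {"track": "B", "popularity": 35},
    {"track": "C", "popularity": 50},
    {"track": "D", "popularity": 72},
    {"track": "E", "popularity": 50},
]


def label_success(rows, score_key="popularity"):
    """Return copies of the rows with a binary 'successful' label (1 if score > median)."""
    cutoff = median(row[score_key] for row in rows)
    # Strict inequality: songs exactly at the median are labeled 0.
    return [{**row, "successful": int(row[score_key] > cutoff)} for row in rows]


labeled = label_success(songs)
for row in labeled:
    print(row["track"], row["popularity"], row["successful"])
```

Because the comparison is strict, ties at the median fall into the "unsuccessful" class, so the two classes are not guaranteed to be perfectly balanced.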
We implemented a Random Forest classifier, which outperformed our logistic regression baselines by capturing non-linear relationships among the audio features.
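The comparison can be illustrated with a short scikit-learn sketch. It uses a synthetic, non-linearly separable dataset (`make_moons`) as a stand-in for the audio features, and the hyperparameters are assumptions chosen for illustration, so the numbers it prints say nothing about the project's actual results.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a non-linear class boundary (not the Spotify dataset).
X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative settings only; the project's tuned hyperparameters are not shown here.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, used for AUC.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}, auc={results[name]['auc']:.3f}")
```

Evaluating both models on the same held-out split keeps the comparison fair; accuracy and AUC are the two metrics the dashboard reports.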
We developed an interactive web application using Streamlit to visualize our findings and deploy the model:
- Exploratory Data Analysis (EDA): Interactive correlation heatmaps and box plots comparing successful vs. unsuccessful songs.
- Model Performance: Visualization of feature importance and performance metrics (Accuracy, AUC).
- Prediction Playground: An interactive interface allowing users to adjust audio feature sliders (e.g., Tempo, Loudness) to simulate a song and receive a real-time success prediction.
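A Prediction Playground of the kind listed above could be wired up roughly as follows. This is a sketch under assumptions: the `SLIDERS` ranges and defaults, the feature order, and the `to_feature_row` and `render_playground` helpers are hypothetical, and `model` stands for any fitted classifier exposing `predict_proba`. It is not the code in `spotify_dashboard.py`.

```python
# Hypothetical slider specs: feature name -> (min, max, default).
# Ranges and defaults are assumptions, not values taken from the project.
SLIDERS = {
    "danceability": (0.0, 1.0, 0.5),
    "energy": (0.0, 1.0, 0.5),
    "valence": (0.0, 1.0, 0.5),
    "loudness": (-60.0, 0.0, -8.0),
    "tempo": (40.0, 220.0, 120.0),
}


def to_feature_row(values, order=tuple(SLIDERS)):
    """Arrange slider values into a single-row 2D list in a fixed feature order."""
    return [[float(values[name]) for name in order]]


def render_playground(model):
    """Draw one slider per feature and show the model's predicted success probability."""
    import streamlit as st  # imported lazily so to_feature_row works without Streamlit

    st.title("Prediction Playground")
    values = {
        name: st.slider(name.capitalize(), min_value=lo, max_value=hi, value=default)
        for name, (lo, hi, default) in SLIDERS.items()
    }
    # Column 1 of predict_proba is the probability of the positive ("successful") class.
    proba = model.predict_proba(to_feature_row(values))[0][1]
    st.metric("Predicted probability of success", f"{proba:.1%}")
```

Streamlit reruns the script whenever a slider moves, so calling `render_playground(model)` from the app script is enough to refresh the prediction on each interaction. The feature order passed to the model must match the order used during training.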
The dataset used in this project is sourced from Kaggle: Spotify Music Dataset by Solomon Ameen.
To run the dashboard locally:
- Clone this repository.
- Install the required dependencies: `pip install -r requirements.txt`
- Run the Streamlit app: `streamlit run spotify_dashboard.py`
- Yijun Gu
- Rimsha Ijaz
- Yizhou Zheng
Created for INDENG 242A, Department of Industrial Engineering & Operations Research, UC Berkeley.