What makes a hit pop song? This paper attempts to identify the audio features that characterize mainstream music's biggest hits. This self-directed study uses audio feature data from top artists' discographies, retrieved through the Spotify API. The data are then analyzed through visualization and multiple linear regression to find which specific musical qualities significantly affect a song's assigned popularity score on Spotify.
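As a rough illustration of the modeling approach, here is a minimal sketch of a multiple linear regression of popularity on a few audio features. It runs on simulated data, not the project's dataset: the column names mirror Spotify audio features, but the sample size, coefficients, and noise level are invented for the example, and the paper's actual model specification may differ.

```r
# Simulated stand-in data. Column names mirror Spotify audio features,
# but all values and coefficients here are invented for illustration.
set.seed(123)
n <- 200
songs <- data.frame(
  danceability = runif(n),
  energy       = runif(n),
  valence      = runif(n)
)
songs$popularity <- 40 + 25 * songs$danceability + 10 * songs$energy +
  rnorm(n, sd = 5)

# Multiple linear regression of popularity on the audio features.
fit <- lm(popularity ~ danceability + energy + valence, data = songs)

# Coefficient table: estimates, standard errors, t-values, and p-values.
coef_table <- summary(fit)$coefficients
print(round(coef_table, 3))
```

The p-value column of the coefficient table is what indicates whether a feature's association with popularity is statistically significant.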
The repo is structured as:
- `data/raw_data` contains the raw data as obtained from the Spotify API using spotifyr, saved as .csv files.
- `data/analysis_data` contains the cleaned dataset that was constructed, saved as a .parquet file.
- `model` contains fitted models.
- `other` contains a datasheet, details of LLM chat interactions, and rough sketches.
- `paper` contains the files used to generate the paper, including the Quarto document and reference bibliography file, as well as the PDF of the paper.
- `scripts` contains the R scripts used to simulate, download, clean, test, and model the data.
My workflow is based, in part, on an open-source data science workflow devised by the legendary Rohan Alexander.
Here is a brief guide to reproducing my graphs and tables.
- Clone this repository to your computer.
- Install RStudio (recommended), copy this repository to Posit Cloud (meh), or use any other R language interpreter (not recommended). Install the libraries indicated in the `setup` chunk at the top of `paper\paper.qmd`.
- In `scripts`, run each of the files to get a sense of how I simulated, downloaded, cleaned, modeled, and tested my data.
- Navigate to any of the R chunks in `paper.qmd` to run the code as I did.
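If you would rather run the scripts in one go than open them individually, something like the sketch below may help. `run_scripts()` is a hypothetical helper, not part of this repository, and it assumes the file names in `scripts` sort into the intended run order (for example via numeric prefixes), which you should check first.

```r
# Hypothetical helper (not part of this repo): source every .R file in a
# directory in sorted order. Assumes alphabetical order is the run order.
run_scripts <- function(dir = "scripts") {
  files <- sort(list.files(dir, pattern = "\\.R$", full.names = TRUE))
  for (f in files) {
    message("Running ", f)
    # A fresh environment keeps each script's objects out of the global workspace.
    source(f, local = new.env())
  }
  invisible(files)
}

# From the repository root:
# run_scripts("scripts")
```

The download step talks to the Spotify API, so it will likely need your own API credentials before it runs successfully.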
Aspects of my R code and paper were written and edited with the assistance of Large Language Models, in particular variants of Claude-3 (Claude.ai) by Anthropic and GPT-4 (ChatGPT) by OpenAI.
Claude-3 Sonnet/Haiku was used for:
- Writing and editing parts of the paper
- Some debugging
GPT-4 was used for:
- Coding some of the R graphs
- Debugging and troubleshooting
The complete chat histories with both models are available in `other/llms`.