The Anatomy of a Hit: Statistically Learning from the Best

Overview

What makes a hit pop song? This paper attempts to identify the audio features that characterize mainstream music's biggest hits. This self-directed study retrieves audio feature data for top artists' discographies through the Spotify API. The data are then analyzed through visualization and multiple linear regression to find which specific musical qualities are significantly associated with a song's assigned popularity score on Spotify.
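
As a rough illustration of the regression step, the sketch below fits a multiple linear regression with base R's `lm()`. It is not the model from the paper: the data are simulated, the coefficients used to generate them are arbitrary, and the choice of `danceability`, `energy`, and `valence` as predictors is an assumption made only for this example.

```r
# Simulated stand-in data: these values are made up for illustration and are
# not the Spotify data used in the paper.
set.seed(42)
n <- 200
tracks <- data.frame(
  danceability = runif(n),
  energy       = runif(n),
  valence      = runif(n)
)

# Assumed "true" relationship, chosen arbitrarily for the simulation.
tracks$popularity <- 40 + 25 * tracks$danceability + 10 * tracks$energy +
  rnorm(n, sd = 5)

# Multiple linear regression of popularity on several audio features.
fit <- lm(popularity ~ danceability + energy + valence, data = tracks)

# Coefficient table: estimates, standard errors, t statistics, p-values.
coef_table <- summary(fit)$coefficients
print(round(coef_table, 3))
```

Each row of the coefficient table corresponds to one predictor (plus the intercept), and the `Pr(>|t|)` column is what one would inspect to judge whether a feature is significantly associated with popularity.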

File Structure and Workflow

The repo is structured as:

data/raw_data contains the raw data, obtained from the Spotify API using spotifyr, as .csv files.
data/analysis_data contains the cleaned dataset, saved as a .parquet file.
models contains fitted models.
other contains a datasheet, details of LLM chat interactions, and rough sketches.
paper contains the files used to generate the paper, including the Quarto document and reference bibliography file, as well as the PDF of the paper.
scripts contains the R scripts used to simulate, download, clean, test, and model the data.

My workflow is based, in part, on an open-source data science workflow devised by the legendary Rohan Alexander.

Reproducing Graphs and Tables

Here is a brief guide to reproducing my graphs and tables.

Clone this repository to your computer.
Install RStudio (recommended), copy this repository to Posit Cloud (meh), or use any other R language interpreter (not recommended). Install the libraries indicated in the setup chunk at the top of paper/paper.qmd.
In scripts, run each of the files to get a sense of how I simulated, downloaded, cleaned, modeled, and tested my data.
Navigate to any of the R chunks in paper.qmd to run the code as I did.
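
For the step of running each file in scripts, a small helper like the one below may save some clicking. It is only a sketch: `run_scripts` is a hypothetical name, not something defined in this repository, and it assumes the script file names sort alphabetically into the intended run order.

```r
# Run every .R file in a directory, in alphabetical order.
# Assumption: file names sort into the intended run order (for example,
# via numeric prefixes); adjust the order by hand if they do not.
run_scripts <- function(dir = "scripts") {
  files <- sort(list.files(dir, pattern = "\\.R$", full.names = TRUE))
  for (f in files) {
    message("Running ", f)
    # Each script is evaluated in its own environment so that objects
    # created by one script do not leak into the next.
    source(f, local = new.env())
  }
  invisible(files)
}
```

From the repository root, calling `run_scripts("scripts")` would then source each script in turn. If any script relies on objects left in the workspace by an earlier one, replace `local = new.env()` with `local = FALSE`.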

Statement on LLM usage

Aspects of my R code and paper were written and edited with the assistance of Large Language Models, in particular variants of Claude-3 (Claude.ai) by Anthropic and GPT-4 (ChatGPT) by OpenAI.

Claude-3 Sonnet/Haiku was used for:

Writing and editing parts of the paper
Some debugging

GPT-4 was used for:

Coding some of the R graphs
Debugging and troubleshooting

The complete chat history with both models is available in other/llms.
