song-cluster

Overview

song-cluster is the companion repository to my undergraduate research paper, Do Machines Think Equally About Music?

This project investigates how well a Support Vector Machine (SVM) classifier trained on human-made music can classify songs generated by a machine learning model, Meta’s MusicGen (via the MusicGPT interface).

The repository is organized to allow complete and exact reproduction of the study, including:

- Downloading and cleaning raw music data
- Training several machine learning classifiers (SVM, KNN, Random Forest, XGBoost) on audio feature datasets
- Prompting a small MusicGen model to generate genre-specific music clips
- Extracting features from machine-generated music
- Testing the ability of a classifier to predict the genre of AI-generated songs
- Producing the final paper

The study highlights the limitations of simplistic machine learning approaches to both music generation and classification, suggesting directions for future research in music information retrieval and generative AI.

Main Findings

- The SVM model trained on human music did not classify any machine-generated song into its intended genre.
- The results suggest that the classifier and the generative model represent musical genre differently.
- Larger generative models or more sophisticated prompts may improve genre fidelity in future work.

See the full discussion and analysis in the final paper: paper/paper.pdf.

Repository Structure

├── data/
│   ├── raw_data/           # Compressed raw song data from FMA and MusicGen-generated audio (.wav files)
│   └── analysis_data/      # Cleaned and processed datasets ready for modeling
├── models.zip              # Pretrained classification models (unzip before use)
├── other/                  # Music generation prompts and sketches
├── paper/
│   ├── paper.qmd           # Quarto source file for the paper
│   ├── references.bib      # Reference file (BibTeX)
│   └── paper.pdf           # Final PDF of the paper
├── scripts/                # Jupyter notebooks (Python) for data processing, model training, feature extraction
└── README.md               # You're here

Requirements

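Install the following:

Python 3.9+
Python packages:
- pandas
- numpy
- scikit-learn
- xgboost
- matplotlib
- seaborn
- librosa

The notebooks also use zipfile, os, glob, and pathlib. These ship with Python's standard library and need no installation.

Also needed: the Quarto command-line tool (optional, for paper compilation).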

You can install the needed Python packages with:

pip install -r requirements.txt

(Note: If librosa is not included in requirements.txt, install it manually with pip install librosa.)
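
To confirm the environment before running anything, a small check like the one below can help. It is a minimal sketch: the list of import names is taken from the package list above (scikit-learn is imported as sklearn), and the helper name missing_modules is only illustrative.

```python
from importlib.util import find_spec

# Import names of the third-party packages listed above.
# Note: scikit-learn is imported under the name "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "sklearn", "xgboost",
                    "matplotlib", "seaborn", "librosa"]


def missing_modules(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Anything reported as missing can be installed with pip before continuing.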

Reproduction Steps

Follow these steps to fully reproduce the paper’s results:

Clone the Repository

git clone https://github.com/lcarnegie/song-classification.git
cd song-classification

Prepare the Data
- Unzip the models.zip file to access pretrained models.
- The data/raw_data/ folder contains compressed audio and extracted feature files.
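
If you prefer to unzip from Python rather than a file manager, the step can be sketched as follows. The destination folder name models is an assumption; check the notebooks for the path they actually load from.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    archive = Path("models.zip")
    if archive.exists():
        # "models" is an assumed destination, not confirmed by the repository.
        for name in extract_archive(archive, "models"):
            print(name)
    else:
        print("models.zip not found; run this from the repository root.")
```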

Run Data Processing Scripts

In scripts/, run:
```
01-clean-fma-data.ipynb
02-test-data.ipynb
03-explore-data.ipynb
```
to clean the data, test and validate it, and explore it as in the paper.
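
The notebooks can be run interactively, or executed in order from a script. The sketch below builds one jupyter nbconvert command per notebook and prints it as a dry run. It assumes Jupyter and nbconvert are installed, which the requirements list does not state, and the helper names are illustrative.

```python
import subprocess
from pathlib import Path

# Notebook names as listed above, in the order they should run.
NOTEBOOKS = [
    "01-clean-fma-data.ipynb",
    "02-test-data.ipynb",
    "03-explore-data.ipynb",
]


def build_commands(notebooks, scripts_dir="scripts"):
    """Return one `jupyter nbconvert` argument list per notebook."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(scripts_dir) / name)]
        for name in notebooks
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands(NOTEBOOKS):
        print(" ".join(cmd))  # dry run: print instead of executing
```

Calling run_all(build_commands(NOTEBOOKS)) from the repository root would execute the notebooks in place.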

Train Classifiers (Optional)

If you want to retrain the models rather than use the provided ones, run:
```
04.0-train-rf.ipynb
04.1-train-xgboost.ipynb
04.2-train-svm.ipynb
04.3-train-kmn.ipynb
```
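
For readers who want a feel for what the SVM training step involves without opening the notebook, here is a minimal sketch using scikit-learn. It is not the code from 04.2-train-svm.ipynb: the kernel, the value of C, the train/test split, and the synthetic feature table are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_svm(X, y, seed=0):
    """Fit a scaled RBF-kernel SVM and return (model, held-out accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Scaling matters for SVMs because the kernel depends on distances.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    # Synthetic stand-in for an audio feature table: 3 "genres", 20 features.
    rng = np.random.default_rng(0)
    centers = rng.normal(scale=5.0, size=(3, 20))
    y = np.repeat(np.arange(3), 100)
    X = centers[y] + rng.normal(size=(300, 20))
    model, acc = train_svm(X, y)
    print(f"held-out accuracy: {acc:.2f}")
```

The synthetic clusters are well separated, so the score here says nothing about performance on real audio features.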

Extract Audio Features

If you want to re-extract the features from the MusicGen-generated audio, run:
```
05-extract-audio-features.ipynb
```
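
As a rough illustration of what feature extraction from a clip can look like, the sketch below summarizes MFCCs with librosa. The choice of MFCCs, the 22050 Hz sample rate, and the mean/std summary are assumptions made for this example; the features actually used are defined in 05-extract-audio-features.ipynb.

```python
import numpy as np


def summarize_frames(frames):
    """Collapse a (n_features, n_frames) matrix into per-feature mean and std."""
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])


def extract_mfcc_vector(path, n_mfcc=20):
    """Load one audio file and return a fixed-length MFCC summary vector."""
    import librosa  # imported here so summarize_frames works without librosa

    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return summarize_frames(mfcc)
```

Calling extract_mfcc_vector on a .wav file returns a vector of length 2 * n_mfcc, which gives every clip the same number of features regardless of its duration.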

Classify Generated Music

After generating feature sets for the machine-generated songs, run:
```
06-predict.ipynb
```
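
Once predictions exist, comparing them with the genre each clip was prompted for is a matter of counting matches. The sketch below uses only the standard library; the function name and the example labels are made up for illustration and are not results from the paper.

```python
from collections import Counter


def score_predictions(true_genres, predicted_genres):
    """Return overall accuracy and a Counter of (true, predicted) pairs."""
    if len(true_genres) != len(predicted_genres):
        raise ValueError("label lists must have the same length")
    pairs = Counter(zip(true_genres, predicted_genres))
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    accuracy = correct / len(true_genres) if true_genres else 0.0
    return accuracy, pairs


if __name__ == "__main__":
    # Made-up labels for illustration only; not results from the paper.
    truth = ["rock", "jazz", "folk", "rock"]
    preds = ["jazz", "jazz", "rock", "rock"]
    accuracy, pairs = score_predictions(truth, preds)
    print(f"accuracy: {accuracy:.2f}")
    for (t, p), n in sorted(pairs.items()):
        print(f"true={t:<5} predicted={p:<5} count={n}")
```

The pair counts act as a simple confusion table, showing which genres the classifier substitutes for the intended one.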

Build the Paper (Optional)

If you'd like to regenerate the paper PDF, run:
```
quarto render paper/paper.qmd
```

Statement on LLM Usage

Aspects of the code and text were written with the assistance of GitHub Copilot and ChatGPT to improve efficiency.
All original ideas, experiment designs, and analyses are my own.

Citation

If you use this work, please cite:

Carnegie, Luca. Do Machines Think Equally About Music? April 2025. https://github.com/lcarnegie/song-classification
