song-cluster is the companion repository to my undergraduate research paper, "Do Machines Think Equally About Music?"
This project investigates how well a Support Vector Machine (SVM) classifier trained on human-made music can classify songs generated by a machine learning model, Meta’s MuseGen (via the MusicGPT interface).
The repository is organized to allow complete and exact reproduction of the study, including:
- Downloading and cleaning raw music data,
- Training several machine learning classifiers (SVM, KNN, Random Forest, XGBoost) on audio feature datasets,
- Prompting a small MuseGen model to generate genre-specific music clips,
- Extracting features from machine-generated music,
- Testing the ability of a classifier to predict the genre of AI-generated songs,
- Producing the final paper.
The study highlights the limitations of simplistic machine learning approaches to both music generation and classification, suggesting directions for future research in music information retrieval and generative AI.
- The SVM model trained on human music did not classify a single machine-generated song into its intended genre.
- The results suggest that the classifier and the generative model represent the patterns that define a genre differently.
- Larger models or more sophisticated prompts may improve future genre fidelity in music generation.
See the full discussion and analysis in the final paper: paper/paper.pdf.
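To make the first finding concrete, here is a minimal sketch of how genre accuracy on generated clips could be scored. The function names and the label lists are hypothetical and made up for illustration; they are not the study's data or the code in the notebooks.

```python
from collections import Counter


def genre_accuracy(intended, predicted):
    """Fraction of clips whose predicted genre equals the prompted genre."""
    if len(intended) != len(predicted):
        raise ValueError("intended and predicted must have the same length")
    if not intended:
        raise ValueError("at least one clip is required")
    hits = sum(1 for want, got in zip(intended, predicted) if want == got)
    return hits / len(intended)


def confusion_counts(intended, predicted):
    """Count how often each (intended, predicted) genre pair occurs."""
    return Counter(zip(intended, predicted))


# Made-up labels for illustration only; these are NOT results from the paper.
intended = ["rock", "jazz", "hip-hop", "classical"]
predicted = ["jazz", "rock", "classical", "hip-hop"]

print(genre_accuracy(intended, predicted))  # prints 0.0
```

An accuracy of 0.0 corresponds to the case described above, where no generated clip lands in the genre it was prompted for. The pair counts from `confusion_counts` show which genres the classifier chose instead.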
├── data/
│ ├── raw_data/ # Compressed raw song data from FMA and MuseGen-generated audio (.wav files)
│ ├── analysis_data/ # Cleaned and processed datasets ready for modeling
├── models.zip # Pretrained classifier models (unzip before use)
├── other/ # Music generation prompts and sketches
├── paper/
│ ├── paper.qmd # Quarto source file for the paper
│ ├── references.bib # Reference file (BibTeX)
│ ├── paper.pdf # Final PDF of the paper
├── scripts/ # Python scripts for data processing, model training, feature extraction
├── README.md # You're here
Install the following:
- Python 3.9+
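- Python packages: pandas, numpy, scikit-learn, xgboost, matplotlib, seaborn, librosa
- Standard-library modules (bundled with Python, no installation needed): zipfile, os, glob, pathlib
- quarto (optional, for paper compilation)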
You can install the needed Python packages with:
pip install -r requirements.txt
(Note: You may need to install librosa manually if it is not included.)
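After installing, a quick check like the following can confirm that the third-party packages listed above are importable. This is a small sketch rather than part of the repository; the helper name `missing_modules` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the third-party packages this README lists.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = [
    "pandas",
    "numpy",
    "sklearn",
    "xgboost",
    "matplotlib",
    "seaborn",
    "librosa",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If librosa shows up as missing, it can be installed on its own with `pip install librosa`.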
Follow these steps to fully reproduce the paper’s results:
1. Clone the Repository
   git clone https://github.com/lcarnegie/song-classification.git
   cd song-classification
2. Prepare the Data
   - Unzip the models.zip file to access pretrained models.
   - The data/raw_data/ folder contains compressed audio and extracted feature files.
3. Run Data Processing Scripts
   In scripts/, run 01-clean-fma-data.ipynb, 02-test-data.ipynb, and 03-explore-data.ipynb to clean the data, test and validate it, and explore it as I did in the paper.
4. Train Classifiers (Optional)
   If you want to retrain models rather than using the provided ones, run:
   04.0-train-rf.ipynb
   04.1-train-xgboost.ipynb
   04.2-train-svm.ipynb
   04.3-train-kmn.ipynb
5. Extract the Music Files
   If you want to re-extract the features from the generated MuseGen audio output, run:
   05-extract-audio-features.ipynb
6. Classify Generated Music
   After generating feature sets for the machine-generated songs, run:
   06-predict.ipynb
7. Build the Paper (Optional)
   If you'd like to regenerate the paper PDF, run:
   quarto render paper/paper.qmd
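The steps above can also be scripted. The sketch below unzips models.zip with the standard library and builds one command per notebook in the documented order, without running anything. It rests on a few assumptions that this README does not confirm: the `models` output folder, the helper names, and the use of `jupyter nbconvert --execute` (Jupyter is not among the listed requirements) are all illustrative choices.

```python
import zipfile
from pathlib import Path

# Notebook names are taken from the steps above.
CORE_NOTEBOOKS = [
    "01-clean-fma-data.ipynb",
    "02-test-data.ipynb",
    "03-explore-data.ipynb",
    "05-extract-audio-features.ipynb",
    "06-predict.ipynb",
]
TRAINING_NOTEBOOKS = [
    "04.0-train-rf.ipynb",
    "04.1-train-xgboost.ipynb",
    "04.2-train-svm.ipynb",
    "04.3-train-kmn.ipynb",
]


def unzip_models(archive="models.zip", dest="models"):
    """Extract the pretrained models; `dest` is an assumed output folder."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_path)
        return sorted(zf.namelist())


def notebook_commands(retrain=False, scripts_dir="scripts"):
    """Build (but do not run) one nbconvert command per notebook, in order."""
    training = TRAINING_NOTEBOOKS if retrain else []
    names = CORE_NOTEBOOKS[:3] + training + CORE_NOTEBOOKS[3:]
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute",
         "--inplace", str(Path(scripts_dir) / name)]
        for name in names
    ]


if __name__ == "__main__":
    if Path("models.zip").exists():
        print("Extracted:", unzip_models())
    for command in notebook_commands():
        print(" ".join(command))
```

Each printed command can be pasted into a shell or passed to `subprocess.run` once Jupyter is installed. Passing `retrain=True` adds the four training notebooks between the data-processing and feature-extraction notebooks.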
Aspects of the code and text were written with the assistance of GitHub Copilot and ChatGPT to improve efficiency.
All original ideas, experiment designs, and analyses are my own.
If you use this work, please cite:
Carnegie, Luca. Do Machines Think Equally About Music? April 2025. https://github.com/lcarnegie/song-classification