Musical key detection with Deep Learning

Quick use

Download min_files.zip from the latest release OR clone the repository and download the ONNX model attachment
Use export_tensor.py to transform your .wav file into a JSON-encoded 2D spectrogram.

python .\export_tensor.py AUDIOFILE TARGETFILE

Start a local web server in the root directory of the project, e.g. on port 8080

npx light-server -s . -p 8080

Open localhost:PORT in your browser of choice (JavaScript must be enabled)
Load the exported tensor file and submit
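
To make the export step above more concrete, here is a minimal sketch of how a .wav file could be turned into a JSON-encoded 2D spectrogram. It is not the code of export_tensor.py: the function names, the frame size, the hop size, the Hann window, and the restriction to 16-bit mono PCM are all assumptions made for illustration, and the real script may use a different transform and JSON layout. The FFT is written in pure Python to keep the example dependency-free; for full-length tracks a numerical library would be the practical choice.

```python
import cmath
import json
import math
import struct
import wave


def read_wav_mono16(path):
    """Read a 16-bit PCM mono WAV file into floats in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    return [s / 32768.0 for s in struct.unpack("<%dh" % count, raw)]


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(signal, frame_size=1024, hop=512):
    """Return a 2D list: one row per frame, one magnitude per frequency bin."""
    # Hann window to reduce spectral leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(chunk)
        # Keep only the non-redundant half (bins 0 .. frame_size / 2).
        frames.append([abs(c) for c in spectrum[: frame_size // 2 + 1]])
    return frames


def export_tensor(audio_path, target_path, frame_size=1024, hop=512):
    """Hypothetical stand-in for the export step: WAV in, JSON 2D list out."""
    frames = spectrogram(read_wav_mono16(audio_path), frame_size, hop)
    with open(target_path, "w") as handle:
        json.dump(frames, handle)
```

Calling `export_tensor("song.wav", "song.json")` mirrors the AUDIOFILE and TARGETFILE arguments of the command above; both file names are placeholders.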

Papers

https://paperswithcode.com/paper/deeper-convolutional-neural-networks-and
https://paperswithcode.com/paper/musical-tempo-and-key-estimation-using
https://arxiv.org/pdf/1706.02921.pdf

Topic

Musical feature extraction

Project type

Bring your own model

Summary

My main focus for this project will be the topic of key detection in musical pieces. I hope to reuse existing approaches and further refine them to improve on their performance. Steps are planned in the following order:

Rebuild an existing solution from the papers listed above
Experiment with network architectures, types of feature extraction, and applications to the waveform itself.
Expand the existing collection of key detection data sources with simple self-made compositions. Using modern DAWs, it should be fairly straightforward to construct a series of short audio samples in various keys using different instrument setups.

I plan to use the following datasets:

GiantSteps & GiantSteps MTG
These seem to be common datasets for key extraction and also provide comparable results from other models
Children's Songs
This set consists of vocal recordings only
Optionally, my own dataset

My main focus will be on model generation. However, if I can reach satisfying results before my estimated time is used up, I will invest the remaining time into dataset creation. Hence the following breakdown is somewhat flexible:

Dataset collection: 2-8 hrs
Design/build network: 9-15 hrs
Train/tune network: 18 hrs
Build application: 8 hrs
Write report: 6 hrs
Presentation: 6 hrs
Total: 55 hrs

Phase 2 - Hacking report

Plan

My reference papers used accuracy (micro-averaged, as far as I understand) as well as the MIREX score. I report both scores so that my results are comparable.

The state of the art in MIREX score is around 75. My aim was to reach at least 70.
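
For reference, the two metrics can be sketched as follows. The MIREX key detection score gives full credit for an exact match and partial credit for musically close errors: 0.5 for a perfect fifth, 0.3 for the relative major/minor, and 0.2 for the parallel major/minor. The representation of a key as a `(tonic, mode)` tuple and the function names are assumptions made for this sketch, and it only counts a fifth above the reference; some implementations also accept a fifth below. This is not the evaluation code from the notebook.

```python
def mirex_weight(reference, estimate):
    """Weight for one prediction; keys are (tonic 0..11, 'major' | 'minor')."""
    ref_tonic, ref_mode = reference
    est_tonic, est_mode = estimate
    interval = (est_tonic - ref_tonic) % 12  # semitones above the reference

    if interval == 0 and ref_mode == est_mode:
        return 1.0  # exact match
    if ref_mode == est_mode and interval == 7:
        return 0.5  # perfect fifth above (assumption: upward direction only)
    if ref_mode == "major" and est_mode == "minor" and interval == 9:
        return 0.3  # relative minor of a major reference
    if ref_mode == "minor" and est_mode == "major" and interval == 3:
        return 0.3  # relative major of a minor reference
    if ref_mode != est_mode and interval == 0:
        return 0.2  # parallel major/minor
    return 0.0


def mirex_score(references, estimates):
    """Mean weighted score over all tracks, scaled to 0..100."""
    weights = [mirex_weight(r, e) for r, e in zip(references, estimates)]
    return 100.0 * sum(weights) / len(weights)


def accuracy(references, estimates):
    """Share of exactly correct keys over all tracks, scaled to 0..100."""
    hits = sum(1 for r, e in zip(references, estimates) if r == e)
    return 100.0 * hits / len(references)
```

With C major as `(0, "major")`, predicting G major `(7, "major")` earns 0.5 and predicting A minor `(9, "minor")` earns 0.3, which is why the MIREX score is always at least as high as plain accuracy.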

For the implementation, I went for a stripped-down version of InceptionKeyNet. I implemented some of its blocks, but after a while I stopped seeing performance gains, and accuracy even seemed to decrease. Personally, I think the full network is overkill for key detection alone, which likely depends mostly on a few base frequencies. My next steps will include experimenting with simpler models.

Installation

All code and notes can be found in the Jupyter notebook. Please install the dependencies listed in requirements.txt. The audio files must be downloaded using the repository links above. The project is configured for GiantSteps and GiantSteps MTG, and the data locations can be set within the notebook. It also includes a conversion script from mp3 to wav: on my Windows machine I needed to load the files in wav format, or else a significant chunk of them would not load.
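
As an illustration of the mp3-to-wav step, the following sketch converts a folder of files by calling the ffmpeg command-line tool. It is not the conversion script from the notebook, which may work differently: the function names and folder layout are hypothetical, and the sketch assumes ffmpeg is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build the ffmpeg call that decodes one file to WAV (-y overwrites dst)."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_folder(mp3_dir, wav_dir):
    """Convert every .mp3 in mp3_dir to a .wav of the same name in wav_dir."""
    mp3_dir, wav_dir = Path(mp3_dir), Path(wav_dir)
    wav_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(mp3_dir.glob("*.mp3")):
        dst = wav_dir / (src.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg reports a failure.
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

A call such as `convert_folder("data/mp3", "data/wav")` would then fill the second folder with one .wav per source file; both paths are placeholders for whatever locations are configured in the notebook.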

Results

My network's best MIREX score is currently around 60 across several recomputed train-test splits using the best configuration I found (it should be the one set up at the time of submission). Sadly, this means I am still short of my target of 70.
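
Scoring across recomputed train-test splits can be sketched as below. The 80/20 split ratio, the seeds, and the `train_and_score` callback are assumptions for illustration; the callback stands in for whatever trains the network on the training indices and returns a score on the test indices.

```python
import random
import statistics


def split_indices(n_items, test_fraction, seed):
    """Shuffle item indices with a fixed seed and cut off a test portion."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_items * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def score_over_splits(n_items, train_and_score, seeds, test_fraction=0.2):
    """Run one train/evaluate cycle per seed; return mean and spread of scores."""
    scores = []
    for seed in seeds:
        train_idx, test_idx = split_indices(n_items, test_fraction, seed)
        scores.append(train_and_score(train_idx, test_idx))
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return statistics.mean(scores), spread
```

Reporting the mean together with the spread makes it easier to tell a real improvement from the variation between splits.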

Rough time investment:

Dataset collection: 3 hrs
Design/build network: 30 hrs
Train/tune network: 20 hrs

