TristanBandat/genre-prediction-video

Predicting the genre of an artist/a band based on their music video clip

This project tries to predict the music genre of a given music video clip.

Table of Contents

  1. About The Project
  2. Getting Started
  3. Dataset Versions
  4. Models
  5. Roadmap
  6. Contributing
  7. License
  8. Contact

About The Project

The goal of this project is to predict the music genre from the vectorized music videos of the Music4All-Onion dataset. Baseline approaches such as k-Nearest Neighbor and a simple Neural Network were tried first to verify that the dataset is being used correctly.

The main idea is to predict the music genre as a multi-label problem using transfer learning with a ResNet50 model. To fit the model's input shape, the vectors are reshaped to tensors of shape (64, 64, 3).
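As a rough illustration of this approach, the following Keras sketch builds a ResNet50-based multi-label classifier for (64, 64, 3) inputs. The global pooling layer, the frozen base, the sigmoid output with binary cross-entropy loss, and the function name `build_resnet50_classifier` are assumptions made for this example; they are not taken from the repository's code.

```python
import tensorflow as tf

NUM_GENRES = 685  # length of the label vectors (see "Dataset Versions")


def build_resnet50_classifier(weights="imagenet"):
    """Hypothetical ResNet50 transfer-learning model for multi-label genres."""
    # Pretrained convolutional base without the ImageNet classification head.
    # Keras accepts custom input sizes (down to 32x32) when include_top=False.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=(64, 64, 3)
    )
    base.trainable = False  # assumption: freeze the pretrained weights

    inputs = tf.keras.Input(shape=(64, 64, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # One sigmoid unit per genre, so each genre is predicted independently.
    outputs = tf.keras.layers.Dense(NUM_GENRES, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.BinaryAccuracy()],
    )
    return model
```

Passing `weights=None` builds the same architecture without downloading the ImageNet weights, which is convenient for quick shape checks.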

This project is based on the paper Moscati, Marta & Deldjoo, Yashar & Schedl, Markus & Parada-Cabaleiro, Emilia & Zangerle, Eva. (2022). Music4All-Onion — A Large-Scale Multi-faceted Content-Centric Music Recommendation Dataset.

Getting Started

To get a local copy up and running, follow these simple steps.

Installation

  1. Python Env Setup

    1. Windows

      1. Install Anaconda

      2. Open Anaconda Prompt and type:

        conda update -n base -c defaults conda
        conda create --name Python3.10 python=3.10
        conda activate Python3.10
        conda install pandas matplotlib numpy
        pip install tensorflow-datasets
        python -m pip install "tensorflow<2.11"
      3. If a GPU is available, it should be listed with the following command:

        python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    2. Ubuntu

      1. Install Anaconda as described here

      2. Open terminal and type:

        conda create --name Python3.10 python=3.10
        conda activate Python3.10
        conda install pandas matplotlib numpy
        conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
        python3 -m pip install tensorflow tensorflow-datasets
      3. If a GPU is available, it should be listed with the following command:

        python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
  2. Clone the repo

    git clone https://github.com/TristanBandat/genre-prediction-video.git
  3. Download the data files:
    Note: The files need to be placed in a folder called data/.

  4. Create dataset

    cd datasets/Music4AllOnionDC/
    tfds build Music4AllOnionDC.py --data_dir [CWD]/data/

Dataset Versions

  1. Version 1.0.1
    INCP vectors with shape (4096,) and labels with shape (685,).

  2. Version 2.0.0
    ResNet vectors with shape (4096,) and labels with shape (685,).

  3. Version 3.0.0
    VGG19 vectors with shape (8192,) and labels with shape (685,).

  4. Version 3.0.1
    VGG19 vectors with shape (4096,) and labels with shape (685,). The vectors of version 3.0.0 were compressed by replacing every 2 mean values with their mean and every 2 max values with their maximum, for each data point.

  5. Version 3.0.2
    VGG19 vectors with shape (64, 64, 3) and labels with shape (685,). The data points of version 3.0.1 were reshaped to (64, 64) and then repeated 3 times along a channel axis to fit the ResNet50 input shape.
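The compression of version 3.0.1 and the reshaping of version 3.0.2 can be sketched with NumPy as below. This assumes that each (8192,) VGG19 vector stores 4096 mean values followed by 4096 max values, and that "2 values" means 2 adjacent entries; the README does not confirm that layout, and the function names are hypothetical.

```python
import numpy as np


def compress_vgg19(vector):
    """Compress an (8192,) VGG19 vector to (4096,), as described for version 3.0.1.

    Assumption: the first 4096 entries are mean values and the last 4096 are
    max values; adjacent pairs are pooled.
    """
    if vector.shape != (8192,):
        raise ValueError(f"expected shape (8192,), got {vector.shape}")
    means, maxes = vector[:4096], vector[4096:]
    # Mean of each pair of adjacent mean values -> 2048 values.
    pooled_means = means.reshape(-1, 2).mean(axis=1)
    # Maximum of each pair of adjacent max values -> 2048 values.
    pooled_maxes = maxes.reshape(-1, 2).max(axis=1)
    return np.concatenate([pooled_means, pooled_maxes])


def to_resnet_input(vector):
    """Reshape a (4096,) vector to (64, 64) and repeat it over 3 channels (version 3.0.2)."""
    image = vector.reshape(64, 64)
    # Three identical copies along a new last axis -> shape (64, 64, 3).
    return np.repeat(image[..., np.newaxis], 3, axis=-1)
```

Chaining the two, `to_resnet_input(compress_vgg19(v))` turns one version 3.0.0 vector into one version 3.0.2 input tensor.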

Models

  • k-Nearest Neighbor

    We tried several values of k; k=3 gave the best results.

    • Test accuracy (INCP vectors): 12.97%

    • Test accuracy (ResNet vectors): 13.26%

    • Test accuracy (VGG19 vectors): 12.49%

  • Decision Tree

    Because training this model on the full training set required too much computational power, the test set was used for both training and testing, so these results are not measured on held-out data.

    • Test accuracy (INCP vectors): 6.91%

    • Test accuracy (ResNet vectors): 6.52%

    • Test accuracy (VGG19 vectors): 7.95%

  • Simple Neural Network

    A very simple NN with one large hidden layer.

    • Test accuracy (INCP vectors): 17.07%

    • Test accuracy (ResNet vectors): 16.09%

    • Test accuracy (VGG19 vectors): 14.49%

  • Deep Neural Network

    Same as the simple NN but with 20 smaller hidden layers.

    • Test accuracy (INCP vectors): 7.35%

    • Test accuracy (ResNet vectors): 9.14%

    • Test accuracy (VGG19 vectors): 7.35%

  • LSTM

    A model with 2 LSTM layers and 2 dense hidden layers.

    • Test accuracy (INCP vectors): 7.35%

    • Test accuracy (ResNet vectors): 7.35%

    • Test accuracy (VGG19 vectors): 7.35%

  • ResNet50 with Transfer Learning

    A ResNet50 with an additional output layer to fit the output shape.

    • Test accuracy (VGG19 vectors): 10.47%
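The k-Nearest Neighbor baseline listed above can be sketched in plain NumPy as follows. The README does not state the distance metric or how neighbour votes over the 685 labels were combined; this sketch assumes Euclidean distance and a per-genre majority vote, and `knn_predict` is a hypothetical name.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k=3):
    """Multi-label k-NN: predict a genre if a majority of the k neighbours have it.

    train_x: (n, d) feature vectors, train_y: (n, num_genres) multi-hot labels,
    query_x: (m, d) feature vectors. Returns (m, num_genres) multi-hot predictions.
    """
    # Squared Euclidean distances between every query and every training vector.
    dists = (
        (query_x ** 2).sum(axis=1)[:, None]
        - 2.0 * query_x @ train_x.T
        + (train_x ** 2).sum(axis=1)[None, :]
    )
    # Indices of the k closest training vectors for each query.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Fraction of neighbours carrying each genre, thresholded by majority vote.
    votes = train_y[nearest].mean(axis=1)
    return (votes > 0.5).astype(train_y.dtype)
```

With k=3, a genre is predicted when at least 2 of the 3 neighbours carry it. On the full dataset, the distance matrix is best computed in batches of query vectors to keep memory use manageable.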

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create.
Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Note: The project was done as part of an AI Bachelor course and may not be maintained very well!

License

Distributed under the GPL-3.0 License. See LICENSE for more information.

Contact

Tristan Bandat - @TBandat - tristan.bandat@gmail.com
Philipp Meingaßner - p.meingassner@gmail.com

Project Link: https://github.com/TristanBandat/genre-prediction-video