AI Shazam Clone – Song Identification with Machine Learning


A high-accuracy machine learning system that identifies songs from short audio clips using deep learning and vector similarity search.


Overview

This project demonstrates how to build a music identification system using deep learning and similarity search without relying on traditional audio fingerprinting databases.

By combining YAMNet audio embeddings, FAISS vector search, and an ensemble voting strategy, the system achieves 93 percent accuracy on real-world audio clips.

Key achievement: Identifies songs in approximately 10 milliseconds using only 8 seconds of audio.


Features

  • 93 percent identification accuracy on thousands of songs
  • Average query latency of approximately 10 milliseconds
  • Robust to noise, compression, and partial audio clips
  • Uses YAMNet for audio embeddings
  • Uses FAISS for efficient similarity search
  • Voting ensemble across multiple clips per song
  • Scalable architecture suitable for large catalogs
  • Pre-trained model included

Dataset

The reference index covers 5,564 songs, with embeddings extracted from multiple 8-second segments per track.

Performance Metrics

| Metric | Value |
| --- | --- |
| Dataset size | 5,564 songs |
| Accuracy | 93 percent |
| Query speed | ~10 ms |
| Minimum audio length | 8 seconds |
| Embedding dimension | 521 |
| Index type | FAISS IndexFlatL2 |
| Model size | ~50 MB |
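
The table lists FAISS IndexFlatL2 as the index type, which performs exhaustive, exact L2 search over every stored vector. As a sketch of the computation that index performs (pure NumPy here, not the project's actual code), nearest-neighbor lookup reduces to a squared-distance argmin:

```python
import numpy as np

def flat_l2_search(index_vectors, query, k=3):
    """Exhaustive L2 search -- the computation FAISS IndexFlatL2 performs.

    index_vectors: (n, d) array of stored song embeddings
    query:         (d,)  embedding of the clip to identify
    Returns the k nearest song indices and their squared L2 distances.
    """
    # Squared Euclidean distance from the query to every stored embedding
    dists = np.sum((index_vectors - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest songs
    return nearest, dists[nearest]

# Toy example: 5 songs with 521-dimensional embeddings (matching the table)
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 521)).astype("float32")
query = db[2] + 0.01 * rng.normal(size=521).astype("float32")  # noisy copy of song 2

idx, d2 = flat_l2_search(db, query)
print(idx[0])  # the nearest neighbour is song 2
```

An exhaustive flat index is exact but scans every vector per query; at 5,564 songs that is well within the ~10 ms budget reported above.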

Accuracy breakdown:

  • Single clip (8s): 70–80 percent
  • Voting method (3 clips): 93 percent
  • Accuracy is maintained on noisy and compressed audio

Tech Stack

Core technologies:

  • YAMNet (Google audio embedding model)
  • FAISS (vector similarity search)
  • TensorFlow
  • Librosa
  • NumPy

Development environment:

  • Python 3.8+
  • Jupyter Notebook
  • Google Colab for training

Architecture

Audio input (8-second clip)
        ↓
Librosa load and resample (16 kHz)
        ↓
YAMNet embedding extraction (521-dimensional vectors)
        ↓
FAISS similarity search (L2 distance)
        ↓
Top-K matching tracks with similarity scores


How It Works

  1. Audio clips are loaded and resampled to 16 kHz.
  2. YAMNet extracts frame-level audio embeddings.
  3. Embeddings are averaged to form a fixed-length vector.
  4. FAISS performs nearest-neighbor search using L2 distance.
  5. A voting strategy across multiple clips improves robustness.

Voting Method

For each song, embeddings are extracted from:

  1. Start of the song (0–8 seconds)
  2. Middle segment
  3. End segment

The final representation is the average of all embeddings, improving robustness and accuracy.
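
The segment-averaging step can be sketched as follows. Here `embed_segment` is a hypothetical stand-in for YAMNet's frame-level embedding plus mean pooling; only the start/middle/end slicing and the final averaging mirror the method described above:

```python
import numpy as np

SR = 16_000   # sample rate after resampling (see Architecture)
SEG = 8 * SR  # 8-second segment length in samples

def embed_segment(segment: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for YAMNet: split the segment into 1-second
    frames and mean-pool toy per-frame features into one 521-dim vector."""
    frames = segment.reshape(-1, SR)
    # Toy per-frame "embedding": frame statistics tiled up to 521 dims
    feats = np.stack([np.resize([f.mean(), f.std()], 521) for f in frames])
    return feats.mean(axis=0)

def song_embedding(waveform: np.ndarray) -> np.ndarray:
    """Average embeddings from the start, middle, and end of the song."""
    n = len(waveform)
    starts = [0, (n - SEG) // 2, n - SEG]  # start / middle / end offsets
    segments = [waveform[s:s + SEG] for s in starts]
    return np.mean([embed_segment(s) for s in segments], axis=0)

# 60 seconds of synthetic audio in place of a real track
waveform = np.random.default_rng(1).normal(size=60 * SR)
vec = song_embedding(waveform)
print(vec.shape)  # (521,)
```

Averaging over three well-separated segments smooths out sections that are quiet, noisy, or atypical of the song, which is what lifts single-clip accuracy (70–80 percent) to the reported 93 percent.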


Installation

```bash
git clone https://github.com/yourusername/shazam-clone.git
cd shazam-clone
pip install -r requirements.txt
```
