On Explainable Closed-Set Source Device Identification Using log-Mel Spectrograms from Videos' Audio: A Grad-CAM Approach

This repository implements an approach for source device identification using log-Mel spectrograms extracted from video audio tracks. The project employs Grad-CAM (Gradient-weighted Class Activation Mapping) to provide visual explanations for device classification decisions.

Overview

Source device identification is a key task in digital forensics: determining which device produced a piece of multimedia content. This project identifies the source device of a video recording from its audio track, analysing log-Mel spectrograms with a deep learning model (ResNet-50) and explaining the predictions with Grad-CAM.
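
As a rough illustration of how Grad-CAM turns a convolutional layer's output into a heatmap, the sketch below applies the standard Grad-CAM combination step to arrays that are assumed to have been captured already. The function name `grad_cam` and the input layout `(channels, height, width)` are assumptions made for this example, not names taken from the repository's scripts, and the final normalisation to [0, 1] is a common visualisation convenience rather than part of the core formula.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine feature maps and their gradients into a Grad-CAM heatmap.

    activations: (channels, H, W) feature maps of one convolutional layer.
    gradients:   (channels, H, W) gradients of the target class score
                 with respect to those feature maps.
    """
    # Channel weights: global average pooling of the gradients.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over channels, giving an (H, W) map.
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only regions that support the target class.
    cam = np.maximum(cam, 0.0)
    # Optional rescaling to [0, 1] for display (assumption, see lead-in).
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

For a spectrogram input, the resulting map would typically be upsampled to the spectrogram's size and overlaid on it, so that bright regions indicate the time-frequency areas that most influenced the predicted device.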

Dataset

VISION Dataset

This project utilizes the VISION dataset, which contains video recordings from 35 different mobile devices captured under various conditions.

Dataset Details:

35 mobile devices (smartphones and tablets)
Multiple recording scenarios: flat surface, indoor handheld, outdoor handheld
Various content sources: original recordings, YouTube downloads, WhatsApp transfers
Audio sampling rate: 44.1 kHz
Device labels: D01 through D35
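
Because the task is closed-set, the 35 device labels can be mapped to fixed class indices. The sketch below is one way to do that; it assumes the device label (for example `D07`) appears somewhere in the file or folder name, and the helper name `device_index_from_path` and the example path are hypothetical.

```python
import re

# D01 ... D35, as listed in the dataset details above.
DEVICE_LABELS = [f"D{i:02d}" for i in range(1, 36)]
LABEL_TO_INDEX = {label: idx for idx, label in enumerate(DEVICE_LABELS)}

# Matches D01-D35 only, and refuses labels followed by another digit.
_DEVICE_RE = re.compile(r"D(0[1-9]|[12]\d|3[0-5])(?!\d)")


def device_index_from_path(path):
    """Return the zero-based class index of the first device label in `path`."""
    match = _DEVICE_RE.search(path)
    if match is None:
        raise ValueError(f"no device label D01-D35 found in {path!r}")
    return LABEL_TO_INDEX[match.group(0)]
```

With this mapping, a hypothetical file named `D07_V_indoor_still_0001.mp4` would be assigned class index 6.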

Reference Paper:

Shullani, D., Fontani, M., Iuliani, M. et al. VISION: a video and image dataset for source identification. EURASIP Journal on Information Security 2017, 15 (2017). https://doi.org/10.1186/s13635-017-0067-2

Dataset Download

You can download the VISION dataset using the script dataset/VISION/downloadVISION.py:

# Download the dataset (script should be provided separately)
python dataset/VISION/downloadVISION.py

Repository Structure

├── create_image_dataset.py          # Image patch extraction from videos
├── create_spectrogram_dataset.py    # Spectrogram dataset creation
├── create_spectrogram_dataset_merged.py  # Merged dataset processing
├── train_test_model.py              # Main training script
├── train_test_model_bandpass.py     # Training with bandpass filtering
├── train_test_model_merged.py       # Training on merged dataset
├── VISION_mel.py                    # Mel spectrogram extraction
├── VISION_mel_band.py               # Bandpass filtered Mel spectrograms
├── requirements.txt                 # Python dependencies
└── README.md                        # This file

Installation

Clone the repository:

git clone https://github.com/ckorgial/SDI-ResNet.git
cd SDI-ResNet

Create a virtual environment:

python -m venv vision_env
source vision_env/bin/activate  # On Windows: vision_env\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

1. Audio Preprocessing

Extract log-Mel spectrograms from video audio:

# Standard Mel spectrograms
python VISION_mel.py

# Bandpass filtered Mel spectrograms (8-12 kHz)
python VISION_mel_band.py

2. Dataset Creation

Create training datasets from extracted spectrograms:

# Create spectrogram patches for training
python create_spectrogram_dataset.py

# Create merged dataset (combining similar devices)
python create_spectrogram_dataset_merged.py
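
To illustrate what cutting a spectrogram into patches can look like, here is a minimal sketch that slides a fixed-width window along the time axis. The patch width of 128 frames, the stride of 64 frames, and the function name `extract_patches` are assumptions chosen for the example; the values used by `create_spectrogram_dataset.py` may differ.

```python
import numpy as np


def extract_patches(spec, patch_frames=128, stride=64):
    """Split a (n_mels, n_frames) spectrogram into fixed-width time patches.

    Returns an array of shape (n_patches, n_mels, patch_frames). Trailing
    frames that do not fill a whole patch are dropped.
    """
    n_mels, n_frames = spec.shape
    starts = range(0, n_frames - patch_frames + 1, stride)
    patches = [spec[:, s:s + patch_frames] for s in starts]
    if not patches:
        # Recording shorter than one patch: return an empty batch.
        return np.empty((0, n_mels, patch_frames), dtype=spec.dtype)
    return np.stack(patches)
```

Overlapping patches (stride smaller than the patch width) give more training samples per recording, at the cost of correlated samples, so patches from the same video are best kept on one side of any train/test split.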

3. Model Training

Train the ResNet-50 model for device identification:

# Standard training
python train_test_model.py

# Training with bandpass filtering
python train_test_model_bandpass.py

# Training on merged dataset
python train_test_model_merged.py

Methodology

Audio Feature Extraction

Audio Extraction: Extract audio tracks from video files at 44.1 kHz sampling rate
Mel Spectrogram Generation: Convert audio to log-Mel spectrograms using:
- FFT size: 2048
- Hop length: 512
- 128 Mel filter banks
Optional Bandpass Filtering: Apply 8-12 kHz bandpass filter to focus on device-specific characteristics
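
The steps above can be sketched with NumPy alone, using the stated parameters (44.1 kHz, FFT size 2048, hop length 512, 128 Mel bands). This is an illustrative sketch under several assumptions, not the code in `VISION_mel.py` or `VISION_mel_band.py`: it uses the HTK-style Mel formula, a Hann window, no centre padding, and a 1e-10 power floor before taking the logarithm, and it approximates the 8-12 kHz bandpass with a brick-wall FFT mask rather than whatever filter design the repository uses. All function names are hypothetical.

```python
import numpy as np

SR = 44_100    # sampling rate stated above
N_FFT = 2048   # FFT size stated above
HOP = 512      # hop length stated above
N_MELS = 128   # number of Mel filter banks stated above


def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (fft_freqs - lower) / (center - lower)
    falling = (upper - fft_freqs) / (upper - center)
    return np.clip(np.minimum(rising, falling), 0.0, None)


def bandpass_fft(y, sr=SR, low=8_000.0, high=12_000.0):
    """Idealised bandpass: zero every FFT bin outside [low, high] Hz."""
    y = np.asarray(y, dtype=float)
    spectrum = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(y))


def log_mel_spectrogram(y, sr=SR, n_fft=N_FFT, hop=HOP, n_mels=N_MELS):
    """Return a log-Mel spectrogram in dB, shape (n_mels, n_frames)."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    n_frames = 1 + (len(y) - n_fft) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))
```

For the bandpass variant, the filter would be applied to the waveform first, for example `log_mel_spectrogram(bandpass_fft(y))`. Audio libraries such as librosa provide equivalent Mel spectrogram routines; the NumPy version is used here only to keep the sketch free of assumptions about the repository's dependencies.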

Contact

For questions or issues, please open an issue on GitHub or contact ckorgial@csd.auth.gr.

Acknowledgments

Thanks to the authors of the VISION dataset for providing this valuable resource.
