ckorgial/SDI-ResNet

On Explainable Closed-Set Source Device Identification Using log-Mel Spectrograms from Videos' Audio: A Grad-CAM Approach

This repository implements an approach for source device identification using log-Mel spectrograms extracted from video audio tracks. The project employs Grad-CAM (Gradient-weighted Class Activation Mapping) to provide visual explanations for device classification decisions.

Overview

Source device identification is a crucial task in digital forensics that aims to determine the originating device of multimedia content. This project focuses on identifying the source device from video recordings using audio characteristics, specifically through the analysis of log-Mel spectrograms with deep learning models and explainable AI techniques.
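To make the Grad-CAM idea concrete, the following is a minimal pure-Python sketch of the standard Grad-CAM computation: each feature map is weighted by the spatial mean of the class-score gradient, the weighted maps are summed, and a ReLU keeps only the regions that support the predicted device. The function name grad_cam and the toy 2x2 inputs are illustrative assumptions and are not taken from this repository, which presumably applies the same idea to ResNet-50 feature maps.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heat map from one layer's feature maps.

    activations[k][i][j]: feature map k at spatial position (i, j).
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j]).
    Returns an H x W map: ReLU(sum_k alpha_k * activations[k]), where
    alpha_k is the spatial mean of gradients[k].
    """
    n_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average pooling of the gradients.
    alphas = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(n_channels)
    ]

    # Weighted sum over channels, followed by ReLU.
    cam = [[0.0] * width for _ in range(height)]
    for k in range(n_channels):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alphas[k] * activations[k][i][j]
    return [[max(0.0, v) for v in row] for row in cam]


# Toy input (hypothetical): two 2x2 feature maps with hand-picked gradients.
activations = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[4.0, 3.0], [2.0, 1.0]],
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],      # alpha_0 = 1.0
    [[-2.0, -2.0], [-2.0, -2.0]],  # alpha_1 = -2.0
]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In practice the heat map would be upsampled to the size of the input spectrogram and overlaid on it, so that the highlighted time-frequency regions can be inspected.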

Dataset

VISION Dataset

This project utilizes the VISION dataset, which contains video recordings from 35 different mobile devices captured under various conditions.

Dataset Details:

  • 35 mobile devices (smartphones and tablets)
  • Multiple recording scenarios: flat surface, indoor handheld, outdoor handheld
  • Various content sources: original recordings, YouTube downloads, WhatsApp transfers
  • Audio sampling rate: 44.1 kHz
  • Device labels: D01 through D35

Reference Paper:

Shullani, D., Fontani, M., Iuliani, M. et al. VISION: a video and image dataset for source identification. EURASIP Journal on Information Security 2017, 15 (2017). https://doi.org/10.1186/s13635-017-0067-2

Dataset Download

You can download the VISION dataset using the script dataset/VISION/downloadVISION.py:

# Download the dataset (script should be provided separately)
python dataset/VISION/downloadVISION.py

Repository Structure

├── create_image_dataset.py               # Image patch extraction from videos
├── create_spectrogram_dataset.py         # Spectrogram dataset creation
├── create_spectrogram_dataset_merged.py  # Merged dataset processing
├── train_test_model.py                   # Main training script
├── train_test_model_bandpass.py          # Training with bandpass filtering
├── train_test_model_merged.py            # Training on merged dataset
├── VISION_mel.py                         # Mel spectrogram extraction
├── VISION_mel_band.py                    # Bandpass filtered Mel spectrograms
├── requirements.txt                      # Python dependencies
└── README.md                             # This file

Installation

  1. Clone the repository:
git clone https://github.com/ckorgial/SDI-ResNet.git
cd SDI-ResNet
  2. Create a virtual environment:
python -m venv vision_env
source vision_env/bin/activate  # On Windows: vision_env\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

1. Audio Preprocessing

Extract log-Mel spectrograms from video audio:

# Standard Mel spectrograms
python VISION_mel.py

# Bandpass filtered Mel spectrograms (8-12 kHz)
python VISION_mel_band.py
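The README does not state how the 8-12 kHz band is isolated, so the following is only a sketch of the general idea, not the filter used in VISION_mel_band.py. It builds a single second-order band-pass section from the widely used Audio EQ Cookbook formulas and checks that a 10 kHz tone passes while a 1 kHz tone is attenuated. All function names are hypothetical, and a real pipeline would more likely use a higher-order design from a signal-processing library.

```python
import math


def bandpass_biquad_coeffs(low_hz, high_hz, sample_rate):
    """Normalized (b, a) coefficients of a second-order band-pass section.

    Audio EQ Cookbook band-pass form with 0 dB peak gain; the centre
    frequency is the geometric mean of the band edges (an assumption).
    """
    centre = math.sqrt(low_hz * high_hz)
    q = centre / (high_hz - low_hz)
    w0 = 2.0 * math.pi * centre / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def apply_biquad(samples, b, a):
    """Filter a sequence with the biquad difference equation (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(samples):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, n_samples):
    """Unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


SAMPLE_RATE = 44_100  # Hz, as stated in this README
b, a = bandpass_biquad_coeffs(8_000.0, 12_000.0, SAMPLE_RATE)
in_band = rms(apply_biquad(tone(10_000.0, SAMPLE_RATE, 4410), b, a))
out_of_band = rms(apply_biquad(tone(1_000.0, SAMPLE_RATE, 4410), b, a))
print(f"10 kHz tone RMS after filtering: {in_band:.3f}")
print(f" 1 kHz tone RMS after filtering: {out_of_band:.3f}")
```

A single second-order section has gentle roll-off, so energy just outside 8-12 kHz is only partly suppressed; cascading several sections sharpens the band edges.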

2. Dataset Creation

Create training datasets from extracted spectrograms:

# Create spectrogram patches for training
python create_spectrogram_dataset.py

# Create merged dataset (combining similar devices)
python create_spectrogram_dataset_merged.py
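As an illustration of what "spectrogram patches" could mean, the sketch below cuts a spectrogram into fixed-width patches along the time axis. The patch width, the stride, and the choice to drop trailing frames are assumptions made for this example; the values actually used by create_spectrogram_dataset.py are not given in this README.

```python
def split_into_patches(spectrogram, patch_width, stride=None):
    """Cut a spectrogram into fixed-width patches along the time axis.

    spectrogram is a list of rows, one row per Mel band, each row holding
    one value per time frame. Trailing frames that do not fill a whole
    patch are dropped.
    """
    if stride is None:
        stride = patch_width  # non-overlapping patches by default
    n_frames = len(spectrogram[0])
    patches = []
    for start in range(0, n_frames - patch_width + 1, stride):
        patches.append([row[start:start + patch_width] for row in spectrogram])
    return patches


# Toy spectrogram: 3 Mel bands x 10 time frames (real ones would have 128 bands).
toy = [[band * 100 + t for t in range(10)] for band in range(3)]
patches = split_into_patches(toy, patch_width=4)
overlapping = split_into_patches(toy, patch_width=4, stride=2)
print(len(patches), len(overlapping))
```

Overlapping patches (stride smaller than the patch width) yield more training samples per recording, at the cost of correlated samples.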

3. Model Training

Train the ResNet-50 model for device identification:

# Standard training
python train_test_model.py

# Training with bandpass filtering
python train_test_model_bandpass.py

# Training on merged dataset
python train_test_model_merged.py

Methodology

Audio Feature Extraction

  1. Audio Extraction: Extract audio tracks from video files at a 44.1 kHz sampling rate
  2. Mel Spectrogram Generation: Convert audio to log-Mel spectrograms using:
    • FFT size: 2048
    • Hop length: 512
    • 128 Mel filter banks
  3. Optional Bandpass Filtering: Apply an 8-12 kHz bandpass filter to focus on device-specific characteristics
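The parameters above fix the time-frequency grid of each spectrogram: at 44.1 kHz, an FFT size of 2048 gives a frequency resolution of about 21.5 Hz per bin, and a hop length of 512 gives one frame roughly every 11.6 ms. The sketch below works this out and shows which of the 128 Mel bands fall inside the 8-12 kHz range. It assumes the HTK-style Mel formula, a centre-padded STFT, and a filter bank spanning 0 Hz to the Nyquist frequency; the repository may use different conventions, and all function names are hypothetical.

```python
import math

SAMPLE_RATE = 44_100  # Hz
N_FFT = 2048
HOP_LENGTH = 512
N_MELS = 128


def hz_to_mel(f_hz):
    """HTK-style Hz to Mel conversion (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def num_frames(num_samples, hop_length=HOP_LENGTH):
    """Frame count for a centre-padded STFT (assumed convention)."""
    return 1 + num_samples // hop_length


def mel_band_edges(n_mels=N_MELS, f_min=0.0, f_max=SAMPLE_RATE / 2):
    """Return n_mels + 2 edge frequencies in Hz, equally spaced in Mel."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_mels + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_mels + 2)]


def bands_in_range(edges, lo_hz, hi_hz):
    """Indices of Mel bands whose centre frequency lies in [lo_hz, hi_hz]."""
    centres = edges[1:-1]
    return [i for i, c in enumerate(centres) if lo_hz <= c <= hi_hz]


edges = mel_band_edges()
band_idx = bands_in_range(edges, 8_000.0, 12_000.0)
print(f"Frequency resolution: {SAMPLE_RATE / N_FFT:.2f} Hz per FFT bin")
print(f"Frame spacing: {1000.0 * HOP_LENGTH / SAMPLE_RATE:.2f} ms")
print(f"Frames in one second of audio: {num_frames(SAMPLE_RATE)}")
print(f"Mel bands centred in 8-12 kHz: {band_idx[0]} to {band_idx[-1]}")
```

Because Mel bands widen with frequency, the 8-12 kHz range covers only a small share of the 128 bands even though it spans 4 kHz of the spectrum.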

Contact

For questions or issues, please open an issue on GitHub or contact ckorgial@csd.auth.gr.

Acknowledgments

  • Thanks to the authors of the VISION dataset for providing this valuable resource

About

Closed-Set SDI Using log-Mel Spectrograms from Videos (IEEE Access)
