Official implementation of the method described in:
“Depthtective: A Depth-Aware Framework for Spatio-Temporal Deepfake Detection”
Warning
Code Release Status: This paper is currently under review. The full source code and pre-trained models will be released publicly after the first review notification.
The documentation below serves as a preview of the framework's usage.
Depthtective is a data-efficient framework that detects manipulated facial videos by analyzing spatio-temporal inconsistencies in estimated depth. The method draws on the observation that modern deepfake generation techniques, while photorealistic, exhibit subtle violations of geometric coherence that become evident when comparing depth estimates between temporally adjacent frames.
Instead of relying on heavy temporal models such as 3D CNNs or Transformers, Depthtective focuses on the temporal residuals between two consecutive frames. The absolute differences in both RGB and depth domains are fused into a compact four-channel tensor that exposes motion-related inconsistencies and geometric distortions introduced by manipulation. This representation enables accurate video-level classification without the need for extended temporal sequences.
For each pair of aligned frames, a depth map is estimated through MiDaS (DPT-Large).
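As a minimal sketch of this step, the snippet below shows how per-frame depth could be obtained with MiDaS (DPT-Large) through its standard torch.hub entry points; the helper name estimate_depth and the resizing choices are illustrative, not the repository's actual code.

```python
import cv2
import torch

# Load MiDaS DPT-Large and its matching preprocessing transform
# (standard torch.hub entry points published by the MiDaS authors).
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

device = "cuda" if torch.cuda.is_available() else "cpu"
midas.to(device).eval()

def estimate_depth(frame_bgr):
    """Return a depth map of shape (H, W) for a single BGR frame (illustrative helper)."""
    img = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    batch = transform(img).to(device)
    with torch.no_grad():
        depth = midas(batch)
        # Resize the prediction back to the original frame resolution.
        depth = torch.nn.functional.interpolate(
            depth.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    return depth.cpu().numpy()
```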
The temporal variation in appearance and geometry is quantified through the absolute inter-frame residuals in RGB and depth. Their fusion forms a four-channel tensor (RGBD residual) that serves as the sole input to the classifier.
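The sketch below illustrates how such an RGBD residual could be assembled from two aligned frames and their depth maps; the function name, the [0, 1] normalization, and the channel ordering are assumptions made for illustration.

```python
import numpy as np

def build_rgbd_residual(frame_t, frame_t1, depth_t, depth_t1):
    """Fuse RGB and depth inter-frame residuals into a 4-channel tensor.

    frame_t, frame_t1: aligned RGB frames as float arrays in [0, 1], shape (H, W, 3).
    depth_t, depth_t1: corresponding depth maps normalized to [0, 1], shape (H, W).
    """
    rgb_residual = np.abs(frame_t1 - frame_t)                # (H, W, 3)
    depth_residual = np.abs(depth_t1 - depth_t)[..., None]   # (H, W, 1)
    # Channel-wise concatenation -> (H, W, 4) RGBD residual fed to the classifier.
    return np.concatenate([rgb_residual, depth_residual], axis=-1)
```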
The residual tensor is processed by an adapted Xception or ResNet50 architecture supporting four-channel input while retaining ImageNet pretraining. The network is fine-tuned to discriminate between authentic and manipulated videos using a standard binary classification objective.
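One common way to extend a pretrained backbone to four input channels while keeping its ImageNet weights is to replace the stem convolution, copy the RGB filters, and initialize the extra channel from their mean. The ResNet50 sketch below follows that recipe under these assumptions; it is not necessarily the exact adaptation used in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

def resnet50_4ch(num_classes=2):
    """ResNet50 with a 4-channel stem that retains ImageNet pretraining (illustrative)."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

    old_conv = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    new_conv = nn.Conv2d(4, old_conv.out_channels,
                         kernel_size=old_conv.kernel_size,
                         stride=old_conv.stride,
                         padding=old_conv.padding,
                         bias=False)
    with torch.no_grad():
        new_conv.weight[:, :3] = old_conv.weight                              # reuse RGB filters
        new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)    # init depth-residual channel
    model.conv1 = new_conv

    # Binary real/fake head on top of the pooled features.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```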
Despite its simplicity, this formulation captures the core temporal inconsistencies typical of deepfake generation.
A second formulation adopts a contrastive representation learning approach.
The CNN is trained using a Triplet Loss to produce embeddings in which real and fake samples occupy well-separated regions of the latent space. A lightweight MLP head is then trained on top of the frozen encoder.
This strategy enhances separability especially for challenging manipulations such as NeuralTextures, where the artifacts are subtle and stochastic.
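A minimal sketch of this two-stage setup is given below, using PyTorch's built-in TripletMarginLoss for the encoder and a small two-layer MLP head on the frozen embeddings; the margin value, embedding dimension, and class names are illustrative assumptions (the 256 hidden units mirror the --hidden_features flag shown in the usage example).

```python
import torch
import torch.nn as nn

# Stage 1: train the 4-channel CNN encoder with a triplet objective so that
# embeddings of real and fake residuals are pushed apart (margin is illustrative).
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

def contrastive_step(encoder, anchor, positive, negative, optimizer):
    """One training step on an (anchor, positive, negative) residual triplet."""
    optimizer.zero_grad()
    loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage 2: freeze the encoder and train a lightweight MLP head on its embeddings.
class ClassifierHead(nn.Module):
    def __init__(self, embed_dim, hidden_features=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_features),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_features, 2),  # real vs. fake logits
        )

    def forward(self, x):
        return self.net(x)
```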
The effectiveness of Depthtective has been validated through experiments on the FaceForensics++ (FF++) benchmark (C23 compression) and the Celeb-DF (v2) dataset. We report the performance of our method implemented with standard CNN backbones (Xception, ResNet50) and the Contrastive Learning variant. The radar charts below illustrate the Accuracy, F1-Score, and Area Under the Curve (AUC) across all manipulation types.
git clone https://github.com/Luigina2001/Depthtective.git
cd Depthtective

Using Conda:

conda env create -f environment.yml
conda activate Depthtective

Using pip:

pip install -r requirements.txt

Depthtective provides a unified script for classifying a video. The script performs frame extraction, depth estimation, residual construction, and prediction.
python main.py \
--video_path path/to/video.mp4 \
--contrastive_encoder_path models/best_contrastive_model.pth \
--classifier_head_path models/best_classifier_head.pth \
--hidden_features 256

Example output:
Video: test_video.mp4
Prediction: Deepfake
Confidence: 98.45%



