A fully offline, explainable deepfake detection system built on EfficientNet-B0 with Grad-CAM visual explanations, a polished Streamlit UI, and a FastAPI backend for real-time inference.
Deepfake technology uses generative AI to create highly realistic synthetic faces in images and videos. While powerful, it poses serious risks — misinformation, identity fraud, impersonation, and reputational harm.
DeepShield addresses this with a privacy-first, fully offline detection pipeline that:
- Classifies images and videos as ✅ Real or 🚨 Fake
- Provides confidence scores and P(Real) / P(Fake) probabilities
- Explains decisions visually using Grad-CAM heatmaps
- Runs entirely on your machine — no cloud calls, no data leaves your device
```
┌─────────────────────┐
│   Input (Image /    │
│   Video / Webcam)   │
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│   Face Detection    │ ← OpenCV Haar Cascade (face_detector.py)
│  & Frame Sampling   │ ← Frame extractor (frame_extractor.py)
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│ Image Preprocessing │ ← Resize 224×224, ImageNet normalize
└──────────┬──────────┘
           ↓
┌──────────────────────────────────────┐
│           DeepfakeCNN Model          │
│                                      │
│   EfficientNet-B0 (Spatial Branch)   │ → 1280-dim features
│                  +                   │
│   FrequencyBranch (FFT Spectrum)     │ → 128-dim features [opt-in]
│                                      │
│  Fused → Linear head → Binary logit  │
└──────────┬───────────────────────────┘
           ↓
┌─────────────────────┐
│   Classification    │ Real / Fake + confidence score
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│   Grad-CAM Module   │ Visual heatmap over suspicious regions
└─────────────────────┘
```
| Feature | Details |
|---|---|
| EfficientNet-B0 backbone | ImageNet-pretrained, two-phase fine-tuning |
| Frequency-domain analysis | Optional FFT branch detects GAN grid artefacts |
| Face detection | OpenCV Haar cascade — crops to face before inference |
| Grad-CAM explanations | Heatmap overlay showing which regions drove the decision |
| Full video analysis | Samples N frames evenly, aggregates with majority vote + timeline chart |
| Live webcam | `streamlit-webrtc` in the UI, plus a CLI realtime script |
| FastAPI backend | REST + WebSocket endpoints for image, video, and frame streaming |
| Fully offline | No internet connection required for inference |
| MPS / CUDA / CPU | Auto-detects Apple Silicon, NVIDIA GPU, or CPU |
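The device auto-detection in the last row can be sketched as follows (a minimal illustration; the project's own helper may differ):

```python
import torch

def pick_device() -> torch.device:
    """Prefer NVIDIA CUDA, then Apple-Silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
```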
```
DeepShield/
│
├── api/                        ← FastAPI backend
│   ├── main.py                 ← App entry point, model loaded at startup
│   ├── schemas.py              ← Pydantic response models
│   └── routes/
│       ├── predict.py          ← POST /predict/image, POST /predict/video
│       └── stream.py           ← WS /ws/webcam (real-time frame inference)
│
├── model/
│   ├── cnn_model.py            ← DeepfakeCNN (EfficientNet-B0 + optional FrequencyBranch)
│   ├── frequency_branch.py     ← FFT-based spectral feature extractor
│   └── loss.py
│
├── inference/
│   ├── predict.py              ← load_model, predict, predict_image, predict_video, predict_with_gradcam
│   └── realtime_inference.py   ← CLI webcam / video loop with frame skipping
│
├── training/
│   ├── train.py                ← Two-phase EfficientNet fine-tuning
│   ├── evaluate.py             ← Test-set evaluation with tqdm progress
│   ├── dataset.py              ← DataLoader, balanced subset sampling
│   ├── metrics.py              ← Accuracy, precision, recall, F1, confusion matrix
│   └── early_stopping.py
│
├── preprocessing/
│   ├── face_detector.py        ← detect_and_crop_face() using OpenCV Haar cascade
│   ├── frame_extractor.py      ← Extract 30 frames/video with multiprocessing
│   ├── dataset_split.py        ← Sort raw videos → real/ fake/ using metadata.json
│   ├── split_train_val_test.py ← 70/15/15 split grouped by video ID
│   └── augmentations.py
│
├── explainability/
│   ├── gradcam.py              ← Grad-CAM with forward + backward hooks
│   └── heatmap_utils.py        ← Heatmap colormap overlay
│
├── saved_models/
│   └── best_model.pth          ← Best checkpoint saved during training
│
├── app.py                      ← Streamlit UI (Image / Video / Webcam tabs)
├── requirements.txt
└── README.md
```
| Category | Tools |
|---|---|
| Deep Learning | PyTorch, TorchVision |
| Model | EfficientNet-B0 (ImageNet pretrained) |
| Computer Vision | OpenCV |
| Frequency Analysis | PyTorch FFT (torch.fft.fft2, fftshift) |
| Explainability | Grad-CAM (backward hooks) |
| Frontend | Streamlit, streamlit-webrtc, Plotly |
| Backend API | FastAPI, Uvicorn, WebSockets |
| Data / Metrics | NumPy, Pandas, Scikit-learn |
| Training utilities | tqdm, early stopping |
Trained on the 140k Real vs Fake Faces dataset (Kaggle):
| Metric | Score |
|---|---|
| Accuracy | ~91–92% |
| Precision | — |
| Recall | — |
| F1 Score | — |
Run `python -m training.evaluate` after training to get exact numbers on your test split.
This project uses the 140k Real vs Fake Faces dataset from Kaggle.
Download link: https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces
After downloading, place it at the project root:
```
DeepShield/
└── 140k-faces/
    └── real_vs_fake/
        └── real-vs-fake/
            ├── train/
            │   ├── real/
            │   └── fake/
            ├── valid/
            │   ├── real/
            │   └── fake/
            └── test/
                ├── real/
                └── fake/
```
The `140k-faces/` folder is in `.gitignore` and must be placed manually on each machine.
```bash
# macOS prerequisites (Homebrew)
brew install xz cmake libomp

# Python 3.12 via pyenv
pyenv install 3.12.2
pyenv local 3.12.2

# Virtual environment
python3 -m venv venv
source venv/bin/activate   # macOS / Linux
# venv\Scripts\activate    # Windows

pip install -r requirements.txt
```

```bash
python -m training.train
```

Trains DeepfakeCNN using two-phase EfficientNet fine-tuning for up to 50 epochs. The best checkpoint is saved to `saved_models/best_model.pth` whenever validation accuracy improves.
Training phases:
- Phase 1 (epochs 1–5): Backbone frozen, only the classifier head trains at lr=1e-4
- Phase 2 (epoch 6+): Last two EfficientNet blocks unfrozen, full model trains at lr=1e-5
Sample output:
```
Epoch 1/50  [Ph1]  train_loss=0.512  val_loss=0.431  val_acc=0.8120
...
Epoch 20/50 [Ph2]  train_loss=0.214  val_loss=0.198  val_acc=0.9167
```
```bash
python -m training.evaluate
```

Loads `saved_models/best_model.pth` and reports Accuracy, Precision, Recall, F1, and Confusion Matrix on the held-out test set, with a tqdm progress bar.
Sample output:
```
Evaluating: 100%|████████████| 625/625 [05:23<00:00]

Test set evaluation
----------------------------------------
Accuracy:  0.9167
Precision: 0.9210
Recall:    0.9140
F1:        0.9175
```
```bash
streamlit run app.py
```

Opens the full UI at http://localhost:8501. Three tabs:

**Image tab**
- Upload any face image (JPG/PNG)
- Shows verdict card with confidence %, P(Real), P(Fake)
- Enable Grad-CAM in sidebar to see which facial regions influenced the decision
- Plotly donut chart shows Real/Fake probability split
**Video tab**

- Upload a video (MP4/AVI/MOV)
- Choose how many frames to analyze (4–32)
- Summary metrics: frames analyzed, avg P(Real), real/fake frame counts
- Interactive P(Real) timeline chart (per-frame line chart with 0.5 threshold)
- Frame distribution histogram showing score spread
- Collapsible per-frame detail table
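The frame sampling and aggregation behind these summary metrics can be sketched in plain Python (a simplified stand-in for the project's video logic; frame scores are P(Real) values in [0, 1]):

```python
def sample_indices(total_frames: int, n: int) -> list[int]:
    """Pick n frame indices spread evenly across the video."""
    step = total_frames / n
    return [int(i * step) for i in range(n)]

def aggregate(prob_real_per_frame: list[float], threshold: float = 0.5) -> dict:
    """Majority vote over per-frame P(Real) scores."""
    real_frames = sum(p >= threshold for p in prob_real_per_frame)
    fake_frames = len(prob_real_per_frame) - real_frames
    return {
        "label": "Real" if real_frames > fake_frames else "Fake",
        "avg_prob_real": sum(prob_real_per_frame) / len(prob_real_per_frame),
        "real_frames": real_frames,
        "fake_frames": fake_frames,
    }
```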
**Webcam tab**

- Live webcam feed via `streamlit-webrtc`
- Inference on every 3rd frame to keep the stream smooth
- Bottom banner shows Real/Fake label + confidence
- Top bar shows P(Real) as a fill indicator
- Falls back gracefully if `streamlit-webrtc` is not installed
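The graceful fallback boils down to a guarded import (a minimal sketch; the UI's actual message and layout will differ):

```python
try:
    from streamlit_webrtc import webrtc_streamer  # optional dependency
    HAS_WEBRTC = True
except ImportError:
    HAS_WEBRTC = False

def webcam_tab_available() -> bool:
    """The UI checks this flag and shows an install hint instead of the tab."""
    return HAS_WEBRTC
```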
Sidebar options:
- Toggle Grad-CAM overlay
- Score interpretation table (what P(Real) ranges mean)
```bash
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
```

The model is loaded once at startup and reused for all requests.
Interactive API docs: http://localhost:8000/docs
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Check if the model is loaded and which device is in use |
| `/predict/image` | POST | Upload image → `{label, confidence, prob_real}` |
| `/predict/video` | POST | Upload video → aggregated + per-frame results |
| `/ws/webcam` | WebSocket | Send JPEG bytes → receive JSON predictions in real time |
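A minimal Python client for the image endpoint might look like this (a sketch assuming the `requests` package; the response fields follow the schema above):

```python
import requests

API = "http://localhost:8000"

def predict_image(path: str) -> dict:
    """POST an image file to /predict/image and return the JSON verdict."""
    with open(path, "rb") as f:
        resp = requests.post(f"{API}/predict/image", files={"file": f})
    resp.raise_for_status()
    return resp.json()  # {"label": ..., "confidence": ..., "prob_real": ...}

if __name__ == "__main__":
    print(predict_image("face.jpg"))
```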
Example request (image):
```bash
curl -X POST http://localhost:8000/predict/image \
  -F "file=@face.jpg"
```

Example response:

```json
{
  "label": "Fake",
  "confidence": 0.9312,
  "prob_real": 0.0688
}
```

```bash
# Webcam
python -m inference.realtime_inference

# Video file
python -m inference.realtime_inference --video path/to/video.mp4

# With Grad-CAM overlay
python -m inference.realtime_inference --video path/to/video.mp4 --gradcam
```

Press Q to quit. Inference runs on every 3rd frame for smooth display.
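The every-3rd-frame pattern can be sketched as follows (a simplified stand-in for the realtime script's loop, with `predict` standing in for model inference):

```python
def run_loop(frames, predict, skip: int = 3):
    """Run `predict` on every `skip`-th frame and reuse the last result
    for the frames in between, so the display stays smooth."""
    last = None
    results = []
    for i, frame in enumerate(frames):
        if i % skip == 0:
            last = predict(frame)
        results.append(last)  # every displayed frame still gets a label
    return results
```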
```bash
# ── Environment ─────────────────────────────────────────
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# ── Train ───────────────────────────────────────────────
python -m training.train

# ── Evaluate ────────────────────────────────────────────
python -m training.evaluate

# ── Streamlit UI ────────────────────────────────────────
streamlit run app.py

# ── FastAPI backend ─────────────────────────────────────
uvicorn api.main:app --reload --port 8000

# ── CLI webcam / video ──────────────────────────────────
python -m inference.realtime_inference [--video <path>] [--gradcam]
```

```
EfficientNet-B0 (pretrained on ImageNet)
├── features[0..8] (MBConv blocks)
└── classifier
    ├── Dropout(0.4)
    └── Linear(1280 → 1)   # default mode
```
Optional (`use_frequency=True`):

```
EfficientNet features (1280-dim)
  + FrequencyBranch (128-dim)
  → Linear(1408 → 256) → ReLU → Dropout(0.4) → Linear(256 → 1)
```
Detects spectral artefacts characteristic of GAN-generated images:
- `torch.fft.fft2` — 2D Fast Fourier Transform
- `fftshift` — centres low-frequency content for spatially consistent conv filters
- `log1p` — compresses the extreme dynamic range of FFT magnitudes
- Two Conv2D + BatchNorm + MaxPool blocks
- Fully connected → 128-dim feature vector
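The steps above can be sketched roughly like this (an illustration of the listed operations, not the exact `frequency_branch.py` code; layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class FrequencyBranchSketch(nn.Module):
    """Rough sketch of an FFT-based spectral branch (layer sizes are assumptions)."""

    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16),
            nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32),
            nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 56 * 56, out_dim)  # assumes 224×224 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gray = x.mean(dim=1, keepdim=True)                             # collapse RGB
        spec = torch.fft.fftshift(torch.fft.fft2(gray), dim=(-2, -1))  # centred 2D FFT
        mag = torch.log1p(spec.abs())                                  # compress magnitudes
        return self.fc(self.conv(mag).flatten(1))                      # -> (B, out_dim)
```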
| Phase | Epochs | LR | Backbone |
|---|---|---|---|
| Phase 1 (warm-up) | 1–5 | 1e-4 | Fully frozen |
| Phase 2 (fine-tune) | 6+ | 1e-5 | Last 2 blocks unfrozen |
| Metric | Description |
|---|---|
| Accuracy | Overall correct classifications |
| Precision | Of predicted fakes, how many were actually fake |
| Recall | Of actual fakes, how many were caught |
| F1 Score | Harmonic mean of precision and recall |
| Confusion Matrix | True/False Positive/Negative breakdown |
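These metrics follow directly from the confusion-matrix counts (a plain-Python sketch with "Fake" as the positive class, as in the table):

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive the table's metrics from raw confusion-matrix counts."""
    precision = tp / (tp + fp)   # of predicted fakes, fraction actually fake
    recall = tp / (tp + fn)      # of actual fakes, fraction caught
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }
```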
- Social media content verification
- News authenticity validation
- Digital identity protection
- Cybercrime and fraud detection
- Media forensics and journalism
- Temporal modeling with 3D CNN or Vision Transformer across video frames
- Audio-visual consistency check (voice + face sync)
- Browser extension for in-page detection
- Mobile deployment (CoreML / TFLite)
- Confidence calibration and uncertainty estimation
This project is released under the MIT License.