A fully offline, explainable deepfake detection system built on EfficientNet-B0 with Grad-CAM visual explanations, a polished Streamlit UI, and a FastAPI backend for real-time inference.
Deepfake technology uses generative AI to create highly realistic synthetic faces in images and videos. While powerful, it poses serious risks — misinformation, identity fraud, impersonation, and reputational harm.
DeepShield addresses this with a privacy-first, fully offline detection pipeline that:
- Classifies images and videos as ✅ Real or 🚨 Fake
- Provides confidence scores and P(Real) / P(Fake) probabilities
- Explains decisions visually using Grad-CAM heatmaps
- Runs entirely on your machine — no cloud calls, no data leaves your device
```
┌─────────────────────┐
│   Input (Image /    │
│   Video / Webcam)   │
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│   Face Detection    │ ← OpenCV Haar Cascade (face_detector.py)
│  & Frame Sampling   │ ← Frame extractor (frame_extractor.py)
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│ Image Preprocessing │ ← Resize 224×224, ImageNet normalize
└──────────┬──────────┘
           ↓
┌──────────────────────────────────────┐
│           DeepfakeCNN Model          │
│                                      │
│   EfficientNet-B0 (Spatial Branch)   │ → 1280-dim features
│                  +                   │
│   FrequencyBranch (FFT Spectrum)     │ → 128-dim features [opt-in]
│                                      │
│  Fused → Linear head → Binary logit  │
└──────────┬───────────────────────────┘
           ↓
┌─────────────────────┐
│   Classification    │ Real / Fake + confidence score
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│   Grad-CAM Module   │ Visual heatmap over suspicious regions
└─────────────────────┘
```
| Feature | Details |
|---|---|
| EfficientNet-B0 backbone | ImageNet-pretrained, two-phase fine-tuning |
| Frequency-domain analysis | Optional FFT branch detects GAN grid artefacts |
| Face detection | OpenCV Haar cascade — crops to face before inference |
| Grad-CAM explanations | Heatmap overlay showing which regions drove the decision |
| Full video analysis | Samples N frames evenly, aggregates with majority vote + timeline chart |
| Live webcam | `streamlit-webrtc` in the UI, plus a CLI realtime script |
| FastAPI backend | REST + WebSocket endpoints for image, video, and frame streaming |
| Fully offline | No internet connection required for inference |
| MPS / CUDA / CPU | Auto-detects Apple Silicon, NVIDIA GPU, or CPU |
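The device auto-detection in the last row can be sketched as follows (a minimal illustration; the project's own helper may differ):

```python
import torch

def pick_device() -> torch.device:
    """Prefer NVIDIA CUDA, then Apple-Silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
```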
```
DeepShield/
│
├── api/                        ← FastAPI backend
│   ├── main.py                 ← App entry point, model loaded at startup
│   ├── schemas.py              ← Pydantic response models
│   └── routes/
│       ├── predict.py          ← POST /predict/image, POST /predict/video
│       └── stream.py           ← WS /ws/webcam (real-time frame inference)
│
├── model/
│   ├── cnn_model.py            ← DeepfakeCNN (EfficientNet-B0 + optional FrequencyBranch)
│   ├── frequency_branch.py     ← FFT-based spectral feature extractor
│   └── loss.py
│
├── inference/
│   ├── predict.py              ← load_model, predict, predict_image, predict_video, predict_with_gradcam
│   └── realtime_inference.py   ← CLI webcam / video loop with frame skipping
│
├── training/
│   ├── train.py                ← Two-phase EfficientNet fine-tuning
│   ├── evaluate.py             ← Test-set evaluation with tqdm progress
│   ├── dataset.py              ← DataLoader, balanced subset sampling
│   ├── metrics.py              ← Accuracy, precision, recall, F1, confusion matrix
│   └── early_stopping.py
│
├── preprocessing/
│   ├── face_detector.py        ← detect_and_crop_face() using OpenCV Haar cascade
│   ├── frame_extractor.py      ← Extract 30 frames/video with multiprocessing
│   ├── dataset_split.py        ← Sort raw videos → real/ fake/ using metadata.json
│   ├── split_train_val_test.py ← 70/15/15 split grouped by video ID
│   └── augmentations.py
│
├── explainability/
│   ├── gradcam.py              ← Grad-CAM with forward + backward hooks
│   └── heatmap_utils.py        ← Heatmap colormap overlay
│
├── saved_models/
│   └── best_model.pth          ← Best checkpoint saved during training
│
├── app.py                      ← Streamlit UI (Image / Video / Webcam tabs)
├── requirements.txt
└── README.md
```
| Category | Tools |
|---|---|
| Deep Learning | PyTorch, TorchVision |
| Model | EfficientNet-B0 (ImageNet pretrained) |
| Computer Vision | OpenCV |
| Frequency Analysis | PyTorch FFT (torch.fft.fft2, fftshift) |
| Explainability | Grad-CAM (backward hooks) |
| Frontend | Streamlit, streamlit-webrtc, Plotly |
| Backend API | FastAPI, Uvicorn, WebSockets |
| Data / Metrics | NumPy, Pandas, Scikit-learn |
| Training utilities | tqdm, early stopping |
Trained on the 140k Real vs Fake Faces dataset (Kaggle):
| Metric | Score |
|---|---|
| Accuracy | ~91–92% |
| Precision | — |
| Recall | — |
| F1 Score | — |
Run `python -m training.evaluate` after training to get exact numbers on your test split.
This project uses the 140k Real vs Fake Faces dataset from Kaggle.
Download link: https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces
After downloading, place it at the project root:
```
DeepShield/
└── 140k-faces/
    └── real_vs_fake/
        └── real-vs-fake/
            ├── train/
            │   ├── real/
            │   └── fake/
            ├── valid/
            │   ├── real/
            │   └── fake/
            └── test/
                ├── real/
                └── fake/
```
The `140k-faces/` folder is in `.gitignore` and must be placed manually on each machine.
```bash
# macOS prerequisites (Homebrew)
brew install xz cmake libomp

# Python 3.12 via pyenv
pyenv install 3.12.2
pyenv local 3.12.2

# Virtual environment
python3 -m venv venv
source venv/bin/activate   # macOS / Linux
# venv\Scripts\activate    # Windows

pip install -r requirements.txt
```

```bash
python -m training.train
```

Trains DeepfakeCNN using two-phase EfficientNet fine-tuning for up to 50 epochs. The best checkpoint is saved to `saved_models/best_model.pth` whenever validation accuracy improves.
Training phases:
- Phase 1 (epochs 1–5): Backbone frozen, only the classifier head trains at lr=1e-4
- Phase 2 (epoch 6+): Last two EfficientNet blocks unfrozen, full model trains at lr=1e-5
Sample output:
```
Epoch 1/50  [Ph1]  train_loss=0.512  val_loss=0.431  val_acc=0.8120
...
Epoch 20/50 [Ph2]  train_loss=0.214  val_loss=0.198  val_acc=0.9167
```
```bash
python -m training.evaluate
```

Loads `saved_models/best_model.pth` and reports Accuracy, Precision, Recall, F1, and Confusion Matrix on the held-out test set, with a tqdm progress bar.
Sample output:
```
Evaluating: 100%|████████████| 625/625 [05:23<00:00]

Test set evaluation
----------------------------------------
Accuracy:  0.9167
Precision: 0.9210
Recall:    0.9140
F1:        0.9175
```
```bash
streamlit run app.py
```

Opens the full UI at http://localhost:8501. Three tabs:

**Image tab**
- Upload any face image (JPG/PNG)
- Shows verdict card with confidence %, P(Real), P(Fake)
- Enable Grad-CAM in sidebar to see which facial regions influenced the decision
- Plotly donut chart shows Real/Fake probability split
**Video tab**

- Upload a video (MP4/AVI/MOV)
- Choose how many frames to analyze (4–32)
- Summary metrics: frames analyzed, avg P(Real), real/fake frame counts
- Interactive P(Real) timeline chart (per-frame line chart with 0.5 threshold)
- Frame distribution histogram showing score spread
- Collapsible per-frame detail table
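The frame sampling and aggregation behind these summary metrics can be sketched in plain Python (a simplified stand-in for the project's video logic; frame scores are P(Real) values in [0, 1]):

```python
def sample_indices(total_frames: int, n: int) -> list[int]:
    """Pick n frame indices spread evenly across the video."""
    step = total_frames / n
    return [int(i * step) for i in range(n)]

def aggregate(prob_real_per_frame: list[float], threshold: float = 0.5) -> dict:
    """Majority vote over per-frame P(Real) scores."""
    real_frames = sum(p >= threshold for p in prob_real_per_frame)
    fake_frames = len(prob_real_per_frame) - real_frames
    return {
        "label": "Real" if real_frames > fake_frames else "Fake",
        "avg_prob_real": sum(prob_real_per_frame) / len(prob_real_per_frame),
        "real_frames": real_frames,
        "fake_frames": fake_frames,
    }
```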
**Webcam tab**

- Live webcam feed via `streamlit-webrtc`
- Inference on every 3rd frame to keep the stream smooth
- Bottom banner shows Real/Fake label + confidence
- Top bar shows P(Real) as a fill indicator
- Falls back gracefully if `streamlit-webrtc` is not installed
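The graceful fallback boils down to a guarded import (a minimal sketch; the UI's actual message and layout will differ):

```python
try:
    from streamlit_webrtc import webrtc_streamer  # optional dependency
    HAS_WEBRTC = True
except ImportError:
    HAS_WEBRTC = False

def webcam_tab_available() -> bool:
    """The UI checks this flag and shows an install hint instead of the tab."""
    return HAS_WEBRTC
```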
Sidebar options:
- Toggle Grad-CAM overlay
- Score interpretation table (what P(Real) ranges mean)
```bash
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
```

The model is loaded once at startup and reused for all requests.
Interactive API docs: http://localhost:8000/docs
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Check if the model is loaded and which device is in use |
| `/predict/image` | POST | Upload image → `{label, confidence, prob_real}` |
| `/predict/video` | POST | Upload video → aggregated + per-frame results |
| `/ws/webcam` | WebSocket | Send JPEG bytes → receive JSON predictions in real time |
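A minimal Python client for the image endpoint might look like this (a sketch assuming the `requests` package; the response fields follow the schema above):

```python
import requests

API = "http://localhost:8000"

def predict_image(path: str) -> dict:
    """POST an image file to /predict/image and return the JSON verdict."""
    with open(path, "rb") as f:
        resp = requests.post(f"{API}/predict/image", files={"file": f})
    resp.raise_for_status()
    return resp.json()  # {"label": ..., "confidence": ..., "prob_real": ...}

if __name__ == "__main__":
    print(predict_image("face.jpg"))
```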
Example request (image):
```bash
curl -X POST http://localhost:8000/predict/image \
  -F "file=@face.jpg"
```

Example response:

```json
{
  "label": "Fake",
  "confidence": 0.9312,
  "prob_real": 0.0688
}
```

```bash
# Webcam
python -m inference.realtime_inference

# Video file
python -m inference.realtime_inference --video path/to/video.mp4

# With Grad-CAM overlay
python -m inference.realtime_inference --video path/to/video.mp4 --gradcam
```

Press Q to quit. Inference runs on every 3rd frame for smooth display.
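The every-3rd-frame pattern can be sketched as follows (a simplified stand-in for the realtime script's loop, with `predict` standing in for model inference):

```python
def run_loop(frames, predict, skip: int = 3):
    """Run `predict` on every `skip`-th frame and reuse the last result
    for the frames in between, so the display stays smooth."""
    last = None
    results = []
    for i, frame in enumerate(frames):
        if i % skip == 0:
            last = predict(frame)
        results.append(last)  # every displayed frame still gets a label
    return results
```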
```bash
# ── Environment ─────────────────────────────────────────
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# ── Train ───────────────────────────────────────────────
python -m training.train

# ── Evaluate ────────────────────────────────────────────
python -m training.evaluate

# ── Streamlit UI ────────────────────────────────────────
streamlit run app.py

# ── FastAPI backend ─────────────────────────────────────
uvicorn api.main:app --reload --port 8000

# ── CLI webcam / video ──────────────────────────────────
python -m inference.realtime_inference [--video <path>] [--gradcam]
```

```
EfficientNet-B0 (pretrained on ImageNet)
├── features[0..8] (MBConv blocks)
└── classifier
    ├── Dropout(0.4)
    └── Linear(1280 → 1)   # default mode
```
Optional (`use_frequency=True`):

```
EfficientNet features (1280-dim)
  + FrequencyBranch (128-dim)
  → Linear(1408 → 256) → ReLU → Dropout(0.4) → Linear(256 → 1)
```
Detects spectral artefacts characteristic of GAN-generated images:
- `torch.fft.fft2` — 2D Fast Fourier Transform
- `fftshift` — centres low-frequency content for spatially consistent conv filters
- `log1p` — compresses the extreme dynamic range of FFT magnitudes
- Two Conv2D + BatchNorm + MaxPool blocks
- Fully connected → 128-dim feature vector
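The steps above can be sketched roughly like this (an illustration of the listed operations, not the exact `frequency_branch.py` code; layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class FrequencyBranchSketch(nn.Module):
    """Rough sketch of an FFT-based spectral branch (layer sizes are assumptions)."""

    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16),
            nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32),
            nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 56 * 56, out_dim)  # assumes 224×224 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gray = x.mean(dim=1, keepdim=True)                             # collapse RGB
        spec = torch.fft.fftshift(torch.fft.fft2(gray), dim=(-2, -1))  # centred 2D FFT
        mag = torch.log1p(spec.abs())                                  # compress magnitudes
        return self.fc(self.conv(mag).flatten(1))                      # -> (B, out_dim)
```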
| Phase | Epochs | LR | Backbone |
|---|---|---|---|
| Phase 1 (warm-up) | 1–5 | 1e-4 | Fully frozen |
| Phase 2 (fine-tune) | 6+ | 1e-5 | Last 2 blocks unfrozen |
| Metric | Description |
|---|---|
| Accuracy | Overall correct classifications |
| Precision | Of predicted fakes, how many were actually fake |
| Recall | Of actual fakes, how many were caught |
| F1 Score | Harmonic mean of precision and recall |
| Confusion Matrix | True/False Positive/Negative breakdown |
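These metrics follow directly from the confusion-matrix counts (a plain-Python sketch with "Fake" as the positive class, as in the table):

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive the table's metrics from raw confusion-matrix counts."""
    precision = tp / (tp + fp)   # of predicted fakes, fraction actually fake
    recall = tp / (tp + fn)      # of actual fakes, fraction caught
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }
```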
- Social media content verification
- News authenticity validation
- Digital identity protection
- Cybercrime and fraud detection
- Media forensics and journalism
- Temporal modeling with 3D CNN or Vision Transformer across video frames
- Audio-visual consistency check (voice + face sync)
- Browser extension for in-page detection
- Mobile deployment (CoreML / TFLite)
- Confidence calibration and uncertainty estimation
This project is released under the MIT License.