VidCensor is a full-stack, automated video-sanitization platform that detects and censors (mutes or beeps) profanity in the audio track of video files without re-encoding the video stream. This preserves video quality and is significantly faster than a full re-render.
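As a rough sketch of the underlying idea (illustrative only — not the project's actual code), profanity intervals can be silenced with an FFmpeg audio filter while the video stream is stream-copied, so only audio is touched:

```python
def build_censor_cmd(src, dst, intervals):
    """Build an ffmpeg command that mutes each (start, end) interval in the
    audio track while copying the video stream bit-for-bit (no re-encode)."""
    # One volume=0 filter per interval, active only inside that interval.
    mute_chain = ",".join(
        f"volume=enable='between(t,{start:.2f},{end:.2f})':volume=0"
        for start, end in intervals
    )
    return [
        "ffmpeg", "-y", "-i", src,
        "-af", mute_chain,  # filter only the audio stream
        "-c:v", "copy",     # video passes through untouched -> fast, lossless
        dst,
    ]

cmd = build_censor_cmd("input.mp4", "output.mp4", [(3.2, 3.8), (10.0, 10.5)])
# e.g. run with: subprocess.run(cmd, check=True)
```

Because `-c:v copy` skips the video encoder entirely, processing time is dominated by the audio work, which is why the no-re-encode approach is so much faster than re-rendering.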
- Full-Stack Web App: Drag-and-drop UI for uploading videos and tracking status.
- GPU Acceleration: Uses NVIDIA CUDA to transcribe audio at lightning speeds.
- High-Concurrency Backend: FastAPI Async architecture handles multiple concurrent uploads without blocking.
- No Re-Encoding: Preserves original video quality by only manipulating the audio stream.
- Background Processing: Asynchronous Task Queue (Celery) handles heavy ML workloads.
- Cloud-Native Storage: Built-in MinIO (S3 compatible) for file management, ready for AWS deployment.
- Auto-Cleanup: Automated retention policies delete old files to save storage space.
- Security Guardrails:
  - Frontend (UX): "Fail-Fast" validation instantly rejects files > 300MB or > 5 minutes long.
  - Backend (Security): Strict content-length checks and FFmpeg probing enforce hard limits.
- Full Observability Stack:
  - Application: Distributed Tracing (Jaeger) and Request Metrics (Prometheus).
  - Infrastructure: cAdvisor integration monitors raw container CPU/RAM usage to detect bottlenecks.
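The backend's FFmpeg-probe guardrail could be sketched as follows — a minimal illustration, assuming ffprobe's JSON output format; the function names are hypothetical, not the project's API:

```python
import json
import subprocess

MAX_DURATION_SEC = 300  # mirrors the documented 5-minute limit

def duration_from_probe(probe_json: str) -> float:
    """Extract the container duration (seconds) from ffprobe's JSON output."""
    return float(json.loads(probe_json)["format"]["duration"])

def probe_duration(path: str) -> float:
    """Run ffprobe and return the video's duration in seconds."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return duration_from_probe(out)

def within_limit(duration_sec: float, limit: float = MAX_DURATION_SEC) -> bool:
    """Server-side hard limit, enforced even if the frontend check is bypassed."""
    return duration_sec <= limit
```

Running the same check on both sides means the frontend gives instant feedback, while the backend remains the authoritative gate.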
*Demo: vidcensor_demo.mp4*
The application runs as a set of Docker containers orchestrated by docker-compose:
| Service | Technology | Description |
|---|---|---|
| Frontend | Vue.js + Vite | Modern web UI for uploads and monitoring. |
| Backend | FastAPI | REST API managing tasks, uploads, and downloads. |
| Worker | Celery + PyTorch | GPU-enabled worker running AI tasks & FFmpeg. |
| Beat | Celery Beat | Scheduler for periodic tasks (e.g., file cleanup). |
| Broker | Redis | Message broker for task queues. |
| Database | PostgreSQL | Persists task metadata and status. |
| Storage | MinIO | S3-compatible object storage for video files. |
| Telemetry | OTel, Jaeger, Prometheus, Grafana | Comprehensive monitoring: Logs, Traces, and Metrics. |
| Infra Monitor | cAdvisor | Real-time container resource usage (CPU/RAM/Net). |
To run the GPU-accelerated stack, your host machine needs the following:
- NVIDIA GPU: Compute Capability 3.5+ (Recommended: 6GB+ VRAM).
- RAM: 16GB+ recommended.
- Docker Engine & Docker Compose (v2)
- NVIDIA Drivers: Install the latest drivers for your GPU.
- NVIDIA Container Toolkit: This is required for Docker to access your GPU.
```
VidCensor/
├── backend/                       # FastAPI & Celery Python code
│   ├── app/
│   │   ├── api/                   # API route endpoints
│   │   ├── worker/                # Celery tasks (ML processing logic)
│   │   ├── core/                  # Configuration (config.py)
│   │   └── s3.py                  # S3 client logic
│   ├── vidcensor/                 # Core censoring library (FFmpeg/Pydub logic)
│   ├── tests/                     # Comprehensive test suite (96% coverage)
│   │   ├── integration/           # Integration tests (API/DB/Worker with Testcontainers)
│   │   └── vidcensor/             # Unit tests (pure logic)
│   ├── pyproject.toml             # Poetry dependency management
│   └── poetry.lock                # Locked dependencies
├── frontend/                      # Vue.js application (source code)
├── telemetry/                     # Infrastructure as Code (observability)
│   ├── dashboards/                # JSON definitions for Grafana
│   ├── dashboards.yaml            # Dashboard provisioning
│   ├── datasources.yaml           # Prometheus connection config
│   ├── otel-collector-config.yaml # OpenTelemetry pipeline config
│   └── prometheus.yaml            # Metrics scraping config
├── docker-compose.yml             # Orchestration for all services
├── Dockerfile.backend             # Lean API image
├── Dockerfile.worker              # Heavy GPU image
├── requirements-backend.txt
├── requirements-worker.txt
└── minio-init.sh                  # Script to provision local S3 buckets
```
The user interface is a responsive Single Page Application (SPA) built to interact seamlessly with the VidCensor API. It allows users to upload videos, track the GPU-accelerated processing pipeline in real-time, and preview/download the sanitized results.
- Framework: Vue.js 3 (Composition API)
- Build Tool: Vite (Fast hot-module replacement)
- Styling: Tailwind CSS v4.0 (Utility-first CSS)
- HTTP Client: Axios (Centralized API service)
- Deployment: Nginx (Alpine Linux)
- Drag-and-Drop Upload: Intuitive file selection zone supporting MP4 files.
- Real-Time Polling: The UI polls the backend every 2 seconds to fetch granular task updates (e.g., Extracting Audio → AI Transcribing → Remuxing).
- Granular Status Mapping: Automatically maps backend Enum statuses (e.g., `EXTRACTING_AUDIO`) to user-friendly messages using Vue computed properties.
- In-Browser Preview: Integrated HTML5 video player allows users to verify the censorship (beeps/silence) before downloading the final file.
- Error Handling: Robust error management for file size limits, connection timeouts, and backend failures.
- Synchronous Size Check: Rejects files exceeding the configured limit (Default: 300MB) instantly upon selection.
- Asynchronous Duration Check: Uses a hidden HTML5 video element to probe metadata and reject videos longer than the limit (Default: 5 mins) without uploading or playing the file.
- Cancellable Uploads: Uses `AbortController` signals to allow users to terminate large uploads mid-stream via a prominent "Cancel" button.
The frontend is refactored into a modular architecture to separate logic (App.vue), presentation (components/), and data fetching (services/).
```
frontend/src/
├── components/
│   ├── UploadZone.vue        # Handles file drag-and-drop & validation
│   ├── ProcessingStatus.vue  # Visualizes the pipeline steps & progress bar
│   └── DownloadReady.vue     # Success state with video player & download button
├── services/
│   └── api.js                # Centralized Axios client for API communication
└── App.vue                   # Main controller (manages state: Idle -> Processing -> Completed)
```
The backend is a high-performance, asynchronous microservice designed to handle heavy ML workloads without blocking the user interface. It decouples the API layer (FastAPI) from the heavy lifting (Celery/GPU), ensuring the server remains responsive even while transcoding large video files.
- API Framework: FastAPI (fully async implementation with `aiofiles` & `asyncpg`)
- Task Queue: Celery (distributed task execution)
- Message Broker: Redis (In-memory data structure store)
- Database: PostgreSQL (Persistent task metadata storage)
- Object Storage: MinIO (S3-compatible local cloud storage simulation)
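The granular pipeline statuses that the worker persists and the UI maps to friendly messages might look like the following sketch. Only `EXTRACTING_AUDIO` appears verbatim in this README; the other enum members and messages are plausible placeholders, not the project's actual definitions:

```python
from enum import Enum

class TaskStatus(str, Enum):
    # Only EXTRACTING_AUDIO is confirmed by the docs; the rest are
    # hypothetical names for the pipeline stages described above.
    PENDING = "PENDING"
    EXTRACTING_AUDIO = "EXTRACTING_AUDIO"
    TRANSCRIBING = "TRANSCRIBING"
    REMUXING = "REMUXING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

FRIENDLY = {
    TaskStatus.PENDING: "Queued...",
    TaskStatus.EXTRACTING_AUDIO: "Extracting Audio",
    TaskStatus.TRANSCRIBING: "AI Transcribing",
    TaskStatus.REMUXING: "Remuxing",
    TaskStatus.COMPLETED: "Done!",
    TaskStatus.FAILED: "Something went wrong",
}

def friendly_status(raw: str) -> str:
    """Map a raw backend enum value to a user-friendly message."""
    try:
        return FRIENDLY[TaskStatus(raw)]
    except ValueError:
        return raw  # unknown status: show it as-is
```

Storing the enum value in Postgres and translating it at the edge keeps the API contract stable while the UI copy stays free to change.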
- Asynchronous Processing: Long-running tasks (transcription, remuxing) are offloaded to background workers, preventing request timeouts.
- GPU Acceleration: Dedicated workers utilize NVIDIA CUDA cores for lightning-fast processing.
- Cloud-Native Storage: "Dual-Host" S3 architecture allows the backend to communicate internally (Docker-to-Docker) while serving Presigned URLs externally (Browser-to-Docker).
- Audio Fidelity Protection: Uses an intermediate lossless WAV format to prevent generational loss and dynamically matches the source audio bitrate during final export.
- Auto-Maintenance: Integrated Celery Beat scheduler automatically enforces retention policies, cleaning up stale raw/processed files to optimize storage costs.
- Structured Logging: All services output machine-readable JSON logs via `structlog`, correlated with Trace IDs.
- Distributed Tracing: A single Trace ID follows a request from the API (`POST /upload`) → Redis Queue → GPU Worker (`process_video_task`), visible as a unified waterfall in Jaeger.
- Health Monitoring: A dedicated `/health` endpoint runs background checks on DB, Redis, and S3 connectivity, feeding real-time status to the dashboard.
- Infrastructure Monitoring: cAdvisor runs as a sidecar to scrape raw Docker statistics, allowing Grafana to correlate Application Load (requests/sec) with Infrastructure Stress (container CPU spikes).
- Infrastructure as Code: Grafana Dashboards and Data Sources are auto-provisioned via YAML.
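The retention policy behind Auto-Maintenance reduces to a small piece of pure selection logic. The sketch below is an assumption about how such a Celery Beat task could pick stale objects; the helper name and `(key, last_modified)` shape are illustrative, not the project's code:

```python
from datetime import datetime, timedelta, timezone

RETENTION_MINUTES = 60  # the documented default

def stale_keys(objects, retention_minutes=RETENTION_MINUTES, now=None):
    """Select object keys whose last-modified time falls outside the
    retention window. `objects` is an iterable of (key, last_modified)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=retention_minutes)
    return [key for key, modified in objects if modified < cutoff]
```

A scheduled task could feed this a bucket listing and issue the corresponding S3 deletes, keeping the cleanup logic trivially unit-testable.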
The backend follows a 12-Factor App design, separating configuration, API routes, and worker logic into distinct modules.
```
backend/
├── alembic/               # Database migration scripts (schema versioning)
├── app/
│   ├── api/
│   │   ├── health/        # System health checks (DB, Redis, S3) & background polling
│   │   ├── v1/
│   │   │   ├── tasks/     # Task status polling & download link generation
│   │   │   └── upload/    # Video upload handling & validation
│   │   └── deps.py        # Async database dependency injection
│   ├── core/
│   │   ├── config.py      # Pydantic settings (env vars: DB URL, S3 credentials)
│   │   ├── logging.py     # Structlog configuration for JSON logging
│   │   └── telemetry.py   # OpenTelemetry setup (tracing & metrics)
│   ├── db/
│   │   ├── models.py      # SQLAlchemy ORM models (Task table definition)
│   │   └── session.py     # Dual-engine setup (async for API, sync for Worker)
│   ├── schemas/           # Pydantic models for request/response validation
│   ├── worker/
│   │   ├── celery_app.py  # Celery app & Beat scheduler configuration
│   │   ├── tasks.py       # Main GPU processing pipeline
│   │   └── cleanup.py     # Periodic maintenance tasks (S3 retention)
│   ├── crud.py            # Synchronous DB operations (for Worker)
│   ├── crud_async.py      # Asynchronous DB operations (for API)
│   ├── main.py            # FastAPI entrypoint & middleware configuration
│   └── s3.py              # MinIO/S3 client wrapper (dual-host support)
├── tests/                 # Hybrid test suite (Integration + Unit)
│   ├── integration/       # Testcontainers-based tests (real DB/MinIO)
│   ├── vidcensor/         # Unit tests for core business logic
│   └── unit/              # Unit tests for other API logic
├── vidcensor/             # Core logic library
├── poetry.lock            # Exact dependency versions (reproducibility)
├── pyproject.toml         # Project metadata & dependency definitions
└── profanity_list.txt     # Censorship dictionary
```
- Clone the repository:

  ```bash
  git clone https://github.com/rdrishabh38/[REDACTED].git
  cd REDACTED
  ```

- Setup Environment Variables: modify the properties in `backend/app/core/config.py` as required, or leave the defaults as-is.

- Launch the Stack:

  ```bash
  docker compose up -d --build
  ```

- Access the Application:
  - Frontend UI: http://localhost:5173
  - API Documentation: http://localhost:8000/docs
  - Grafana Dashboards: http://localhost:3001 (User/Pass: `admin` / `admin`)
  - Jaeger Traces: http://localhost:16686
  - MinIO Console: http://localhost:9001 (User/Pass: `minioadmin`)
The project includes two Docker configurations for the frontend:

**Development Mode (default)**
- Port: `3000` (mapped to internal `5173`)
- Description: Runs the Vite dev server; changes to `.vue` files are reflected instantly.
- Command: The standard `docker compose up` uses this mode by default.

**Production Mode**
- Port: `8080`
- Description: Uses a multi-stage Docker build. Node.js compiles the assets, and a lightweight Nginx container serves them with Gzip compression and caching.
- To Run:

  ```bash
  docker build -t vidcensor-frontend-prod -f frontend/Dockerfile.prod ./frontend
  docker run -p 8080:80 vidcensor-frontend-prod
  ```
VidCensor follows the 12-Factor App methodology. Configuration is managed via environment variables, with defaults defined in `backend/app/core/config.py`.
| Variable | Default | Description |
|---|---|---|
| `CENSOR_MODE` | `silence_and_beep` | How to censor: `beep`, `silence`, or `silence_and_beep`. |
| `RETENTION_MINUTES` | `60` | Time before raw/processed videos are auto-deleted. |
| `S3_ENDPOINT_URL` | `http://minio:9000` | Leave set for local dev; unset for AWS S3. |
| `FFMPEG_PATH` | `ffmpeg` | Usually just `ffmpeg`; provide the full path if it is not on your system `PATH`. |
- Adjust other settings like padding, bitrate, etc., as needed.
- Edit `profanity_list.txt`: add the words you want to censor (one word per line, case-insensitive).

Additional parameters are defined in `backend/app/core/config.py`; refer to it for further tweaking.
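The project's settings live in a Pydantic settings class (per the structure above); as a stdlib-only sketch of the same env-override pattern — names taken from the table, loader function hypothetical:

```python
import os

DEFAULTS = {
    "CENSOR_MODE": "silence_and_beep",
    "RETENTION_MINUTES": "60",
    "S3_ENDPOINT_URL": "http://minio:9000",
    "FFMPEG_PATH": "ffmpeg",
}

def load_settings(env=None):
    """Resolve each setting from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    cfg["RETENTION_MINUTES"] = int(cfg["RETENTION_MINUTES"])  # coerce numeric
    return cfg
```

Keeping every knob behind an environment variable is what lets the same image run unchanged in Docker Compose locally and on AWS.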
Create a .env file in the frontend/ directory to control UI-level constraints. These are "baked in" at build time.
| Variable | Default | Description |
|---|---|---|
| `VITE_MAX_FILE_SIZE_MB` | `300` | Maximum file size (in megabytes) allowed for upload. |
| `VITE_MAX_DURATION_SEC` | `300` | Maximum video duration (in seconds) allowed. |
- View Logs:

  ```bash
  docker compose logs -f backend worker cadvisor
  ```

- Check System Health:

  ```bash
  curl http://localhost:8000/health
  # Returns: {"status": "healthy", "components": {"postgres": 1, "redis": 1, "minio": 1}}
  ```

- Rebuild a specific service (e.g., after changing requirements):

  ```bash
  docker compose up -d --build backend
  ```

- Clean Reset (Ephemeral): The DB and storage are ephemeral by default. To wipe all data and start fresh:

  ```bash
  docker compose down
  docker compose up -d
  ```
This project maintains a high standard of quality with a 95% Code Coverage enforcement policy.
We use a hybrid testing strategy:
- Unit Tests (`tests/vidcensor/`): Verify pure business logic (FFmpeg commands, censoring math) in isolation.
- Integration Tests (`tests/integration/`): Use Testcontainers to spin up real, ephemeral Docker containers for Postgres and MinIO. This ensures tests run against actual infrastructure, not just mocks.
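The "censoring math" that the unit tests verify can be illustrated with a hypothetical helper (the name, signature, and padding value are illustrative, not the project's actual API) that pads and merges profanity intervals:

```python
def merge_intervals(intervals, padding=0.5):
    """Pad each (start, end) profanity hit and merge overlapping spans so the
    mute/beep filter receives a minimal set of disjoint intervals."""
    padded = sorted((max(0.0, s - padding), e + padding) for s, e in intervals)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

A unit test can then pin down edge cases (clamping at t=0, touching spans) without any FFmpeg or Docker involvement, which is exactly what keeps the unit tier fast.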
- Ensure Docker is running (required for Testcontainers).

- Run the full suite:

  ```bash
  cd backend
  poetry run pytest tests/
  ```

  This command auto-provisions the necessary DB/S3 containers, runs all tests, and generates a coverage report. It will fail if coverage drops below 95%.
Every Pull Request is automatically vetted by GitHub Actions:
- System Dependencies: Installs `ffmpeg` and `libmagic` on the runner.
- Quality Gates: Fails the pipeline if tests fail or if code coverage drops below 95%.
This project adheres to strict coding standards (Strict Typing, JSON Logging, Security Checks) to ensure robustness.
We use Poetry for dependency management.
- Install Poetry:

  ```bash
  curl -sSL https://install.python-poetry.org | python3 -
  ```

- Install Dependencies: navigate to the backend directory and install the environment.

  ```bash
  cd backend
  poetry install --with dev,worker
  ```

- Activate Shell:

  ```bash
  poetry shell
  ```
This project uses pre-commit to enforce standards before code is committed. This pipeline includes:
- Mypy: Strict static type checking.
- Ruff: Fast linting.
- Black/Isort: Code formatting.
- Bandit: Security vulnerability scanning.
Setup (run this once after cloning):

```bash
pre-commit install
```

Manual Run (to check all files without committing):

```bash
pre-commit run --all-files
```
We maintain a strict separation between Development (Poetry) and Production (Docker) dependencies to keep images optimized.
- `pyproject.toml`: The single source of truth.
- `requirements-backend.txt`: Auto-generated lean dependencies for the API.
- `requirements-worker.txt`: Auto-generated heavy AI dependencies for the GPU Worker.
Workflow to add a new library:
- Add the package via Poetry:

  ```bash
  cd backend && poetry add <package_name>
  ```

- Regenerate the Docker requirements files (run from the root directory):

  ```bash
  make requirements
  ```

  This runs `poetry export` to strictly pin versions and sync all files.
We use Semantic Versioning and automate our CHANGELOG.md using Git-Cliff.
To cut a new release:
- Commit all changes.
- Push a new tag:

  ```bash
  git tag v2.1.0
  git push origin v2.1.0
  ```

- Automation: A GitHub Action triggers, generates the changelog for the new commits, prepends it to `CHANGELOG.md`, and pushes the update back to the `main` branch.
Copyright (c) 2025 Rishabh Dixit. All Rights Reserved.
VidCensor is proprietary software.
The source code for this project is hosted in a private repository to protect intellectual property.
- Recruiters & Hiring Managers: If you are reviewing my application and wish to examine the source code, please contact me directly. I can provide temporary read-access to the private repository.
- Licensing: The software is not currently available for public use or distribution.
Please contact rdrishabh38@gmail.com if you would like to discuss your use case.