AI-powered video editor that removes redundant speech segments using VAD and LLM

min-hsao/AIvideoeditor


AI Video Editor

An AI-powered video processing pipeline that automatically removes redundant speech segments, such as repeated takes, from videos using voice activity detection (VAD), Whisper transcription, and LLM-based filtering.

Features

  • 🎤 Voice Activity Detection (WebRTC VAD)
  • 🤖 AI-powered redundancy filtering (OpenAI/DeepSeek)
  • 🎙️ Whisper transcription
  • ✂️ Automatic video editing
  • 📁 Batch processing support
  • ⚙️ Configurable via JSON
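The redundancy-filtering step is the least obvious part of the pipeline, so here is a minimal sketch of how it could work. It assumes the model receives numbered segment transcripts and replies with a JSON object listing the indices to keep. `build_filter_prompt`, `parse_keep_indices`, the prompt wording, and the "keep the last take" policy are all hypothetical and are not taken from VAI-editor.py.

```python
import json
import re
from typing import List


def build_filter_prompt(transcripts: List[str]) -> str:
    """Number each segment so the model can refer to it by index (hypothetical prompt)."""
    lines = [f"{i}: {text.strip()}" for i, text in enumerate(transcripts)]
    return (
        "The following are transcribed speech segments from one video, in order.\n"
        # Assumed policy: when a line is repeated, the last take is the good one.
        "Some are repeated takes of the same line; keep only the last take of each.\n"
        'Reply with JSON only, for example {"keep": [0, 2, 3]}.\n\n' + "\n".join(lines)
    )


def parse_keep_indices(reply: str, n_segments: int) -> List[int]:
    """Extract kept segment indices from the model reply, dropping out-of-range values."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        # Keep everything rather than deleting footage on an unparseable reply.
        return list(range(n_segments))
    try:
        data = json.loads(match.group(0))
        keep = {int(i) for i in data.get("keep", [])}
    except (ValueError, TypeError, AttributeError):
        return list(range(n_segments))
    return sorted(i for i in keep if 0 <= i < n_segments)
```

The prompt would go to whichever model the configuration selects (for example `gpt-4o` at temperature 0.0). Falling back to "keep everything" is a deliberately conservative choice: a malformed reply then costs an unedited video rather than lost footage.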

Requirements

  • Python 3.8+
  • FFmpeg
  • API keys for OpenAI or DeepSeek
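A quick pre-flight check for the two local requirements can prevent a confusing failure partway through a batch. The following is a hypothetical helper, not part of VAI-editor.py. It checks only the Python version and whether an `ffmpeg` executable is on `PATH`. It does not validate API keys.

```python
import shutil
import sys
from typing import List


def check_requirements() -> List[str]:
    """Return human-readable problems; an empty list means the local setup looks usable."""
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # shutil.which returns None when no executable named ffmpeg is on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("missing:", problem)
```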

Installation

pip install -r requirements.txt

# Install FFmpeg (macOS)
brew install ffmpeg

Setup

Create an api_keys.txt file:

OPENAI_API_KEY=your_openai_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key
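The file uses simple `NAME=value` lines. Here is a minimal sketch of reading it into a dictionary. It assumes that blank lines and `#` comments are ignored and that surrounding quotes are stripped. Those parsing rules, and the names `parse_api_keys` and `load_api_keys`, are hypothetical. The actual loader in VAI-editor.py may behave differently.

```python
from pathlib import Path
from typing import Dict


def parse_api_keys(text: str) -> Dict[str, str]:
    """Parse NAME=value lines (hypothetical rules: skip blanks/comments, strip quotes)."""
    keys: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, value = line.split("=", 1)
        keys[name.strip()] = value.strip().strip('"').strip("'")
    return keys


def load_api_keys(path: str = "api_keys.txt") -> Dict[str, str]:
    """Read and parse the key file from disk."""
    return parse_api_keys(Path(path).read_text(encoding="utf-8"))
```

Keeping keys in a separate file (rather than in config.json) makes it easy to exclude them from version control with a `.gitignore` entry.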

Usage

# Basic usage (processes all videos in 'raw' folder)
python VAI-editor.py

# Specify provider
python VAI-editor.py --provider deepseek

# Custom input/output directories
python VAI-editor.py -i my_videos -o processed

# Dry run (preview without processing)
python VAI-editor.py --dry-run

# Verbose output
python VAI-editor.py --verbose

# Use config file
python VAI-editor.py -c config.json
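Batch processing and `--dry-run` boil down to "find the videos, decide where each output goes, then either report or process". Here is a minimal sketch under stated assumptions. The extension list, the same-filename output naming, and the functions `find_videos`, `plan_outputs`, and `run_batch` are hypothetical. The `process` callback stands in for the real per-video pipeline.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical: which extensions count as video is not documented by the tool.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi"}


def find_videos(input_dir: str) -> List[Path]:
    """Video files directly inside input_dir, sorted for a stable processing order."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def plan_outputs(videos: List[Path], output_dir: str) -> List[Tuple[Path, Path]]:
    """Pair each input with an output path (assumed: same file name in output_dir)."""
    return [(v, Path(output_dir) / v.name) for v in videos]


def run_batch(input_dir: str, output_dir: str, dry_run: bool,
              process: Callable[[Path, Path], None]) -> None:
    """Report the plan on a dry run; otherwise hand each pair to the pipeline."""
    for src, dst in plan_outputs(find_videos(input_dir), output_dir):
        if dry_run:
            print(f"[dry-run] would process {src} -> {dst}")
        else:
            process(src, dst)
```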

Arguments

| Argument | Description |
|---|---|
| `--provider` | AI provider: `openai` or `deepseek` |
| `-i`, `--input-dir` | Input video directory |
| `-o`, `--output-dir` | Output directory |
| `--dry-run` | Preview without processing |
| `--verbose` | Detailed output |
| `-c`, `--config` | JSON config file |
| `--keep-temp` | Keep temporary audio files |
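The flags in the table map directly onto Python's standard `argparse` module. The parser below is a sketch that reproduces only the documented flags. It is not the tool's own parser. Defaulting the value options to `None` is an assumption. It makes it possible to tell "flag not given" apart from an explicit value, so that config-file values are not silently overridden.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser for the documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Remove redundant speech segments from videos."
    )
    parser.add_argument("--provider", choices=["openai", "deepseek"], default=None,
                        help="AI provider")
    parser.add_argument("-i", "--input-dir", default=None, help="Input video directory")
    parser.add_argument("-o", "--output-dir", default=None, help="Output directory")
    parser.add_argument("--dry-run", action="store_true", help="Preview without processing")
    parser.add_argument("--verbose", action="store_true", help="Detailed output")
    parser.add_argument("-c", "--config", default=None, help="JSON config file")
    parser.add_argument("--keep-temp", action="store_true",
                        help="Keep temporary audio files")
    return parser
```

`argparse` turns `--input-dir` into the attribute `input_dir`, so `build_parser().parse_args(["-i", "my_videos", "--dry-run"])` yields `input_dir == "my_videos"` and `dry_run == True`.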

How It Works

  1. Audio Extraction: Extract audio from video
  2. VAD Processing: Detect speech segments
  3. Transcription: Transcribe segments with Whisper
  4. LLM Filtering: AI identifies redundant takes
  5. Video Rendering: Concatenate kept segments
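Steps 1, 2, and 5 can be sketched with FFmpeg and a WebRTC-style VAD, which classifies short fixed-size frames of 16-bit mono PCM. The sample rate must be 8, 16, 32, or 48 kHz, and each frame must be 10, 20, or 30 ms long. This sketch is not the tool's own implementation. The helper names, the 16 kHz / 30 ms choice, and the `min_gap_frames` gap-bridging rule are all assumptions. Steps 3 and 4 (Whisper and the LLM call) are omitted.

```python
from typing import Callable, List, Optional, Tuple

SAMPLE_RATE = 16000  # WebRTC VAD accepts 8000, 16000, 32000 or 48000 Hz
FRAME_MS = 30        # and frames of 10, 20 or 30 ms
BYTES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono: 2 bytes/sample


def extract_audio_cmd(video: str, wav: str) -> List[str]:
    """Step 1: FFmpeg command producing mono 16 kHz 16-bit PCM for the VAD."""
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1",
            "-ar", str(SAMPLE_RATE), "-acodec", "pcm_s16le", wav]


def speech_segments(pcm: bytes, is_speech: Callable[[bytes, int], bool],
                    min_gap_frames: int = 10) -> List[Tuple[float, float]]:
    """Step 2: classify frames and merge speech runs into (start, end) seconds.

    Silences of at most min_gap_frames frames (hypothetical parameter) are bridged,
    so short pauses inside one sentence do not split it into two segments.
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None
    silence = 0
    n_frames = len(pcm) // BYTES_PER_FRAME
    for i in range(n_frames):
        frame = pcm[i * BYTES_PER_FRAME:(i + 1) * BYTES_PER_FRAME]
        if is_speech(frame, SAMPLE_RATE):
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence > min_gap_frames:
                end = i - silence + 1  # one past the last speech frame
                segments.append((start * FRAME_MS / 1000, end * FRAME_MS / 1000))
                start, silence = None, 0
    if start is not None:  # close a segment still open at end of audio
        end = n_frames - silence
        segments.append((start * FRAME_MS / 1000, end * FRAME_MS / 1000))
    return segments


def concat_filter(segments: List[Tuple[float, float]]) -> str:
    """Step 5: filter_complex that trims each kept segment and concatenates them."""
    if not segments:
        raise ValueError("at least one segment is required")
    parts, labels = [], []
    for k, (s, e) in enumerate(segments):
        parts.append(f"[0:v]trim=start={s:.3f}:end={e:.3f},setpts=PTS-STARTPTS[v{k}]")
        parts.append(f"[0:a]atrim=start={s:.3f}:end={e:.3f},asetpts=PTS-STARTPTS[a{k}]")
        labels.append(f"[v{k}][a{k}]")
    parts.append(f"{''.join(labels)}concat=n={len(segments)}:v=1:a=1[outv][outa]")
    return ";".join(parts)
```

With the `webrtcvad` package, the classifier could be `webrtcvad.Vad(3).is_speech`, using the `aggressiveness` value from the config. The PCM bytes could be read with the standard `wave` module. The filter string would then be passed as `ffmpeg -i input.mp4 -filter_complex "<filter>" -map "[outv]" -map "[outa]" output.mp4`. Resetting timestamps with `setpts`/`asetpts` after each trim keeps the concatenated audio and video in sync.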

Configuration

Create a config.json file and pass it with -c:

{
  "api": {
    "provider": "openai",
    "whisper_model": "whisper-1"
  },
  "io": {
    "input_dir": "raw",
    "output_dir": "edited"
  },
  "vad": {
    "aggressiveness": 3
  },
  "llm": {
    "model_name": "gpt-4o",
    "temperature": 0.0
  }
}
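A config file like this is typically layered over built-in defaults, with command-line flags applied last. This sketch assumes that precedence (defaults < config.json < CLI). It also assumes that a partial config.json only overrides the keys it contains. `DEFAULTS`, `deep_merge`, and `load_config` are hypothetical names, with defaults copied from the example above. In WebRTC VAD, `aggressiveness` ranges from 0 to 3, and 3 is the most aggressive at filtering out non-speech.

```python
import json
from copy import deepcopy
from typing import Any, Dict, Optional

# Hypothetical defaults mirroring the example config.json.
DEFAULTS: Dict[str, Any] = {
    "api": {"provider": "openai", "whisper_model": "whisper-1"},
    "io": {"input_dir": "raw", "output_dir": "edited"},
    "vad": {"aggressiveness": 3},
    "llm": {"model_name": "gpt-4o", "temperature": 0.0},
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Optional[str] = None,
                cli_overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assumed precedence: defaults, then the JSON file, then CLI overrides."""
    config = deepcopy(DEFAULTS)
    if path:
        with open(path, encoding="utf-8") as f:
            config = deep_merge(config, json.load(f))
    if cli_overrides:
        config = deep_merge(config, cli_overrides)
    return config
```

For example, `--provider deepseek` would become `{"api": {"provider": "deepseek"}}` and override only that one key.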

License

MIT License - see LICENSE for details.
