An AI-powered video processing pipeline that automatically removes redundant speech segments (like repeated takes) from videos using voice activity detection and LLM filtering.
- 🎤 Voice Activity Detection (WebRTC VAD)
- 🤖 AI-powered redundancy filtering (OpenAI/DeepSeek)
- 🎙️ Whisper transcription
- ✂️ Automatic video editing
- 📁 Batch processing support
- ⚙️ Configurable via JSON
- Python 3.8+
- FFmpeg
- An API key for OpenAI or DeepSeek
```bash
pip install -r requirements.txt

# Install FFmpeg (macOS)
brew install ffmpeg
```

Create an `api_keys.txt` file:
```
OPENAI_API_KEY=your_openai_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key
```
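How `VAI-editor.py` reads this file isn't specified here; below is a minimal sketch, assuming plain `KEY=VALUE` lines that end up as environment variables. The helper name `load_api_keys` is hypothetical, not the script's actual API:

```python
import os
from pathlib import Path


def load_api_keys(path: str = "api_keys.txt") -> None:
    # Sketch only: parse KEY=VALUE lines into os.environ, skipping blank
    # lines and comments. The real script may load this file differently.
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())


load_api_keys()
```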
```bash
# Basic usage (processes all videos in the 'raw' folder)
python VAI-editor.py

# Specify provider
python VAI-editor.py --provider deepseek

# Custom input/output directories
python VAI-editor.py -i my_videos -o processed

# Dry run (preview without processing)
python VAI-editor.py --dry-run

# Verbose output
python VAI-editor.py --verbose

# Use config file
python VAI-editor.py -c config.json
```

| Argument | Description |
|---|---|
| `--provider` | AI provider: `openai` or `deepseek` |
| `-i, --input-dir` | Input video directory |
| `-o, --output-dir` | Output directory |
| `--dry-run` | Preview without processing |
| `--verbose` | Detailed output |
| `-c, --config` | JSON config file |
| `--keep-temp` | Keep temporary audio files |
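If you're extending the CLI, the table above maps one-to-one onto an `argparse` parser. The sketch below mirrors the documented flags but is illustrative only — it is not `VAI-editor.py`'s actual parser, and the defaults (`raw`, `edited`, `openai`) are assumptions borrowed from the config example further down:

```python
import argparse

# Illustrative parser mirroring the flag table; not VAI-editor.py's real code.
parser = argparse.ArgumentParser(
    description="Remove redundant takes from videos with VAD + LLM filtering")
parser.add_argument("--provider", choices=["openai", "deepseek"], default="openai",
                    help="AI provider: openai or deepseek")
parser.add_argument("-i", "--input-dir", default="raw",
                    help="Input video directory")
parser.add_argument("-o", "--output-dir", default="edited",
                    help="Output directory")
parser.add_argument("--dry-run", action="store_true",
                    help="Preview without processing")
parser.add_argument("--verbose", action="store_true",
                    help="Detailed output")
parser.add_argument("-c", "--config",
                    help="JSON config file")
parser.add_argument("--keep-temp", action="store_true",
                    help="Keep temporary audio files")
args = parser.parse_args()
```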
1. Audio Extraction: extract the audio track from the video
2. VAD Processing: detect speech segments (see the first sketch after this list)
3. Transcription: transcribe each segment with Whisper
4. LLM Filtering: the AI identifies redundant takes (see the second sketch below)
5. Video Rendering: concatenate the kept segments (see the third sketch below)
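Stages 1–2 can be sketched with FFmpeg plus the `webrtcvad` package. Everything below — the function names, the 30 ms frame size, the 16 kHz sample rate, the example paths — is an illustrative assumption about how such a stage might look, not the script's confirmed implementation:

```python
from __future__ import annotations

import subprocess
import wave

import webrtcvad


def extract_audio(video_path: str, wav_path: str) -> None:
    # Stage 1: decode to 16 kHz mono 16-bit PCM, the format webrtcvad expects.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-ac", "1", "-ar", "16000",
         "-sample_fmt", "s16", wav_path],
        check=True,
    )


def speech_segments(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 30,
                    aggressiveness: int = 3) -> list[tuple[float, float]]:
    # Stage 2: classify fixed-size frames, then merge runs of speech frames
    # into (start, end) pairs in seconds. aggressiveness 0-3 matches the
    # "vad.aggressiveness" field in config.json.
    vad = webrtcvad.Vad(aggressiveness)
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    segments, start = [], None
    for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        t = i / 2 / sample_rate  # byte offset -> seconds
        if vad.is_speech(pcm[i:i + frame_bytes], sample_rate):
            if start is None:
                start = t
        elif start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(pcm) / 2 / sample_rate))
    return segments


if __name__ == "__main__":
    extract_audio("raw/clip.mp4", "clip.wav")  # hypothetical paths
    with wave.open("clip.wav", "rb") as w:
        pcm = w.readframes(w.getnframes())
    print(speech_segments(pcm))
```

Stages 3–4 ride on the official `openai` Python client. The prompt wording and the comma-separated reply format below are assumptions for illustration; the script's actual prompt is not shown here:

```python
from __future__ import annotations

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment


def transcribe(segment_wav: str) -> str:
    # Stage 3: Whisper transcription of one extracted speech segment.
    with open(segment_wav, "rb") as f:
        return client.audio.transcriptions.create(model="whisper-1", file=f).text


def keep_indices(transcripts: list[str]) -> list[int]:
    # Stage 4: ask the LLM which takes to keep. Prompt wording is illustrative.
    numbered = "\n".join(f"{i}: {t}" for i, t in enumerate(transcripts))
    resp = client.chat.completions.create(
        model="gpt-4o",   # "llm.model_name" in the config example
        temperature=0.0,  # "llm.temperature" in the config example
        messages=[
            {"role": "system",
             "content": "The numbered segments are takes from a video shoot. "
                        "Reply with only the comma-separated indices of the "
                        "takes to keep, dropping redundant repeats."},
            {"role": "user", "content": numbered},
        ],
    )
    return [int(s) for s in resp.choices[0].message.content.split(",")]
```

Stage 5 is plain FFmpeg: cut each kept segment, list the pieces in a file, and join them with the concat demuxer. Again a sketch, not the script's confirmed approach (stream copy cuts on keyframes; re-encode if you need frame-accurate edits):

```python
from __future__ import annotations

import subprocess


def render(video: str, keep: list[tuple[float, float]], out: str) -> None:
    # Stage 5: cut each kept (start, end) segment, then join the pieces
    # with ffmpeg's concat demuxer.
    parts = []
    for n, (start, end) in enumerate(keep):
        part = f"part_{n}.mp4"
        subprocess.run(["ffmpeg", "-y", "-i", video, "-ss", str(start),
                        "-to", str(end), "-c", "copy", part], check=True)
        parts.append(part)
    with open("parts.txt", "w") as f:
        f.writelines(f"file '{p}'\n" for p in parts)
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                    "-i", "parts.txt", "-c", "copy", out], check=True)
```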
Create a `config.json`:

```json
{
  "api": {
    "provider": "openai",
    "whisper_model": "whisper-1"
  },
  "io": {
    "input_dir": "raw",
    "output_dir": "edited"
  },
  "vad": {
    "aggressiveness": 3
  },
  "llm": {
    "model_name": "gpt-4o",
    "temperature": 0.0
  }
}
```

MIT License - see LICENSE for details.