Webinar Video Cleaner

This project automates the cleanup and enhancement of webinar recordings. It transcribes the audio with OpenAI's Whisper, then uses Google's Gemini AI to identify and remove irrelevant segments (such as silence, filler words, or off-topic setup), correct transcription errors, and generate content chapters.

Features

Automated Transcription: Converts MP4 video audio to SRT subtitles using the Whisper model.
Smart Content Analysis: Uses Google Gemini AI to analyze the transcript and identify "useless" ranges to delete.
Video Cutting: Automatically removes the identified ranges from the MP4 and saves the result as a separate cleaned video alongside the original.
Subtitle Correction: Uses AI to fix transcription errors and improve subtitle quality.
Subtitle Synchronization: Adjusts SRT timestamps to match the edited (cut) video.
Chapter Generation: Generates structured chapters with titles and summaries for the final video.
Cost & Time Tracking: Monitors script execution time and estimates Gemini API costs.
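
Cost and time tracking can be pictured with a small sketch like the one below. It is not the project's implementation: the `timed` helper, the `estimate_cost` function, and the per-token prices are all hypothetical placeholders, and real Gemini pricing depends on the model and changes over time.

```python
import time
from contextlib import contextmanager

# Hypothetical prices in USD per one million tokens (placeholders only).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call from its token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


@contextmanager
def timed(label: str, timings: dict):
    """Record the wall-clock duration of a pipeline step under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}
with timed("correction", timings):
    time.sleep(0.01)  # stand-in for a real pipeline step

print(f"correction took {timings['correction']:.2f}s")
print(f"estimated cost: ${estimate_cost(120_000, 30_000):.4f}")
```

Keeping the prices in named constants makes the estimate easy to update when the pricing for the chosen model changes.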

Prerequisites

Python 3.8+
FFmpeg: Required for video and audio processing. Any standard installation should work.
Google Gemini API Key: Required for AI analysis and text processing.
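
Before running the pipeline, it can help to confirm that FFmpeg is reachable. The check below is a minimal sketch that assumes the scripts locate `ffmpeg` through the system PATH, which this README does not state explicitly; `find_ffmpeg` is a hypothetical helper name.

```python
import shutil
from typing import Optional


def find_ffmpeg(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_ffmpeg()
if path:
    print(f"FFmpeg found at {path}")
else:
    print("FFmpeg not found on PATH; install it before running the pipeline")
```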

Installation

Clone the repository:

```
git clone <repository-url>
cd <repository-directory>
```

Install dependencies: Ensure you have pip installed.

```
pip install -e .
# Or if requirements.txt is available
# pip install -r requirements.txt
```

Environment Setup: Create a .env file in the root directory and add your Google Gemini API key:
```
GOOGLE_API_KEY=your_actual_api_key_here
```
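
To illustrate how a key in this format could be read at startup, here is a standard-library sketch. The project may well use a package such as python-dotenv instead; `load_env_file` and `get_api_key` are hypothetical helpers, and the parser only handles simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path: str = ".env") -> Optional[str]:
    """Prefer a variable already set in the environment, then the .env file."""
    return os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")


if get_api_key() is None:
    print("GOOGLE_API_KEY is not set; add it to .env")
```

Keep the .env file out of version control so the key is not committed by accident.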

Usage

The main entry point for the application is main_video_editor.py.

Run the script:
```
python3 main_video_editor.py
```
Follow the interactive prompts:
- Enter MP4 Path: Paste the full path to your webinar video file.
- Select Mode:
  - 1: Full Video Cleaner (Transcribe + Correct + Analyze + Cut + Chapters).
  - 2: Transcription & Chapters Only (Skips the video cutting step).
- Webinar Topic: (Optional) Provide a topic to help the AI understand context better during correction.
Review Outputs: All generated files are saved in the same directory as the original video. The script provides a summary at the end containing paths to:
- Original SRT
- Corrected SRT
- Cleaned (Cut) Video
- Chapters File
- Usage Stats
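
The two modes differ only in which stages run. The sketch below shows one way to express that selection; the step names are hypothetical labels for the stages this README describes, and the actual orchestrator may be organized differently.

```python
from typing import List

# (step name, runs in Full mode only). Names are illustrative labels,
# not identifiers taken from the project's code.
ALL_STEPS = [
    ("transcribe", False),
    ("correct", False),
    ("analyze", True),
    ("cut", True),
    ("resync_srt", True),
    ("chapters", False),
]


def steps_for_mode(mode: int) -> List[str]:
    """Return the ordered step names for mode 1 (full) or mode 2 (no cutting)."""
    if mode not in (1, 2):
        raise ValueError("mode must be 1 or 2")
    full = mode == 1
    return [name for name, full_only in ALL_STEPS if full or not full_only]


print(steps_for_mode(1))
print(steps_for_mode(2))
```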

Workflow Details

main_video_editor.py orchestrates a sequence of specialized scripts, one per stage:

Transcription (transcribe_to_srt.py): Extracts audio and generates an initial SRT transcript.
Correction (correct_srt_errors.py): Sends the transcript to Gemini to fix spelling, grammar, and recognition errors.
Analysis (audio_cleaner.py): Full Mode Only. Analyzes the text to find start and end timestamps of segments that should be removed.
Cutting (cut_mp4.py): Full Mode Only. Uses FFmpeg to cut the identified segments out of the video.
Re-synchronization (apply_cuts_to_srt.py): Full Mode Only. Adjusts the timestamps in the subtitle file so they align with the new, shorter video.
Chapters (generate_chapters.py): Analyzes the final content to generate a list of chapters with timestamps.
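
The re-synchronization step is the least obvious of these, so here is a minimal sketch of the idea: every timestamp moves earlier by the total duration removed before it. This is an illustration under assumptions, not the code in apply_cuts_to_srt.py. It assumes cuts are sorted, non-overlapping `(start, end)` ranges in seconds, and all function names are hypothetical.

```python
from typing import List, Tuple

Cut = Tuple[float, float]
Subtitle = Tuple[float, float, str]


def shift_time(t: float, cuts: List[Cut]) -> float:
    """Map a time in the original video to the corresponding time after cutting."""
    removed = 0.0
    for start, end in cuts:
        if t >= end:
            removed += end - start   # the whole cut lies before t
        elif t > start:
            removed += t - start     # t is inside the cut: clamp to its start
            break
        else:
            break                    # this cut and all later ones lie after t
    return t - removed


def resync(subtitles: List[Subtitle], cuts: List[Cut]) -> List[Subtitle]:
    """Shift subtitles and drop those that fall entirely inside a cut."""
    result = []
    for start, end, text in subtitles:
        new_start, new_end = shift_time(start, cuts), shift_time(end, cuts)
        if new_end > new_start:
            result.append((new_start, new_end, text))
    return result


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


cuts = [(0.0, 10.0), (60.0, 75.0)]
subs = [(2.0, 5.0, "setup"), (12.0, 15.0, "Welcome"), (80.0, 83.5, "Next topic")]
for start, end, text in resync(subs, cuts):
    print(f"{to_srt_time(start)} --> {to_srt_time(end)}  {text}")
```

In this example the "setup" subtitle disappears because it lies inside the first cut, and the remaining subtitles shift earlier by 10 and 25 seconds respectively.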

Project Structure

main_video_editor.py: Orchestrator script that manages the entire pipeline.
transcribe_to_srt.py: Handles Whisper transcription.
audio_cleaner.py: Interface for Gemini AI to identify cuts.
cut_mp4.py: Handles video processing and cutting logic.
correct_srt_errors.py: Logic for AI-based subtitle correction.
apply_cuts_to_srt.py: Utilities for SRT timestamp manipulation.
generate_chapters.py: Generates video chapters.
common_utils.py: Shared utilities for path handling, logging, and cost calculation.
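
As an illustration of the kind of logic a cutting script needs, the sketch below inverts a list of ranges to delete into the ranges to keep, which a tool such as FFmpeg could then extract and join. This is an assumption about the approach, not a description of cut_mp4.py, and `keep_ranges` is a hypothetical name.

```python
from typing import List, Tuple

Range = Tuple[float, float]


def keep_ranges(cuts: List[Range], duration: float) -> List[Range]:
    """Return the (start, end) ranges that remain after removing `cuts`.

    Cuts may be unsorted or overlapping; they are clamped to [0, duration].
    """
    kept = []
    cursor = 0.0  # end of the region already processed
    for start, end in sorted(cuts):
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


print(keep_ranges([(60.0, 75.0), (0.0, 10.0)], duration=100.0))
```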
