# Video Caption Suite

Batch video captioning using the Qwen3-VL-8B vision-language model. Select a directory, process videos, and get captions saved alongside them.
## Requirements

- Python 3.10+
- CUDA-capable GPU (8GB+ VRAM recommended)
- Node.js 18+ (for frontend build)
## Installation

Windows:

```
install.bat
```

Linux/Mac:

```
chmod +x install.sh
./install.sh
```
This creates a virtual environment and installs all dependencies.
## Usage

Windows:

```
start.bat
```

Linux/Mac:

```
./start.sh
```
Open http://localhost:8000 in your browser.
- Click Settings and select your working directory
- Videos from that directory appear in the grid
- Select videos and click Process
- Captions are saved as `.txt` files alongside the videos
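For example, after processing, the working directory might look like this (file names are illustrative, and this assumes each caption reuses its video's base name):

```
clips/
├── beach_sunset.mp4
├── beach_sunset.txt
├── city_drone.mp4
└── city_drone.txt
```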
## Configuration

Edit `config.py` to adjust:

| Setting | Default | Description |
|---|---|---|
| `MODEL_ID` | `Qwen/Qwen3-VL-8B-Instruct` | HuggingFace model |
| `MAX_FRAMES_PER_VIDEO` | `128` | Frames extracted per video |
| `FRAME_SIZE` | `336` | Frame dimension in pixels |
| `MAX_TOKENS` | `512` | Maximum caption length in tokens |
| `TEMPERATURE` | `0.3` | Sampling temperature; higher values give more varied captions |
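For reference, the defaults above correspond to entries in `config.py` roughly like this (a sketch; the real file may define additional settings):

```python
# config.py — sketch of the settings listed in the table above
MODEL_ID = "Qwen/Qwen3-VL-8B-Instruct"  # HuggingFace model to load
MAX_FRAMES_PER_VIDEO = 128              # frames extracted per video
FRAME_SIZE = 336                        # frame dimension in pixels
MAX_TOKENS = 512                        # maximum caption length in tokens
TEMPERATURE = 0.3                       # sampling temperature for generation
```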
## Multi-GPU Support

On systems with multiple CUDA GPUs, the suite automatically detects available devices and enables parallel processing:

- Auto-detection: GPUs are detected on startup via `/api/system/gpu` (see the sketch after this list)
- Batch size: set how many videos to process simultaneously (1 per GPU, max 8)
- Parallel workers: each GPU loads its own model copy and processes videos independently
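A minimal sketch of what that detection can look like on the backend (an assumed implementation using PyTorch; the actual handler behind `/api/system/gpu` may differ):

```python
import torch

def list_gpus() -> list[dict]:
    """Report visible CUDA devices; returns an empty list on CPU-only hosts."""
    if not torch.cuda.is_available():
        return []
    gpus = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        gpus.append({
            "index": i,
            "name": props.name,
            "total_vram_gb": round(props.total_memory / 1024**3, 1),
        })
    return gpus
```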
The batch size slider appears in Settings → Optimization only when multiple GPUs are detected. Each GPU requires ~16GB VRAM to hold the Qwen3-VL-8B model.
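With the server running, you can query the endpoint directly to see what was detected; the response shape shown below is illustrative, not a documented schema:

```python
import requests

# Assumes the server started via start.bat/start.sh is listening on port 8000.
resp = requests.get("http://localhost:8000/api/system/gpu")
resp.raise_for_status()
print(resp.json())  # e.g. [{"index": 0, "name": "RTX 4090", "total_vram_gb": 24.0}]
```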
## Project Structure

```
Video Caption Suite/
├── backend/          # FastAPI server
├── frontend/         # Vue 3 UI
├── models/           # Downloaded model cache
├── config.py         # Settings
├── install.bat/sh    # Installation
└── start.bat/sh      # Launch server
```
## License

MIT
