v2v

Turn audio into videos with AI-generated visuals and word-by-word captions.

Send an audio file to a Telegram bot and get back a video with matching images and synced captions. It uses AssemblyAI for transcription, an LLM of your choice (DeepSeek or Kimi) for scene descriptions, and FFmpeg for video rendering.

Built with TypeScript and Bun. Supports multiple video styles, AI image generation via Cloudflare Workers or Together AI, and optional MinIO/S3 upload.

Setup Guide • Docker Guide • License

How it works

1. Upload audio through Telegram (or send a URL)
2. AssemblyAI transcribes it with word-level timestamps
3. The LLM generates an image description for each segment
4. Images are generated (AI) or searched (DuckDuckGo)
5. FFmpeg renders the video with word-by-word captions
6. You get the finished video back

Processing time: 3-7 minutes for a typical 2-minute audio file.
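The word-level timestamps from the transcription step determine both where segments begin and end and when each caption word appears. The following is a minimal sketch of how words could be grouped into segments, not the project's actual implementation. The `Word` and `Segment` shapes, the `groupIntoSegments` name, and the 5-second default are all assumptions made for illustration.

```typescript
// Hypothetical shape for one transcribed word; times are in milliseconds.
interface Word {
  text: string;
  start: number;
  end: number;
}

// A run of consecutive words that would share one generated image.
interface Segment {
  text: string;
  start: number;
  end: number;
  words: Word[];
}

// Group words into segments lasting at most `maxMs` milliseconds.
// The 5000 ms default is an illustrative value, not the project's setting.
function groupIntoSegments(words: Word[], maxMs = 5000): Segment[] {
  const segments: Segment[] = [];
  let current: Word[] = [];

  // Turn the words collected so far into a segment and start a new one.
  const flush = (): void => {
    if (current.length === 0) return;
    segments.push({
      text: current.map((w) => w.text).join(" "),
      start: current[0].start,
      end: current[current.length - 1].end,
      words: current,
    });
    current = [];
  };

  for (const word of words) {
    // Close the segment if adding this word would exceed the limit.
    if (current.length > 0 && word.end - current[0].start > maxMs) {
      flush();
    }
    current.push(word);
  }
  flush();
  return segments;
}
```

Each segment's `text` could then be sent to the LLM for a scene description, while the per-word `start` and `end` values stay available for caption timing.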

Quick Start

Run with Docker (Recommended)

Prerequisites:

Docker and Docker Compose
Telegram bot token (get one from @BotFather)
API keys for AssemblyAI and your chosen AI provider (DeepSeek or Kimi)

Setup:

cp .env.example .env
# Edit .env and add your API keys

Run:

docker-compose up -d

See DOCKER.md for detailed Docker instructions.

Run with Bun (Local)

Prerequisites:

Bun runtime
FFmpeg for video processing
Telegram bot token and API keys

Install:

bun install
bun font/add.ts  # Install caption font

Configure:

cp .env.example .env
# Edit .env and add your API keys

Run:

bun start

See the Setup Guide for detailed instructions, optional features, and API service setup.
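Because both setup paths depend on a filled-in `.env`, a startup check that reports missing keys can save a debugging round. The sketch below shows one way to do that. The variable names in `REQUIRED` are hypothetical placeholders; the real names are in `.env.example`.

```typescript
// Return the names of required variables that are missing or blank.
function findMissingEnv(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Hypothetical variable names for illustration; see .env.example for the real ones.
const REQUIRED = ["TELEGRAM_BOT_TOKEN", "ASSEMBLYAI_API_KEY"];

// Example with an in-memory object standing in for the loaded environment.
const missing = findMissingEnv(REQUIRED, { TELEGRAM_BOT_TOKEN: "123:abc" });
// `missing` now lists every required name that was absent or empty.
```

In a real entry point you would pass `process.env` as the second argument and exit with an error message if the returned list is non-empty.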

Video Styles

The bot supports different video styles. Add a hashtag when sending audio to pick a style:

| Style | Hashtag | Look |
| --- | --- | --- |
| History | `#history` (default) | Oil painting aesthetic, karaoke captions, pan effect |
| WW2 | `#ww2` | Black-and-white archival photos, simple white captions |

You can also override specific settings with options:

#history --pan              # Enable pan effect
#ww2 --karaoke              # Enable karaoke highlighting
#history --highlight=yellow # Change highlight color
#ww2 --no-pan               # Disable pan effect

Send /styles in Telegram to see all available styles and options.
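To make the hashtag and option syntax concrete, here is a minimal sketch of how such a caption could be parsed. It is an illustration under assumptions, not the bot's actual parser: the `StyleRequest` shape and `parseStyleCaption` name are hypothetical, and the sketch does not check option names against the real list that `/styles` reports.

```typescript
// Hypothetical result shape: the chosen style plus any overrides.
interface StyleRequest {
  style: string;
  options: Record<string, string | boolean>;
}

// Parse a caption such as "#ww2 --karaoke --highlight=yellow --no-pan".
// Falls back to "history", which the table above marks as the default.
function parseStyleCaption(caption: string, defaultStyle = "history"): StyleRequest {
  const tokens = caption.trim().split(/\s+/).filter(Boolean);
  let style = defaultStyle;
  const options: Record<string, string | boolean> = {};

  for (const token of tokens) {
    if (token.startsWith("#") && token.length > 1) {
      // "#ww2" selects the style.
      style = token.slice(1).toLowerCase();
    } else if (token.startsWith("--no-") && token.length > 5) {
      // "--no-pan" turns an option off.
      options[token.slice(5)] = false;
    } else if (token.startsWith("--") && token.length > 2) {
      const body = token.slice(2);
      const eq = body.indexOf("=");
      if (eq === -1) {
        // "--karaoke" turns an option on.
        options[body] = true;
      } else if (eq > 0) {
        // "--highlight=yellow" sets a value.
        options[body.slice(0, eq)] = body.slice(eq + 1);
      }
    }
  }
  return { style, options };
}
```

Unrecognized tokens are ignored in this sketch, so ordinary caption text mixed in with the hashtag does not cause an error.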

Commands

| Command | What it does |
| --- | --- |
| `/start` | Get started |
| `/upload` | Upload an audio file |
| `/url` | Process audio from a URL |
| `/queue` | Check pending jobs |
| `/styles` | List available styles |
| `/help` | Show usage instructions |
| `/cleanup` | Clear temp files |

Large Files

The Telegram Bot API limits the files a bot can download to 20 MB. For bigger files:

Compress your audio first, or
Upload to a file host and use /url <link>

/url https://example.com/large-audio.mp3 #history
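The size limit and the `/url` fallback can be sketched as two small helpers. This is an illustrative sketch, not the bot's code: `needsUrlUpload`, `parseUrlCommand`, and the `UrlCommand` shape are hypothetical names, and reading 20 MB as 20 * 1024 * 1024 bytes is an assumption.

```typescript
// Download limit from the text above; the exact byte count is an assumption.
const TELEGRAM_DOWNLOAD_LIMIT_BYTES = 20 * 1024 * 1024;

// True when a file is too large for the bot to download from Telegram.
function needsUrlUpload(fileSizeBytes: number): boolean {
  return fileSizeBytes > TELEGRAM_DOWNLOAD_LIMIT_BYTES;
}

// Hypothetical result shape: the audio URL plus any trailing style text.
interface UrlCommand {
  url: string;
  rest: string;
}

// Parse "/url <link> [style hashtag and options]".
// Returns null if the text is not a /url command or the link is not http(s).
function parseUrlCommand(text: string): UrlCommand | null {
  const match = text.trim().match(/^\/url\s+(\S+)(?:\s+([\s\S]*))?$/);
  if (!match) return null;

  try {
    const parsed = new URL(match[1]);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      return null;
    }
  } catch {
    // new URL() throws on strings that are not absolute URLs.
    return null;
  }
  return { url: match[1], rest: (match[2] ?? "").trim() };
}
```

The `rest` field carries whatever follows the link, such as `#history`, so it can be handled the same way as a caption on an uploaded file.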

Services

AssemblyAI - Transcription with word-level timing
DeepSeek / Kimi - LLM for scene descriptions
Cloudflare Workers - AI image generation (SDXL 1.0)
Together AI - AI image generation (FLUX.1-schnell)
MinIO / AWS S3 - Optional video storage

License

OCL v1.0. Free for personal use. Commercial use requires contributing back. See LICENSE.md.

RedWilly/YT-Automation
