Video-Skill-Transcriber 🧠

The cure for your "Watch Later" backlog. Let AI binge-watch those thousands of saved videos for you, turning them into summaries and knowledge.

中文说明 (Chinese README)

📖 Table of Contents

The Problem: Information Overload
Features
Installation
Usage
Bilibili Workflow
For AI Agents (Skills)

The Problem: Information Overload

Have you ever looked at your YouTube "Watch Later" or Bilibili "Favorites" list and felt anxiety?

You've saved thousands of high-quality tutorials, lectures, and talks, thinking "I'll learn this later." But "later" never comes because watching video is time-consuming.

Video-Skill-Transcriber is the solution. It automatically batch-downloads and transcribes your backlog, converting hours of video into structured text that AI can digest in seconds.

Turn "Watch Later" into "Knowledge Acquired".

Features

| Feature | Description | Note |
| --- | --- | --- |
| Universal Download | Supports YouTube, Bilibili, TikTok, etc. | Powered by `yt-dlp` |
| Video Understanding | Gemini 1.5 Pro/Flash reads video directly | New (Requires Key) |
| Multi-Engine ASR | Whisper (Local), Qwen3 (Chinese Optimized), OpenAI API | Offline & Online support |
| API Server | FastAPI interface for remote calls | New |
| Batch Pipeline | Auto-fetch "Watch Later" -> Download -> Transcribe | Core Feature |
| Privacy First | Credentials and Inference run 100% Locally | Safe for private lists |
| Agent Ready | Standardized Skill Definition for Claude/GPT | Automate the process |

Installation

Method 1: Standalone Usage (Recommended)

Clone or Download ZIP:

git clone https://github.com/JackMeds/Video-Skill-Transcriber.git
# Or download ZIP from Release page
cd Video-Skill-Transcriber

Install dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

(Requires FFmpeg to be installed.)
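
Because the tools depend on FFmpeg, a quick pre-flight check can save a failed run. The sketch below is not part of the project; it only assumes the executable is named `ffmpeg` and is expected on your `PATH`. The helper name `ffmpeg_available` is hypothetical.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found:", shutil.which("ffmpeg"))
    else:
        print("FFmpeg not found; install it before running the tools.")
```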

Update: Run the self-update tool (works for both Git and ZIP installs):
```
python -m tools.update_skill
```

Method 2: Install to Agent (e.g., OpenClaw)

To integrate this skill into an existing Agent environment:

python install.py --target /path/to/.agent/skills

This creates a symlink, ensuring your Agent always uses the latest code.
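
To see why a symlink keeps the Agent on the latest code, here is a small illustration using only the standard library. It is a sketch of the general technique, not the actual contents of `install.py`; the helper name `link_skill` and the directory names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def link_skill(source_dir: Path, skills_dir: Path) -> Path:
    """Create skills_dir/<source name> as a symlink pointing at source_dir."""
    skills_dir.mkdir(parents=True, exist_ok=True)
    link = skills_dir / source_dir.name
    if link.is_symlink():
        link.unlink()  # replace a stale link left by a previous install
    os.symlink(source_dir.resolve(), link, target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "Video-Skill-Transcriber"
        repo.mkdir()
        link = link_skill(repo, Path(tmp) / "agent" / "skills")
        # A file added to the repo after linking is visible through the link.
        (repo / "NEW.md").write_text("updated")
        print((link / "NEW.md").read_text())
```

Because the link points at the checkout instead of copying it, pulling new commits or running the update tool changes what the Agent sees without reinstalling.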

(Optional) Configure API: Copy .env.example to .env if you want to use Online Transcription.
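
The copy step can also be scripted so that it never overwrites an existing configuration. This is a minimal sketch; the helper name `ensure_env` is hypothetical, and which keys `.env` should contain depends on the `.env.example` shipped with the repository.

```python
import shutil
from pathlib import Path


def ensure_env(project_root: Path) -> bool:
    """Copy .env.example to .env unless .env already exists.

    Returns True if a new .env was created, False otherwise.
    """
    example = project_root / ".env.example"
    target = project_root / ".env"
    if target.exists() or not example.exists():
        return False
    shutil.copyfile(example, target)
    return True


if __name__ == "__main__":
    created = ensure_env(Path.cwd())
    print("Created .env" if created else "No .env created")
```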

Usage

1. General Download

python -m tools.download "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

2. Transcribe / Video Understanding

# Local Whisper (Default)
python -m tools.transcribe "output/video.m4a"

# Local Qwen3-ASR (Best for Chinese)
python -m tools.transcribe "output/video.m4a" -m Qwen/Qwen3-ASR-0.6B

# Multimodal AI (Gemini 1.5) - Reads video directly
python -m tools.transcribe "output/video.mp4" -m gemini

# Online API (Fastest)
python -m tools.transcribe "output/video.m4a" -m openai
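
To run one of the commands above over every audio file in a folder, a thin wrapper around the CLI is enough. This sketch assumes only what the examples show: the `python -m tools.transcribe <file> [-m <model>]` form and an `output/` folder of `.m4a` files. The helper names are hypothetical, and the script must be run from the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_command(media: Path, model: Optional[str] = None) -> List[str]:
    """Build the transcribe command for one file, mirroring the CLI examples."""
    cmd = [sys.executable, "-m", "tools.transcribe", str(media)]
    if model:
        cmd += ["-m", model]
    return cmd


def transcribe_folder(folder: Path, model: Optional[str] = None) -> None:
    """Transcribe every .m4a file in folder, one process per file."""
    for media in sorted(folder.glob("*.m4a")):
        # check=False: one failed file should not abort the rest of the batch
        result = subprocess.run(build_command(media, model), check=False)
        status = "ok" if result.returncode == 0 else f"failed ({result.returncode})"
        print(f"{media.name}: {status}")


if __name__ == "__main__":
    transcribe_folder(Path("output"), model="Qwen/Qwen3-ASR-0.6B")
```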

3. Start API Server

Allow remote Agents to use these tools via HTTP:

python -m tools.api_server
# Docs: http://localhost:8000/docs
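
This README does not list the server's endpoints, so the safest way to discover them is the schema that FastAPI publishes. By default a FastAPI app serves its OpenAPI document at `/openapi.json` (the same data behind `/docs`). The sketch below assumes that default has not been changed and that the server is listening on `localhost:8000`; the helper names are hypothetical.

```python
import json
from typing import Dict, List
from urllib.error import URLError
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "delete", "patch", "options", "head", "trace"}


def list_routes(schema: Dict) -> List[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:  # skip keys like "parameters"
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


def fetch_schema(base_url: str = "http://localhost:8000") -> Dict:
    """Download the OpenAPI schema from a running server."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        for route in list_routes(fetch_schema()):
            print(route)
    except URLError as exc:
        print(f"Server not reachable: {exc.reason}")
```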

Bilibili Workflow

We support both Public and Authenticated modes.

Mode 1: Public Access (Default)

For standard public videos, no login is required. Just use the download tool directly.

python -m tools.download "https://www.bilibili.com/video/BVxxx"

Mode 2: Authenticated (Advanced)

Login is required ONLY if you want to:

Access your private "Watch Later" or "Favorites" lists.
Download 1080P+ / Premium quality videos.

Steps:

Login via QR Code:
```
python -m tools.auth
```
(Session is saved locally to .user_session.json)

Process Backlog: Once logged in, you can fetch your private lists:

# 1. Fetch Top 10 from Watch Later
python -m tools.list --watch-later --limit 10

# 2. Run the pipeline
python -m tools.batch_run
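
These steps depend on a saved login, so a wrapper can check for the session file first. This is a sketch built only from the commands shown in this section; `plan_backlog` and `run_backlog` are hypothetical helpers, and the code assumes `.user_session.json` sits in the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def plan_backlog(root: Path, limit: int = 10) -> List[List[str]]:
    """Return the commands to run, adding the login step if no session exists."""
    py = [sys.executable, "-m"]
    steps = []
    if not (root / ".user_session.json").exists():
        steps.append(py + ["tools.auth"])  # QR-code login comes first
    steps.append(py + ["tools.list", "--watch-later", "--limit", str(limit)])
    steps.append(py + ["tools.batch_run"])
    return steps


def run_backlog(root: Path, limit: int = 10) -> None:
    """Run each step in order from root; stop at the first failure."""
    for cmd in plan_backlog(root, limit):
        subprocess.run(cmd, cwd=root, check=True)


if __name__ == "__main__":
    for cmd in plan_backlog(Path.cwd()):
        print(" ".join(cmd))  # dry run: print the plan without executing it
```

Keeping the plan separate from execution makes it easy to review what will run before touching a private list.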

For AI Agents (Skills)

Give skills/VIDEO_SKILL.md to your AI Agent (Claude/ChatGPT) so that it can use these tools autonomously.

License

MIT License
