Note: This README is entirely generated by AI and is for reference only.
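- Runtime hotword settings can be toggled without restarting: /process/settings/hotword persists to config/hotword_settings.json, and Telegram adds /hotword_status and /hotword_toggle for switching them online.
- The Telegram bot adds tag/hotword interaction prompts and a /skip shortcut command, and polls /process/status/<id> in the background to push subtitle files automatically.
- scripts/build-and-push.sh adds a bgutil-provider image build; the default Dockerfile keeps only the required dependencies, with the X11/VNC components left commented out, so the built image is lighter.
- The backend provides /process/status/<id>?include_content=1 and /process/status/<id>/subtitle so external clients can query job progress and the raw subtitle text.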
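As a rough illustration of how an external client might poll the status endpoint, here is a minimal sketch. The base URL is taken from the web interface address in this README; the `status` field and its `completed`/`failed` values are assumptions, since the response schema is not documented here.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: same address as the web interface


def status_url(task_id, include_content=False):
    """Build the status URL, optionally asking for the subtitle content."""
    url = f"{BASE_URL}/process/status/{task_id}"
    return url + "?include_content=1" if include_content else url


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch=fetch_json, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the task reaches a terminal state or max_polls is exhausted."""
    for _ in range(max_polls):
        payload = fetch(status_url(task_id))
        # Assumption: the payload carries a "status" field with these terminal values.
        if payload.get("status") in ("completed", "failed"):
            return payload
        sleep(interval)
    return None
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running server.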
A comprehensive subtitle processing service that automatically downloads, transcribes, and manages video subtitles from various platforms. Features a Telegram bot interface and a web management portal.
- Multi-Platform Support
  - YouTube video subtitle extraction
  - Bilibili video subtitle processing
  - Automatic fallback to audio transcription
- Subtitle Processing
  - Direct subtitle download from platforms
  - Audio transcription using FunASR
  - Support for multiple subtitle formats (SRT, VTT, JSON3)
- User Interfaces
  - Telegram Bot for easy access
  - Web interface for subtitle management
  - Real-time subtitle viewing and searching
- File Management
  - Automatic file organization
  - Metadata tracking
  - Timeline visualization
- Deployment Flexibility
  - Telegram webhook via a single entrypoint; worker nodes run a processing-only stack
  - Build script with persistent cache/export to push and reload images quickly
  - .env overrides for image tags per environment
- Readwise Integration
  - Automatic article creation from subtitles
  - Rich text formatting support
  - Seamless sync with Readwise Reader
  - Smart content segmentation for long videos
- Hotword Management
  - Runtime toggle API (/process/settings/hotword) with persisted JSON state
  - Telegram commands /hotword_status and /hotword_toggle to view or switch automatic hotwords
  - Conversation flow supports manual hotword input, or /skip to skip it
  - config/hotwords-example/ and config/hotword_settings.json.example provide customizable templates
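To make "persisted JSON state" concrete, here is a minimal sketch of loading and toggling a settings file. The key names come from the setup notes in this README (auto_hotwords, post_process, mode, max_count); the default values and the helper names are hypothetical and may not match the project's actual code.

```python
import json
from pathlib import Path

# Key names appear in this README; the default values here are assumptions.
DEFAULTS = {"auto_hotwords": False, "post_process": False, "mode": "default", "max_count": 20}


def load_settings(path):
    """Return the defaults overlaid with whatever is stored on disk."""
    settings = dict(DEFAULTS)
    if path.exists():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    return settings


def toggle_auto_hotwords(path):
    """Flip auto_hotwords and write the result back so it survives restarts."""
    settings = load_settings(path)
    settings["auto_hotwords"] = not settings["auto_hotwords"]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, ensure_ascii=False, indent=2), encoding="utf-8")
    return settings
```

Calling `toggle_auto_hotwords(Path("config/hotword_settings.json"))` twice returns the setting to its starting value, which is the behavior a toggle command would rely on.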
- Backend: Python Flask
- Frontend: HTML/CSS/JavaScript
- Transcription: FunASR
- Container: Docker
- Storage: JSON-based file system
- Clone the repository
- Install Docker and Docker Compose
- Configure environment variables:
  TELEGRAM_TOKEN=your_telegram_bot_token
  READWISE_TOKEN=your_readwise_token
- Configure hotword settings (optional but recommended):
  cp config/hotword_settings.json.example config/hotword_settings.json
  # Edit config/hotword_settings.json to set defaults for auto_hotwords/post_process/mode/max_count
  # For advanced generation rules, copy config/hotwords-example/hotwords_config-example.yml to config/hotwords/hotwords_config.yml
- Configure Firefox cookies for YouTube access:
  - Copy your Firefox profile directory (located at C:\Users\<USER_NAME>\AppData\Roaming\Mozilla\Firefox\Profiles\) to the firefox_profile directory in the project
  - This enables downloading restricted YouTube videos using your Firefox login cookies
- Start the services:
docker-compose up --build
- Generate and push images from a build machine:
  cp images.env.example images.env
  # Edit images.env to set IMAGE_PREFIX (e.g. docker.io/myteam) and IMAGE_TAG
  # Optionally set EXTRA_TAGS=latest if you also want a latest tag
  set -a; source images.env; set +a
  ./scripts/build-and-push.sh
  The script tags and pushes:
  - ${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
  - ${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
  - ${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
- On each target host, create (or edit) .env with the new image references:
  SUBTITLE_PROCESSOR_IMAGE=${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
  TRANSCRIBE_AUDIO_IMAGE=${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
  TELEGRAM_BOT_IMAGE=${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
- Pull and start containers without rebuilding locally:
  docker compose pull
  docker compose up -d --no-build
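If you maintain several target hosts, the three .env lines can be generated rather than typed by hand. The sketch below only mirrors the naming pattern listed above; the helper name and the example tag `v1` are hypothetical.

```python
from typing import List


def image_env_lines(image_prefix: str, image_tag: str) -> List[str]:
    """Return the .env lines that point compose at the pushed images."""
    # Variable names and image names follow the pattern shown in this README.
    services = {
        "SUBTITLE_PROCESSOR_IMAGE": "subtitle-processor",
        "TRANSCRIBE_AUDIO_IMAGE": "transcribe-audio",
        "TELEGRAM_BOT_IMAGE": "telegram-bot",
    }
    return [f"{var}={image_prefix}/{name}:{image_tag}" for var, name in services.items()]


if __name__ == "__main__":
    # "v1" is a placeholder tag for illustration.
    print("\n".join(image_env_lines("docker.io/myteam", "v1")))
```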
- Choose one machine (for example the NAS that fronts Caddy) to run the telegram-bot service with webhook enabled. Configure telegram.webhook.public_url (or the TELEGRAM_WEBHOOK_* environment variables) only on this host so it remains the sole webhook endpoint. Start the stack with the Telegram profile:
  docker compose --profile telegram up -d
- On additional worker machines, keep running subtitle-processor and transcribe-audio but skip the bot service. The default profile starts only the processing services, so a plain docker compose up -d works. You can also explicitly target services:
  docker compose up -d subtitle-processor transcribe-audio
  or comment out the telegram-bot section in the worker's compose file. Setting TELEGRAM_BOT_ENABLED=false in the worker's environment keeps the container in health-check mode if you ever need the image present.
- The worker nodes still take part in transcription because the primary bot forwards requests to them via the shared FunASR server list in config/config.yml.
- This "single entry + multiple workers" layout prevents Telegram from redelivering the same webhook to different instances, eliminating duplicate replies in chats.
- Each webhook is acknowledged immediately and the heavy lifting runs in background tasks, so Telegram never retries the same update due to timeouts.
- For exceptionally long jobs you can raise the HTTP timeouts via SUBTITLE_CONNECT_TIMEOUT (default 120 seconds) and SUBTITLE_READ_TIMEOUT (default 1800 seconds). Defaults are defined in docker-compose.yml and may be overridden with environment variables if needed.
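For reference, a client might read the two timeout variables like this. It is a minimal sketch: the variable names and defaults come from this README, while the helper name and the idea of returning a (connect, read) pair are assumptions about how the values are consumed.

```python
import os


def subtitle_timeouts(env=None):
    """Return (connect_timeout, read_timeout) in seconds.

    Falls back to the documented defaults of 120 and 1800 seconds
    when a variable is not set.
    """
    env = os.environ if env is None else env
    connect = float(env.get("SUBTITLE_CONNECT_TIMEOUT", "120"))
    read = float(env.get("SUBTITLE_READ_TIMEOUT", "1800"))
    return (connect, read)
```

A (connect, read) tuple is the shape the `requests` library accepts for its `timeout` argument; whether the bot uses `requests` is not stated here.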
- Telegram Bot
  - Send video URL to the bot
  - Receive processed subtitle file
- Web Interface
  - Access http://localhost:5000
  - Upload video files or URLs
  - View and search subtitles
- Readwise Integration
  - Automatically creates articles in Readwise Reader
  - Preserves video metadata (title, URL, publish date)
  - Intelligently splits long content into readable segments
  - Access transcripts alongside your other reading materials
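The splitting of long content can be pictured with a simple length-based splitter. This is only a sketch under stated assumptions: the project's real segmentation logic is not described in this README, and both the function name and the 2000-character limit are hypothetical.

```python
from typing import List


def split_into_segments(text: str, max_chars: int = 2000) -> List[str]:
    """Group non-empty lines into segments of at most max_chars characters."""
    segments: List[str] = []
    current = ""
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                segments.append(current)
            # A single line longer than max_chars is kept whole, not cut mid-sentence.
            current = line
    if current:
        segments.append(current)
    return segments
```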
MIT License
Special thanks to:
- Windsurf - The world's first agentic IDE that made this project development possible
- Claude 3.5 Sonnet - For providing comprehensive AI assistance throughout the development process
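A comprehensive subtitle processing service that automatically downloads, transcribes, and manages video subtitles from various platforms. It provides a Telegram bot interface and a web management portal.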
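- scripts/build-and-push.sh supports a persistent BuildKit cache; after a multi-architecture push it automatically loads the image for the current architecture locally, so a separate docker pull is no longer needed.
- The Telegram webhook returns immediately and runs subtitle processing in the background, avoiding duplicate replies caused by retries.
- Telegram deployment switches to a "single entry + multiple workers" mode, so the same message is not answered by several bot instances.
- The documentation adds guidance on image distribution and .env overrides, making it easier to bring up multiple machines quickly.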
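- Multi-platform support
  - YouTube video subtitle extraction
  - Bilibili video subtitle processing
  - Automatic fallback to audio transcription
- Subtitle processing
  - Direct subtitle download from platforms
  - Audio transcription using FunASR
  - Support for multiple subtitle formats (SRT, VTT, JSON3)
- User interfaces
  - Telegram bot for easy access
  - Web interface for subtitle management
  - Real-time subtitle viewing and searching
- File management
  - Automatic file organization
  - Metadata tracking
  - Timeline visualization
- Deployment flexibility
  - Telegram enables the webhook on a single entry node only; other nodes focus on processing jobs
  - Build script with a persistent cache, speeding up pushes and local image loading
  - .env overrides for image tags to suit different environments
- Readwise integration
  - Automatic article creation from subtitles
  - Rich text formatting support
  - Seamless sync with Readwise Reader
  - Smart segmentation of long video content
- Hotword management
  - The runtime hotword switch can be adjusted online through /process/settings/hotword and the Telegram commands
  - The tag/hotword conversation supports manual input, or /skip to skip it
  - config/hotword_settings.json.example and config/hotwords-example/ provide templates for customizing the automatic hotword strategy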
- Backend: Python Flask
- Frontend: HTML/CSS/JavaScript
- Transcription: FunASR
- Container: Docker
- Storage: JSON-based file system
- Clone the repository
- Install Docker and Docker Compose
- Configure environment variables:
  TELEGRAM_TOKEN=your_telegram_bot_token
  READWISE_TOKEN=your_readwise_token
- Optional: configure the default hotword strategy:
  cp config/hotword_settings.json.example config/hotword_settings.json
  # Edit the defaults for the hotword switch, mode, maximum count, and so on
  # To customize generation rules, copy config/hotwords-example/hotwords_config-example.yml to config/hotwords/hotwords_config.yml
- Configure Firefox cookies for YouTube access:
  - Copy your Firefox profile directory (located at C:\Users\<USER_NAME>\AppData\Roaming\Mozilla\Firefox\Profiles\) to the firefox_profile directory in the project
  - This lets you download restricted YouTube videos using your Firefox login cookies
- Start the services:
  docker-compose up --build
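- Run telegram-bot with the webhook enabled on one machine only (for example the NAS that hosts Caddy). Set telegram.webhook.public_url in that node's configuration file or environment variables, and start it with the telegram profile:
  docker compose --profile telegram up -d
- Other worker nodes run only subtitle-processor and transcribe-audio. The default profile starts only the processing services, so a plain docker compose up -d is enough; you can also name the services explicitly:
  docker compose up -d subtitle-processor transcribe-audio
- Alternatively, comment out the telegram-bot service in their docker-compose.yml. If you need to keep the container, set TELEGRAM_BOT_ENABLED=false in the environment so it only serves health checks and does not process messages.
- All nodes share the transcription server list in config/config.yml; when the primary node receives a request it still delegates transcription to the backend FunASR services.
- This topology prevents Telegram from delivering the same webhook to several instances, which removes duplicate replies at the source.
- Every webhook request is answered immediately and subtitle generation moves to a background task, so Telegram does not retry because of a timeout.
- For very long videos, raise the connect/read timeouts for subtitle requests with the environment variables SUBTITLE_CONNECT_TIMEOUT (default 120 seconds) and SUBTITLE_READ_TIMEOUT (default 1800 seconds). The defaults are written in docker-compose.yml and can be overridden through environment variables when needed.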
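The "acknowledge immediately, process in the background" pattern can be sketched with a queue and a worker thread. This is an illustration of the general technique, not the bot's actual implementation: the class name, the response shape, and the use of a single worker are all assumptions.

```python
import queue
import threading


class BackgroundProcessor:
    """Queue incoming updates and process them on a worker thread."""

    def __init__(self, handler):
        self._handler = handler  # callable that does the slow subtitle work
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            update = self._jobs.get()
            try:
                self._handler(update)
            except Exception as exc:
                # Keep the worker alive if one job fails.
                print(f"job failed: {exc}")
            finally:
                self._jobs.task_done()

    def handle_webhook(self, update):
        """Enqueue the update and return at once, before processing finishes."""
        self._jobs.put(update)
        return {"ok": True}, 200

    def wait_idle(self):
        """Block until every queued update has been handled (useful in tests)."""
        self._jobs.join()
```

Because `handle_webhook` only enqueues, its response time does not depend on how long transcription takes, which is what keeps the sender from timing out and retrying.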
- Generate and push images on the build machine:
  cp images.env.example images.env
  # Edit images.env to set IMAGE_PREFIX (e.g. docker.io/myteam) and IMAGE_TAG
  # To also push a latest tag, set EXTRA_TAGS=latest
  set -a; source images.env; set +a
  ./scripts/build-and-push.sh
  The script pushes the following images:
  - ${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
  - ${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
  - ${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
- In the root directory on each target machine, create (or edit) a .env file with the latest images:
  SUBTITLE_PROCESSOR_IMAGE=${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
  TRANSCRIBE_AUDIO_IMAGE=${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
  TELEGRAM_BOT_IMAGE=${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
- Pull and start the containers without rebuilding locally:
  docker compose pull
  docker compose up -d --no-build
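- Telegram bot
  - Send a video URL to the bot
  - Receive the processed subtitle file
- Web interface
  - Access http://localhost:5000
  - Upload video files or URLs
  - View and search subtitles
- Readwise integration
  - Automatically creates articles in Readwise Reader
  - Preserves video metadata (title, URL, publish date)
  - Intelligently splits long content into readable segments
  - Access transcripts alongside your other reading materials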
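MIT License
Special thanks to:
- Windsurf - The world's first agentic IDE, which made the development of this project possible
- Claude 3.5 Sonnet - For providing comprehensive AI assistance throughout the development process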