Subtitle Processing Service 字幕处理服务

Note: This README is entirely generated by AI and is for reference only.
注意：本 README 完全由 AI 生成，仅供参考。

Recent Updates

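Runtime hotword settings can be toggled without restarting: /process/settings/hotword persists to config/hotword_settings.json, and the new Telegram commands /hotword_status and /hotword_toggle switch the setting online.
The Telegram bot adds interactive tag/hotword prompts and a /skip shortcut command, and polls /process/status/<id> in the background to push the subtitle file automatically.
scripts/build-and-push.sh now also builds the bgutil-provider image; the default Dockerfile keeps only the required dependencies, with the X11/VNC components retained as comments, so the built image is lighter.
The backend exposes /process/status/<id>?include_content=1 and /process/status/<id>/subtitle so that external clients can query task progress and the subtitle text.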
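
As a rough illustration of how an external client might use the two status endpoints, the sketch below builds the URLs and fetches the status. The base URL http://localhost:5000 is the address given under Usage; the helper names are hypothetical, and the assumption that the status endpoint returns a JSON body is not confirmed by this README.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: default address from the Usage section


def status_url(task_id, include_content=False, base_url=BASE_URL):
    """Build /process/status/<id>, optionally with ?include_content=1."""
    quoted_id = urllib.parse.quote(str(task_id), safe="")
    url = f"{base_url.rstrip('/')}/process/status/{quoted_id}"
    if include_content:
        url += "?" + urllib.parse.urlencode({"include_content": 1})
    return url


def subtitle_url(task_id, base_url=BASE_URL):
    """Build /process/status/<id>/subtitle for fetching the subtitle text."""
    return f"{status_url(task_id, base_url=base_url)}/subtitle"


def fetch_status(task_id, include_content=False, timeout=10):
    """Fetch the task status; assumes (unverified) that the body is JSON."""
    url = status_url(task_id, include_content)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The fields of the returned object are not documented here, so a caller should inspect the response before relying on any particular key.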

🌍 English

Overview

A comprehensive subtitle processing service that automatically downloads, transcribes, and manages video subtitles from various platforms. Features a Telegram bot interface and a web management portal.

🚀 Features

Multi-Platform Support
- YouTube video subtitle extraction
- Bilibili video subtitle processing
- Automatic fallback to audio transcription
Subtitle Processing
- Direct subtitle download from platforms
- Audio transcription using FunASR
- Support for multiple subtitle formats (SRT, VTT, JSON3)
User Interfaces
- Telegram Bot for easy access
- Web interface for subtitle management
- Real-time subtitle viewing and searching
File Management
- Automatic file organization
- Metadata tracking
- Timeline visualization
Deployment Flexibility
- Telegram webhook served from a single entrypoint; worker nodes run a processing-only stack
- Build script with persistent cache/export to push and reload images quickly
- .env overrides for image tags per environment
Readwise Integration
- Automatic article creation from subtitles
- Rich text formatting support
- Seamless sync with Readwise Reader
- Smart content segmentation for long videos
Hotword Management
- Runtime toggle API (/process/settings/hotword) with persisted JSON state
- Telegram commands /hotword_status and /hotword_toggle to view and toggle automatic hotwords
- Conversation flow supports manual hotword input, or /skip to skip it
- config/hotwords-example/ and config/hotword_settings.json.example provide customizable templates

🛠️ Technical Stack

Backend: Python Flask
Frontend: HTML/CSS/JavaScript
Transcription: FunASR
Container: Docker
Storage: JSON-based file system

📦 Installation

Clone the repository
Install Docker and Docker Compose

Configure environment variables:

TELEGRAM_TOKEN=your_telegram_bot_token
READWISE_TOKEN=your_readwise_token

Configure hotword settings (optional but recommended):

cp config/hotword_settings.json.example config/hotword_settings.json
# Edit config/hotword_settings.json to set defaults for auto_hotwords/post_process/mode/max_count
# For advanced generation rules, copy config/hotwords-example/hotwords_config-example.yml to config/hotwords/hotwords_config.yml
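
The persisted settings file can also be inspected or adjusted with a small script. The sketch below is a minimal illustration, assuming the file is a flat JSON object that uses the key auto_hotwords (one of the keys named above) and that its value is a boolean; the actual schema is defined by config/hotword_settings.json.example and may differ. The function names are hypothetical.

```python
import json
from pathlib import Path

SETTINGS_PATH = Path("config/hotword_settings.json")


def load_settings(path=SETTINGS_PATH):
    """Read the settings file; return an empty dict if it does not exist yet."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def toggle_auto_hotwords(path=SETTINGS_PATH):
    """Flip auto_hotwords (assumed to be a boolean) and write the file back."""
    path = Path(path)
    settings = load_settings(path)
    settings["auto_hotwords"] = not settings.get("auto_hotwords", False)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(settings, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return settings
```

A running service may hold its own copy of this state, so the runtime API is probably the safer way to change the setting while the stack is up.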

Configure Firefox cookies for YouTube access:
- Copy your Firefox profile directory (located at C:\Users\<USER_NAME>\AppData\Roaming\Mozilla\Firefox\Profiles\) to the firefox_profile directory in the project
- This enables downloading restricted YouTube videos using your Firefox login cookies
Start the services:
```
docker-compose up --build
```

🧩 Distribute Docker Images to Multiple Hosts

Generate and push images from a build machine:

cp images.env.example images.env
# Edit images.env to set IMAGE_PREFIX (e.g. docker.io/myteam) and IMAGE_TAG
# Optionally set EXTRA_TAGS=latest if you also want a latest tag
set -a; source images.env; set +a
./scripts/build-and-push.sh

The script tags/pushes:

${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}

On each target host, create (or edit) .env with the new image references:

SUBTITLE_PROCESSOR_IMAGE=${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
TRANSCRIBE_AUDIO_IMAGE=${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
TELEGRAM_BOT_IMAGE=${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
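
If the .env file is generated rather than edited by hand, the mapping above can be sketched as follows. The function name render_env is hypothetical; the variable names and image names are the ones listed above.

```python
def render_env(image_prefix, image_tag):
    """Return .env text that points each service at its pushed image."""
    services = {
        "SUBTITLE_PROCESSOR_IMAGE": "subtitle-processor",
        "TRANSCRIBE_AUDIO_IMAGE": "transcribe-audio",
        "TELEGRAM_BOT_IMAGE": "telegram-bot",
    }
    prefix = image_prefix.rstrip("/")  # tolerate a trailing slash in the prefix
    lines = [f"{var}={prefix}/{name}:{image_tag}" for var, name in services.items()]
    return "\n".join(lines) + "\n"
```

For example, render_env("docker.io/myteam", "v1") yields three lines, the first being SUBTITLE_PROCESSOR_IMAGE=docker.io/myteam/subtitle-processor:v1; the prefix and tag here are placeholder values.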

Pull and start containers without rebuilding locally:
```
docker compose pull
docker compose up -d --no-build
```

🤖 Telegram Deployment (Single Entry)

Choose one machine (for example the NAS that fronts Caddy) to run the telegram-bot service with webhook enabled. Configure telegram.webhook.public_url (or TELEGRAM_WEBHOOK_* envs) only on this host so it remains the sole webhook endpoint. Start the stack with the Telegram profile:
```
docker compose --profile telegram up -d
```
On additional worker machines, keep running subtitle-processor and transcribe-audio but skip the bot service. The default profile starts only the processing services, so a plain docker compose up -d works. You can also target the services explicitly:
```
docker compose up -d subtitle-processor transcribe-audio
```
Alternatively, comment out the telegram-bot section in the worker’s compose file. Setting TELEGRAM_BOT_ENABLED=false in the worker’s environment keeps the container in health-check mode if you ever need the image present.
The worker nodes will still take part in transcription because the primary bot forwards requests to them via the shared FunASR server list in config/config.yml.
This “single entry + multiple workers” layout prevents Telegram from redelivering the same webhook to different instances, eliminating duplicate replies in chats.
Each webhook is acknowledged immediately and the heavy lifting runs in background tasks, so Telegram never retries the same update due to timeouts.
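
The acknowledge-first pattern described above can be illustrated without any web framework. The following is a minimal sketch using only the standard library, not the project's actual handler: handle_webhook, process_update, and the in-memory queue are hypothetical names, and a real deployment would send the acknowledgement as the HTTP response.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def process_update(update):
    """Stand-in for the slow work (download, transcription, reply)."""
    results.append(f"processed {update['update_id']}")


def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        update = jobs.get()
        try:
            if update is None:
                return
            process_update(update)
        finally:
            jobs.task_done()


def handle_webhook(update):
    """Queue the update and acknowledge at once, before any heavy work."""
    jobs.put(update)
    return {"ok": True}


thread = threading.Thread(target=worker, daemon=True)
thread.start()
ack = handle_webhook({"update_id": 1})  # returns without waiting for the work
jobs.join()  # demo only: wait until the background work has finished
```

Because the handler only enqueues, its response time does not depend on how long transcription takes.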
For exceptionally long jobs you can raise the HTTP timeouts via SUBTITLE_CONNECT_TIMEOUT (default 120 seconds) and SUBTITLE_READ_TIMEOUT (default 1800 seconds). Defaults are defined in docker-compose.yml and may be overridden with environment variables if needed.
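
A client that honours these two variables might read them as shown below. This is a sketch under the assumption that the values are plain numbers of seconds; the function name load_timeouts is hypothetical, and the defaults mirror the ones quoted above.

```python
import os


def load_timeouts(env=None):
    """Return (connect_timeout, read_timeout) in seconds from the environment."""
    env = os.environ if env is None else env
    connect = float(env.get("SUBTITLE_CONNECT_TIMEOUT", "120"))
    read = float(env.get("SUBTITLE_READ_TIMEOUT", "1800"))
    if connect <= 0 or read <= 0:
        raise ValueError("timeouts must be positive numbers of seconds")
    return connect, read
```

The resulting (connect, read) tuple has the shape that HTTP clients such as requests accept for their timeout argument.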

🔧 Usage

Telegram Bot
- Send video URL to the bot
- Receive processed subtitle file
Web Interface
- Access http://localhost:5000
- Upload video files or URLs
- View and search subtitles
Readwise Integration
- Automatically creates articles in Readwise Reader
- Preserves video metadata (title, URL, publish date)
- Intelligently splits long content into readable segments
- Access transcripts alongside your other reading materials
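
How the service segments long transcripts is not documented here. As one plausible illustration, the sketch below packs whole sentences into segments that stay within a character budget; split_transcript, the 4000-character default, and the sentence-boundary rule are all assumptions rather than the project's behaviour.

```python
import re


def split_transcript(text, max_chars=4000):
    """Greedily pack whole sentences into segments of at most max_chars.

    A single sentence longer than max_chars is kept intact as its own segment.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Splitting on sentence boundaries keeps each segment readable on its own, at the cost of segments that vary in length.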

📝 License

MIT License

🙏 Acknowledgments

Special thanks to:

Windsurf - The world's first agentic IDE that made this project development possible
Claude 3.5 Sonnet - For providing comprehensive AI assistance throughout the development process

🌏 中文

概述

一个综合性的字幕处理服务，可以自动下载、转录和管理来自各种平台的视频字幕。提供 Telegram 机器人接口和网页管理门户。

🚀 功能特点

多平台支持
- YouTube 视频字幕提取
- Bilibili 视频字幕处理
- 自动音频转录备选方案
字幕处理
- 直接从平台下载字幕
- 使用 FunASR 进行音频转录
- 支持多种字幕格式（SRT、VTT、JSON3）
用户界面
- Telegram 机器人便捷访问
- 网页字幕管理界面
- 实时字幕查看和搜索
文件管理
- 自动文件组织
- 元数据跟踪
- 时间轴可视化
部署灵活性
- Telegram 仅在单一入口启用 webhook，其他节点专注处理任务
- 构建脚本带持久缓存，提高推送/本地加载效率
- 通过 .env 覆盖镜像标签，适配不同环境
Readwise 集成
- 自动从字幕创建文章
- 支持富文本格式
- 与 Readwise Reader 无缝同步
- 智能分段处理长视频内容
热词管理
- 运行期热词开关可通过 /process/settings/hotword 与 Telegram 指令在线调整
- 标签/热词会话支持手动输入或 /skip 快捷跳过
- config/hotword_settings.json.example、config/hotwords-example/ 提供自定义模板，轻松扩展自动热词策略

🛠️ 技术栈

后端：Python Flask
前端：HTML/CSS/JavaScript
转录：FunASR
容器：Docker
存储：基于 JSON 的文件系统

📦 安装步骤

克隆仓库
安装 Docker 和 Docker Compose

配置环境变量：

TELEGRAM_TOKEN=你的_telegram_机器人_token
READWISE_TOKEN=你的_readwise_token

可选：配置热词默认策略

cp config/hotword_settings.json.example config/hotword_settings.json
# 编辑热词开关/模式/最大数量等默认值
# 如需自定义生成规则，可复制 config/hotwords-example/hotwords_config-example.yml 至 config/hotwords/hotwords_config.yml

配置 Firefox cookies 以访问 YouTube：
- 将 Firefox 配置文件目录（位于 C:\Users\<USER_NAME>\AppData\Roaming\Mozilla\Firefox\Profiles\）复制到项目中的 firefox_profile 目录
- 这使您可以使用 Firefox 登录 cookie 下载受限制的 YouTube 视频
启动服务：
```
docker-compose up --build
```

🤖 Telegram 单入口部署

仅在一台机器（例如承载 Caddy 的 NAS）运行 telegram-bot 并启用 webhook，在该节点的配置文件或环境变量中填写 telegram.webhook.public_url，并使用带有 telegram profile 的启动方式：
```
docker compose --profile telegram up -d
```
其他工作节点只运行 subtitle-processor 与 transcribe-audio：默认 profile 只会启动处理服务，因此直接执行 docker compose up -d 即可；也可以显式指定服务：
```
docker compose up -d subtitle-processor transcribe-audio
```
或在它们的 docker-compose.yml 中注释掉 telegram-bot 服务；若需要保留容器，可在环境变量中设置 TELEGRAM_BOT_ENABLED=false，让其仅提供健康检查而不处理消息。
所有节点共享 config/config.yml 内的转录服务器列表，主节点收到请求后仍会委派后端 FunASR 服务执行转录。
该拓扑阻止 Telegram 将同一条 webhook 投递给多台实例，从根源上消除重复回复。
每条 Webhook 请求都会立即响应，字幕生成移至后台任务执行，Telegram 不会因超时而重试。
如果处理超长视频，可以通过环境变量 SUBTITLE_CONNECT_TIMEOUT（默认 120 秒）和 SUBTITLE_READ_TIMEOUT（默认 1800 秒）调高字幕请求的连接/读取超时。默认值写在 docker-compose.yml，需要时可在环境变量中覆盖。

🧩 多机快速分发 Docker 镜像

在构建机器上生成并推送镜像：

cp images.env.example images.env
# 编辑 images.env，设置 IMAGE_PREFIX（如 docker.io/myteam）和 IMAGE_TAG
# 如需同时推送 latest，可设置 EXTRA_TAGS=latest
set -a; source images.env; set +a
./scripts/build-and-push.sh

脚本会推送以下镜像：

${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}

在每台目标机器根目录创建（或修改）.env 文件，填入最新镜像：

SUBTITLE_PROCESSOR_IMAGE=${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
TRANSCRIBE_AUDIO_IMAGE=${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
TELEGRAM_BOT_IMAGE=${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}

拉取并启动容器，避免本地重新构建：

docker compose pull
docker compose up -d --no-build

🔧 使用方法

Telegram 机器人
- 向机器人发送视频 URL
- 接收处理好的字幕文件
网页界面
- 访问 http://localhost:5000
- 上传视频文件或 URL
- 查看和搜索字幕
Readwise 集成
- 自动在 Readwise Reader 中创建文章
- 保留视频元数据（标题、URL、发布日期）
- 智能分割长内容为易读片段
- 在其他阅读材料旁边访问转录文本

📝 许可证

MIT 许可证

🙏 致谢

特别感谢：

Windsurf - 世界首个智能代理 IDE，使本项目的开发成为可能
Claude 3.5 Sonnet - 在整个开发过程中提供全面的 AI 辅助

saccohuo/subtitle-processor
