saccohuo/subtitle-processor

Subtitle Processing Service 字幕处理服务

English | 中文

Note: This README is entirely generated by AI and is for reference only.
注意:本 README 完全由 AI 生成,仅供参考。

Recent Updates

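  • Runtime hotword settings can be toggled without restarting: /process/settings/hotword persists to config/hotword_settings.json, and the Telegram bot adds /hotword_status and /hotword_toggle for switching the setting online.
  • The Telegram bot adds interactive prompts for tags and hotwords and a /skip shortcut command, and polls /process/status/<id> in the background to push the subtitle file automatically.
  • scripts/build-and-push.sh now also builds the bgutil-provider image; the default Dockerfile keeps only the required dependencies, with the X11/VNC components retained as comments, so the built image is lighter.
  • The backend provides /process/status/<id>?include_content=1 and /process/status/<id>/subtitle so that external clients can query task progress and the subtitle text.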
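
As a rough illustration of how an external client might use the status endpoints listed above, the sketch below builds the two URLs and polls until a task finishes. The paths come from this README; everything else is an assumption made for illustration: the base URL, the `status` field, and its `completed`/`failed` values are hypothetical, since the response schema is not documented here.

```python
import json
import time
import urllib.request
from urllib.parse import quote


def status_url(base_url, task_id, include_content=False):
    """Build /process/status/<id>, optionally with ?include_content=1."""
    url = f"{base_url.rstrip('/')}/process/status/{quote(str(task_id), safe='')}"
    return url + "?include_content=1" if include_content else url


def subtitle_url(base_url, task_id):
    """Build /process/status/<id>/subtitle."""
    return status_url(base_url, task_id) + "/subtitle"


def http_get_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(base_url, task_id, fetch=http_get_json, interval=5.0, max_polls=120):
    """Poll the status endpoint until the task reaches a terminal state.

    ASSUMPTION: the JSON body carries a "status" field whose terminal
    values are "completed" or "failed"; adjust this to the real schema.
    """
    for _ in range(max_polls):
        payload = fetch(status_url(base_url, task_id))
        if payload.get("status") in ("completed", "failed"):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

The `fetch` parameter is injectable so the polling loop can be exercised without a running backend.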

🌍 English

Overview

A subtitle processing service that automatically downloads, transcribes, and manages video subtitles from platforms such as YouTube and Bilibili. It provides a Telegram bot interface and a web management portal.

🚀 Features

  • Multi-Platform Support

    • YouTube video subtitle extraction
    • Bilibili video subtitle processing
    • Automatic fallback to audio transcription
  • Subtitle Processing

    • Direct subtitle download from platforms
    • Audio transcription using FunASR
    • Support for multiple subtitle formats (SRT, VTT, JSON3)
  • User Interfaces

    • Telegram Bot for easy access
    • Web interface for subtitle management
    • Real-time subtitle viewing and searching
  • File Management

    • Automatic file organization
    • Metadata tracking
    • Timeline visualization
  • Deployment Flexibility

    • Telegram webhook served from a single entry point, while worker nodes run a processing-only stack
    • Build script with persistent cache/export to push and reload images quickly
    • .env overrides for image tags per environment
  • Readwise Integration

    • Automatic article creation from subtitles
    • Rich text formatting support
    • Seamless sync with Readwise Reader
    • Smart content segmentation for long videos
  • Hotword Management

    • Runtime toggle API (/process/settings/hotword) with persisted JSON state
    • Telegram commands /hotword_status and /hotword_toggle to view and toggle automatic hotwords
    • Conversation flow supports manual hotword input, or /skip to skip it
    • config/hotwords-example/ and config/hotword_settings.json.example provide customizable templates

🛠️ Technical Stack

  • Backend: Python Flask
  • Frontend: HTML/CSS/JavaScript
  • Transcription: FunASR
  • Container: Docker
  • Storage: JSON-based file system

📦 Installation

  1. Clone the repository
  2. Install Docker and Docker Compose
  3. Configure environment variables:
    TELEGRAM_TOKEN=your_telegram_bot_token
    READWISE_TOKEN=your_readwise_token
  4. Configure hotword settings (optional but recommended):
    cp config/hotword_settings.json.example config/hotword_settings.json
    # Edit config/hotword_settings.json to set defaults for auto_hotwords/post_process/mode/max_count
    # For advanced generation rules, copy config/hotwords-example/hotwords_config-example.yml to config/hotwords/hotwords_config.yml
  5. Configure Firefox cookies for YouTube access:
    • Copy your Firefox profile directory (located at C:\Users\<USER_NAME>\AppData\Roaming\Mozilla\Firefox\Profiles\) to the firefox_profile directory in the project
    • This enables downloading restricted YouTube videos using your Firefox login cookies
  6. Start the services:
    docker-compose up --build
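
Before running the last step, a small preflight check can catch a missing token or a malformed settings file. The sketch below is not part of the project: the `preflight` helper is hypothetical, and treating both tokens as required is an assumption (the Readwise token may well be optional in practice).

```python
import json
import os
from pathlib import Path

# ASSUMPTION: both tokens are treated as required for this check.
REQUIRED_VARS = ("TELEGRAM_TOKEN", "READWISE_TOKEN")


def preflight(project_dir=".", environ=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if environ is None else environ
    problems = [f"{name} is not set" for name in REQUIRED_VARS
                if not env.get(name, "").strip()]

    # Step 4 is optional, so a missing file is fine; a malformed one is not.
    settings = Path(project_dir) / "config" / "hotword_settings.json"
    if settings.exists():
        try:
            json.loads(settings.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{settings} is not valid JSON: {exc}")

    # Step 5: the copied Firefox profile should live in firefox_profile/.
    if not (Path(project_dir) / "firefox_profile").is_dir():
        problems.append("firefox_profile directory not found")
    return problems
```

Calling `preflight()` from the repository root and printing each returned problem gives a quick readiness report before the containers are built.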

🧩 Distribute Docker Images to Multiple Hosts

  1. Generate and push images from a build machine:
    cp images.env.example images.env
    # Edit images.env to set IMAGE_PREFIX (e.g. docker.io/myteam) and IMAGE_TAG
    # Optionally set EXTRA_TAGS=latest if you also want a latest tag
    set -a; source images.env; set +a
    ./scripts/build-and-push.sh
    The script tags and pushes:
    • ${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
    • ${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
    • ${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
  2. On each target host, create (or edit) .env with the new image references:
    SUBTITLE_PROCESSOR_IMAGE=${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
    TRANSCRIBE_AUDIO_IMAGE=${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
    TELEGRAM_BOT_IMAGE=${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
  3. Pull and start containers without rebuilding locally:
    docker compose pull
    docker compose up -d --no-build
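
When many hosts need the same .env entries, the lines in step 2 can be generated rather than typed by hand. This is a minimal sketch that follows the naming pattern shown above; the `image_env_lines` helper and the `v1` tag in the usage note are hypothetical.

```python
# Service names as listed in step 1.
SERVICES = ("subtitle-processor", "transcribe-audio", "telegram-bot")


def image_env_lines(image_prefix, image_tag, services=SERVICES):
    """Return .env lines such as SUBTITLE_PROCESSOR_IMAGE=<prefix>/subtitle-processor:<tag>."""
    prefix = image_prefix.rstrip("/")
    lines = []
    for service in services:
        # subtitle-processor -> SUBTITLE_PROCESSOR_IMAGE
        var = service.upper().replace("-", "_") + "_IMAGE"
        lines.append(f"{var}={prefix}/{service}:{image_tag}")
    return lines
```

For example, `"\n".join(image_env_lines("docker.io/myteam", "v1"))` yields three fully resolved lines that can be written to a target host's .env file.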

🤖 Telegram Deployment (Single Entry)

  • Choose one machine (for example the NAS that fronts Caddy) to run the telegram-bot service with webhook enabled. Configure telegram.webhook.public_url (or TELEGRAM_WEBHOOK_* envs) only on this host so it remains the sole webhook endpoint. Start the stack with the Telegram profile:
    docker compose --profile telegram up -d
  • On additional worker machines, keep running subtitle-processor and transcribe-audio but skip the bot service. The default profile starts only the processing services, so a plain docker compose up -d works. You can also target the services explicitly:
    docker compose up -d subtitle-processor transcribe-audio
    Alternatively, comment out the telegram-bot section in the worker’s compose file. Setting TELEGRAM_BOT_ENABLED=false in the worker’s environment keeps the container in health-check mode if you ever need the image present.
  • The worker nodes will still take part in transcription because the primary bot forwards requests to them via the shared FunASR server list in config/config.yml.
  • This “single entry + multiple workers” layout prevents Telegram from redelivering the same webhook to different instances, eliminating duplicate replies in chats.
  • Each webhook is acknowledged immediately and the heavy lifting runs in background tasks, so Telegram never retries the same update due to timeouts.
  • For exceptionally long jobs you can raise the HTTP timeouts via SUBTITLE_CONNECT_TIMEOUT (default 120 seconds) and SUBTITLE_READ_TIMEOUT (default 1800 seconds). Defaults are defined in docker-compose.yml and may be overridden with environment variables if needed.
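
To make the timeout settings in the last bullet concrete, the sketch below shows one way a client could read the two variables with the defaults quoted above. The `subtitle_timeouts` helper is hypothetical. The returned pair has the `(connect, read)` shape accepted by the `timeout` argument of the `requests` library, but whether the service itself uses `requests` is an assumption.

```python
import os


def subtitle_timeouts(environ=None):
    """Return (connect_timeout, read_timeout) in seconds.

    Defaults mirror the documented values: 120 s connect, 1800 s read.
    """
    env = os.environ if environ is None else environ
    connect = float(env.get("SUBTITLE_CONNECT_TIMEOUT", "120"))
    read = float(env.get("SUBTITLE_READ_TIMEOUT", "1800"))
    if connect <= 0 or read <= 0:
        raise ValueError("timeouts must be positive numbers of seconds")
    return connect, read
```

Keeping the connect timeout short and the read timeout long suits transcription jobs: an unreachable worker fails fast, while a busy one is given time to finish.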

🔧 Usage

  1. Telegram Bot

    • Send video URL to the bot
    • Receive processed subtitle file
  2. Web Interface

    • Access http://localhost:5000
    • Upload video files or URLs
    • View and search subtitles
  3. Readwise Integration

    • Automatically creates articles in Readwise Reader
    • Preserves video metadata (title, URL, publish date)
    • Intelligently splits long content into readable segments
    • Access transcripts alongside your other reading materials

📝 License

MIT License

🙏 Acknowledgments

Special thanks to:

  • Windsurf - The world's first agentic IDE that made this project development possible
  • Claude 3.5 Sonnet - For providing comprehensive AI assistance throughout the development process

🌏 中文

概述

一个综合性的字幕处理服务,可以自动下载、转录和管理来自各种平台的视频字幕。提供 Telegram 机器人接口和网页管理门户。

最近更新

  • scripts/build-and-push.sh 支持持久化 BuildKit 缓存,多架构推送后会自动在本机加载当前架构镜像,无需再执行 docker pull。
  • Telegram Webhook 立即返回,并将字幕处理放到后台执行,避免因为重试导致的重复回复。
  • Telegram 部署改为“单入口 + 多工作节点”模式,避免同一条消息被多个 bot 实例重复回复。
  • 文档补充镜像分发与 .env 覆盖指引,便于多机器快速上线。

🚀 功能特点

  • 多平台支持

    • YouTube 视频字幕提取
    • Bilibili 视频字幕处理
    • 自动音频转录备选方案
  • 字幕处理

    • 直接从平台下载字幕
    • 使用 FunASR 进行音频转录
    • 支持多种字幕格式(SRT、VTT、JSON3)
  • 用户界面

    • Telegram 机器人便捷访问
    • 网页字幕管理界面
    • 实时字幕查看和搜索
  • 文件管理

    • 自动文件组织
    • 元数据跟踪
    • 时间轴可视化
  • 部署灵活性

    • Telegram 仅在单一入口启用 webhook,其他节点专注处理任务
    • 构建脚本带持久缓存,提高推送/本地加载效率
    • 通过 .env 覆盖镜像标签,适配不同环境
  • Readwise 集成

    • 自动从字幕创建文章
    • 支持富文本格式
    • 与 Readwise Reader 无缝同步
    • 智能分段处理长视频内容
  • 热词管理

    • 运行期热词开关可通过 /process/settings/hotword 与 Telegram 指令在线调整
    • 标签/热词会话支持手动输入或 /skip 快捷跳过
    • config/hotword_settings.json.example 与 config/hotwords-example/ 提供自定义模板,轻松扩展自动热词策略

🛠️ 技术栈

  • 后端:Python Flask
  • 前端:HTML/CSS/JavaScript
  • 转录:FunASR
  • 容器:Docker
  • 存储:基于 JSON 的文件系统

📦 安装步骤

  1. 克隆仓库
  2. 安装 Docker 和 Docker Compose
  3. 配置环境变量:
    TELEGRAM_TOKEN=你的_telegram_机器人_token
    READWISE_TOKEN=你的_readwise_token
  4. 可选:配置热词默认策略
    cp config/hotword_settings.json.example config/hotword_settings.json
    # 编辑热词开关/模式/最大数量等默认值
    # 如需自定义生成规则,可复制 config/hotwords-example/hotwords_config-example.yml 至 config/hotwords/hotwords_config.yml
  5. 配置 Firefox cookies 以访问 YouTube:
    • 将 Firefox 配置文件目录(位于 C:\Users\<USER_NAME>\AppData\Roaming\Mozilla\Firefox\Profiles\)复制到项目中的 firefox_profile 目录
    • 这使您可以使用 Firefox 登录 cookie 下载受限制的 YouTube 视频
  6. 启动服务:
    docker-compose up --build

🤖 Telegram 单入口部署

  • 仅在一台机器(例如承载 Caddy 的 NAS)运行 telegram-bot 并启用 webhook,在该节点的配置文件或环境变量中填写 telegram.webhook.public_url,并使用带有 telegram profile 的启动方式:
    docker compose --profile telegram up -d
  • 其他工作节点只运行 subtitle-processor 和 transcribe-audio:默认 profile 只会启动处理服务,因此直接执行 docker compose up -d 即可;也可以显式指定服务:
    docker compose up -d subtitle-processor transcribe-audio
  • 或在它们的 docker-compose.yml 中注释掉 telegram-bot 服务;若需要保留容器,可在环境变量中设置 TELEGRAM_BOT_ENABLED=false,让其仅提供健康检查而不处理消息。
  • 所有节点共享 config/config.yml 内的转录服务器列表,主节点收到请求后仍会委派后端 FunASR 服务执行转录。
  • 该拓扑阻止 Telegram 将同一条 webhook 投递给多台实例,从根源上消除重复回复。
  • 每条 Webhook 请求都会立即响应,字幕生成移至后台任务执行,Telegram 不会因超时而重试。
  • 如果处理超长视频,可以通过环境变量 SUBTITLE_CONNECT_TIMEOUT(默认 120 秒)和 SUBTITLE_READ_TIMEOUT(默认 1800 秒)调高字幕请求的连接/读取超时。默认值写在 docker-compose.yml,需要时可在环境变量中覆盖。

🧩 多机快速分发 Docker 镜像

  1. 在构建机器上生成并推送镜像:
    cp images.env.example images.env
    # 编辑 images.env,设置 IMAGE_PREFIX(如 docker.io/myteam)和 IMAGE_TAG
    # 如需同时推送 latest,可设置 EXTRA_TAGS=latest
    set -a; source images.env; set +a
    ./scripts/build-and-push.sh
    脚本会推送以下镜像:
    • ${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
    • ${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
    • ${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
  2. 在每台目标机器根目录创建(或修改).env 文件,填入最新镜像:
    SUBTITLE_PROCESSOR_IMAGE=${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
    TRANSCRIBE_AUDIO_IMAGE=${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
    TELEGRAM_BOT_IMAGE=${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
  3. 拉取并启动容器,避免本地重新构建:
    docker compose pull
    docker compose up -d --no-build

🔧 使用方法

  1. Telegram 机器人

    • 向机器人发送视频 URL
    • 接收处理好的字幕文件
  2. 网页界面

    • 访问 http://localhost:5000
    • 上传视频文件或 URL
    • 查看和搜索字幕
  3. Readwise 集成

    • 自动在 Readwise Reader 中创建文章
    • 保留视频元数据(标题、URL、发布日期)
    • 智能分割长内容为易读片段
    • 在其他阅读材料旁边访问转录文本

📝 许可证

MIT 许可证

🙏 致谢

特别感谢:

  • Windsurf - 世界首个智能代理 IDE,使本项目的开发成为可能
  • Claude 3.5 Sonnet - 在整个开发过程中提供全面的 AI 辅助
