Automated pipeline that turns Hugging Face Daily Papers (or a single PDF) into:
- slide deck images,
- narrated audio tracks,
- subtitle-burned MP4 videos,
- optional YouTube uploads.
The repository is built for daily research-content production with minimal manual steps.
- Overview
- Pipeline
- Tech Stack
- Repository Structure
- Requirements
- Quick Start
- Usage
- CLI Reference
- Environment Variables
- Outputs
- YouTube Upload Behavior
- Figure Extraction (Optional but Built-in)
- Troubleshooting
- Publishing Checklist
- Known Limitations
- License
## Overview

`main.py` orchestrates a full content pipeline:

- Fetch top papers from Hugging Face Daily Papers (or use `--pdf-url`).
- Resolve PDF links and download PDFs.
- Extract core paper text (abstract, introduction, conclusion, plus full text).
- Ask an LLM for structured slide content in JSON.
- Build Marp markdown with modern CSS and optional figure embedding.
- Render slides to PNG.
- Generate TTS audio per slide and per language.
- Compose video with subtitles using MoviePy.
- Upload to YouTube (optional).
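Several of these steps are optional, which suggests a simple phase-gating pattern. A minimal sketch of how the skip switches might select phases — the phase names and function signature here are illustrative, not `main.py`'s actual internals:

```python
# Hypothetical phase gate: given the skip switches, return the ordered
# list of pipeline phases that remain enabled.
PHASES = ["fetch", "summarize", "render", "tts", "video", "upload"]

def phases_to_run(skip_render=False, skip_tts=False,
                  skip_video=False, skip_upload=False):
    """Return the pipeline phases left enabled, in execution order."""
    skipped = set()
    if skip_render:
        skipped.add("render")
    if skip_tts:
        skipped.add("tts")
    if skip_video:
        skipped.add("video")
    if skip_upload:
        skipped.add("upload")
    return [p for p in PHASES if p not in skipped]
```

For example, `phases_to_run(skip_upload=True)` ends at the video phase, matching a local-only run.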
## Pipeline

```text
Hugging Face Daily Papers / --pdf-url
  -> PDF download
  -> PDF text + figure extraction
  -> LLM summarization -> slide specs
  -> Marp markdown generation
  -> Marp PNG rendering
  -> OpenAI TTS (per slide, per language)
  -> MoviePy composition + hard subtitles
  -> MP4 output
  -> Optional YouTube upload
```
## Tech Stack

- Python (pipeline orchestration)
- OpenAI API (LLM summarization, translation, TTS)
- PyMuPDF (PDF text extraction)
- Marp CLI (slide rendering)
- MoviePy + FFmpeg (video composition/encoding)
- YouTube Data API v3 (upload)
- Optional: LayoutParser + Detectron2 (figure/table extraction)
## Repository Structure

```text
main.py             # Entry point
config.py           # Env-driven configuration
daily_papers/       # HF fetch + PDF handling + text extraction
llm/                # LLM client, prompts, summarizer, translator
slides/             # Figure handling, markdown builder, Marp renderer
tts/                # OpenAI TTS client
video/              # MoviePy video builder + subtitles
youtube/            # YouTube OAuth/upload
storage/            # Output path helpers
scripts/            # Setup and convenience scripts
models/publaynet/   # PubLayNet config + weights (for figure extraction)
```
## Requirements

- Python 3.10+ recommended
- `ffmpeg` on `PATH`
- `marp` CLI on `PATH`
- OpenAI API key
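The external-tool requirements can be verified up front before a long run. A small preflight sketch (the helper name is hypothetical, not part of this repository):

```python
# Preflight check for the external binaries the pipeline shells out to.
import shutil

REQUIRED_TOOLS = ["ffmpeg", "marp"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of required tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]
```

Running `missing_tools()` before the pipeline lets you fail fast with a clear message instead of midway through rendering.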
## Quick Start

```bash
pip install -r requirements.txt

# macOS example
brew install ffmpeg node
npm i -g @marp-team/marp-cli

./scripts/setup_venv.sh
cp .env.example .env
```

Set at least:
- `OPENAI_API_KEY`

If uploading to YouTube, also set:

- `YOUTUBE_CLIENT_SECRETS_FILE`
- `YOUTUBE_TOKEN_FILE`
Optional but useful to add in `.env`:

```bash
LANGUAGES=en,ko
TTS_SPEED=1.2
TTS_STYLE_INSTRUCTION=
```

Load the environment and run the full pipeline without uploading:

```bash
source scripts/env.sh
python main.py --date 2026-02-21 --top-k 10 --skip-upload
```

## Usage

Build a local video only:

```bash
python main.py --date 2026-02-21 --top-k 10 --video-only
# or
./scripts/run_video_only.sh 2026-02-21 10
```

Process a single PDF:

```bash
python main.py \
  --pdf-url https://arxiv.org/pdf/2511.21689.pdf \
  --paper-id 2511.21689 \
  --paper-title "Your Paper Title" \
  --origin "Your Lab/Company" \
  --skip-upload
```

Skip individual phases:

```bash
python main.py --date 2026-02-21 --skip-render
python main.py --date 2026-02-21 --skip-tts
python main.py --date 2026-02-21 --skip-video
```

## CLI Reference

| Flag | Description |
|---|---|
| `--date YYYY-MM-DD` | Target date. Defaults to today. |
| `--top-k N` | Number of papers to process. Defaults to `TOP_K`/10. |
| `--languages en,ko,...` | Narration languages. The first language is the primary output video. |
| `--pdf-url URL` | Process a single PDF instead of the Daily Papers fetch. |
| `--paper-id ID` | Optional custom ID in single-PDF mode. |
| `--paper-title TITLE` | Optional custom title in single-PDF mode. |
| `--origin TEXT` | Optional affiliation/origin override for intro narration. |
| `--skip-render` | Skip Marp PNG rendering. |
| `--skip-tts` | Skip TTS generation. |
| `--skip-video` | Skip video composition. |
| `--skip-upload` | Disable YouTube upload. |
| `--video-only` | Build the local video and skip upload (same effect as `--skip-upload` for the upload phase). |
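The flag set above maps naturally onto Python's `argparse`. A sketch of a parser mirroring the table — illustrative only; the real `main.py` may differ in defaults and help text:

```python
import argparse
from datetime import date

def build_parser():
    """Build a parser matching the documented CLI flags (sketch)."""
    p = argparse.ArgumentParser(description="Daily papers video pipeline")
    p.add_argument("--date", default=date.today().isoformat(),
                   help="Target date (YYYY-MM-DD), defaults to today")
    p.add_argument("--top-k", type=int, default=10,
                   help="Number of papers to process")
    p.add_argument("--languages", default="en,ko",
                   help="Comma-separated narration languages")
    p.add_argument("--pdf-url", help="Single-PDF mode source URL")
    p.add_argument("--paper-id", help="Custom ID in single-PDF mode")
    p.add_argument("--paper-title", help="Custom title in single-PDF mode")
    p.add_argument("--origin", help="Affiliation override for intro narration")
    # Boolean phase switches
    for flag in ("--skip-render", "--skip-tts", "--skip-video",
                 "--skip-upload", "--video-only"):
        p.add_argument(flag, action="store_true")
    return p
```

`argparse` converts the dashed flags to underscored attributes, so `--top-k` becomes `args.top_k` and `--video-only` becomes `args.video_only`.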
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `HF_BASE_URL` | `https://huggingface.co` | Hugging Face base URL. |
| `TOP_K` | `10` | Default top-K paper count. |
| `OPENAI_API_KEY` | - | Required for LLM/TTS steps. |
| `OPENAI_LLM_MODEL` | `gpt-5.3` | Model used for summarization and translation. |
| `OPENAI_TTS_MODEL` | `gpt-4o-mini-tts` | TTS model. |
| `OPENAI_TTS_VOICE` | `alloy` | Voice preset. |
| `TTS_STYLE_INSTRUCTION` | empty | Style hint string (stored, currently not injected into spoken text). |
| `TTS_SPEED` | `1.2` | TTS speed multiplier, clamped to [0.5, 4.0]. |
| `LANGUAGES` | `en,ko` | Comma-separated target narration languages. |
| `YOUTUBE_CLIENT_SECRETS_FILE` | empty | OAuth client JSON path. |
| `YOUTUBE_TOKEN_FILE` | empty | OAuth token cache path. |
| `OUTPUT_BASE_DIR` | `./outputs` | Base output directory. |
| `LOG_LEVEL` | `INFO` | Python logging level. |
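The table above implies a straightforward env-driven loader, which is what `config.py` is described as providing. A minimal sketch with the documented defaults — the dict keys here are illustrative, not `config.py`'s actual attribute names:

```python
import os

def load_config(env=os.environ):
    """Read the documented variables with their documented defaults."""
    speed = float(env.get("TTS_SPEED", "1.2"))
    return {
        "hf_base_url": env.get("HF_BASE_URL", "https://huggingface.co"),
        "top_k": int(env.get("TOP_K", "10")),
        "languages": env.get("LANGUAGES", "en,ko").split(","),
        # clamp to the documented [0.5, 4.0] range
        "tts_speed": min(4.0, max(0.5, speed)),
        "output_base_dir": env.get("OUTPUT_BASE_DIR", "./outputs"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```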
## Outputs

```text
outputs/{date}/
  daily_papers_{date}.mp4
  daily_papers_{date}_{lang}.mp4
  slides/
    slides_{date}.md
    slides_{date}_*.png
    figures/*.png              # copied assets used in slide markdown
    scripts_{date}_{lang}.txt
  {paper_id}/
    paper.pdf
    captions.json
    figures/*.png
  audio/{lang}/
    audio_slide_001.mp3
    ...
```
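The layout above can be captured with a small path helper, in the spirit of what `storage/` provides. A sketch — the function and key names are hypothetical, not this repository's API:

```python
from pathlib import Path

def output_paths(base, date_str, paper_id, lang="en"):
    """Build the per-run paths shown in the output tree above."""
    root = Path(base) / date_str
    return {
        "video": root / f"daily_papers_{date_str}.mp4",
        "video_lang": root / f"daily_papers_{date_str}_{lang}.mp4",
        "slides_md": root / "slides" / f"slides_{date_str}.md",
        "pdf": root / paper_id / "paper.pdf",
        "captions": root / paper_id / "captions.json",
        "audio_dir": root / "audio" / lang,
    }
```

Centralizing the naming scheme this way keeps the render, TTS, and video phases agreeing on where artifacts live.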
## YouTube Upload Behavior

- Upload occurs only when both conditions are met:
  - you did not pass `--skip-upload` / `--video-only`
  - both `YOUTUBE_CLIENT_SECRETS_FILE` and `YOUTUBE_TOKEN_FILE` are configured
- The first OAuth run opens a local browser for consent.
- Videos are uploaded as:
  - `privacyStatus: unlisted`
  - `categoryId: 28` (Science & Technology)
- In multi-language output, the title suffix includes language labels (for non-primary tracks).
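The two-condition gate above reduces to a tiny predicate. A sketch with an illustrative signature (not the actual upload module's API):

```python
def should_upload(skip_upload, video_only, client_secrets, token_file):
    """True only when no skip flag was passed AND both OAuth paths are set."""
    if skip_upload or video_only:
        return False
    return bool(client_secrets) and bool(token_file)
```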
## Figure Extraction (Optional but Built-in)

Figure extraction runs per downloaded PDF before summarization.

Behavior:

- If `captions.json` already exists, extraction is skipped (cache behavior).
- If model files or optional dependencies are missing, extraction is skipped gracefully.
- Extracted figure/table metadata is fed into the LLM prompt to improve slide quality.

Model/env paths:

- Default model files: `models/publaynet/config.yaml`, `models/publaynet/model_final.pth`
- Override via: `PUBLayNET_CONFIG`, `PUBLayNET_WEIGHTS`

Optional dependencies for this feature are not in `requirements.txt` by default (e.g., `layoutparser`, `detectron2`, `opencv-python`, `numpy`).
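The documented skip logic (cache hit, missing model files) can be sketched as a single guard. This is a simplification: the real module reportedly also checks that the optional imports are available. The function name is hypothetical:

```python
from pathlib import Path

def figure_extraction_enabled(paper_dir, config_path, weights_path):
    """Decide whether to run figure extraction for one paper directory."""
    captions = Path(paper_dir) / "captions.json"
    if captions.exists():
        return False  # cached result exists, skip (cache behavior)
    if not (Path(config_path).exists() and Path(weights_path).exists()):
        return False  # model files missing, skip gracefully
    return True
```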
## Troubleshooting

**Marp rendering fails.** Install Marp CLI globally and verify:

```bash
npm i -g @marp-team/marp-cli
marp --version
```

**FFmpeg missing.** Install it and verify:

```bash
ffmpeg -version
```

**LLM/TTS errors.** Check that:

- `OPENAI_API_KEY` is set correctly
- model names are valid for your account
- quota/rate limits are not exhausted

**Video composition fails.** Common causes:

- slide image count and audio file count mismatch
- `--skip-render` or `--skip-tts` used in ways that leave missing artifacts
- upstream LLM/TTS failures (check the logs)
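The slide/audio count mismatch is easy to diagnose by hand. A sketch that counts artifacts using the filename patterns from the output layout in this README (the helper itself is not part of the repository):

```python
from pathlib import Path

def artifact_mismatch(slides_dir, audio_dir):
    """Return slide-PNG count minus per-slide MP3 count; 0 means balanced."""
    n_slides = len(list(Path(slides_dir).glob("slides_*_*.png")))
    n_audio = len(list(Path(audio_dir).glob("audio_slide_*.mp3")))
    return n_slides - n_audio
```

A non-zero result usually points at a partial render or a partial TTS run for that date.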
**YouTube upload fails.** Check that:

- the OAuth client JSON path is valid
- the token path is writable
- if the token is stale, delete the token JSON and authenticate again
## Publishing Checklist

Before pushing to GitHub:

- Remove or ignore secrets:
  - `.env`
  - OAuth token files
  - OAuth client secrets
- Avoid committing large generated artifacts:
  - `outputs/`
  - local MP4/PNG/MP3 files
- Add a proper `.gitignore` if not present.
- Keep `LICENSE` and `NOTICE` files when redistributing.
## Known Limitations

- No automated test suite is included yet.
- LLM output quality depends heavily on model behavior and prompt adherence.
- TTS is generated slide-by-slide; long runs can be costly/time-consuming.
- Figure extraction quality depends on external model/deps and PDF layout quality.
## License

This project is licensed under the Apache License 2.0. See `LICENSE`.

If you redistribute this project or derivative works, you should:

- include the `LICENSE` file,
- retain the `NOTICE` file,
- keep attribution to the original project/author in source or documentation.