PaperCast is a project that turns research articles into podcasts using AI-generated audio. It is inspired by Illuminate (https://illuminate.withgoogle.com/) and ScienceCast (https://sciencecast.org/).
The author does not know anyone working on the Illuminate project and has no knowledge of its methods. The author is still on the waiting list for its beta release.
Aug 30th: Illuminate is now available. Give it a try at https://illuminate.google.com/
| Feature | PaperCast | Illuminate |
|---|---|---|
| Open source | ✅ Yes | 🟡 Not yet |
| Fine-grained control | ✅ Yes | 🟡 Only arXiv links |
| Research field | ✅ Any field | 🟡 Only computer science |
| Audio quality | ✅ Good | ✅ Very good |
| Voice tone | ✅ Conversational | 🟡 Flat |
| Paper source | ✅ Any paper | 🟡 arXiv only |
| Multiple papers | 🟡 Not yet | ✅ Yes |
| Content understanding | ✅ Good | ✅ Good |
| Computing resource | 💻 Local | ☁️ Cloud |
| Generation limit | ✅ As many as you like | 🟡 5 per day |
| Has Red Panda? | Yes, Justin and Emma | Only humans 🧑‍🎓 |
- Jul 29th, 2024: refactor the arXiv reader to use arXiv's HTML rendering and parse papers into JSON + Markdown.
- Jun 16th, 2024: add author interview mode, enabled by adding `author_interview_prompt` in `prompt.yaml` and `additional_questions` provided by the authors; add PDF mode, which extracts the necessary information from any PDF paper in the `pdfs` directory. See `examples/run_cognitive.yaml` for an example.
- Jun 15th, 2024: add subtitle `srt` file generation. See `examples/run_gorilla.yaml` for how to set `offset` when there is intro audio, and the example video PaperCast EP5: "Gorilla: Large Language Model Connected with Massive APIs".
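The subtitle `offset` shifts every cue when an intro clip plays before the conversation starts. The sketch below shows that idea under stated assumptions: each spoken line is taken to have known start and end times in seconds, and the `Segment` type and the `srt_timestamp`/`to_srt` helpers are hypothetical names. This is not PaperCast's actual subtitle code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One spoken line (hypothetical); times are seconds from the start of the dialogue audio."""
    speaker: str
    text: str
    start: float
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment], offset: float = 0.0) -> str:
    """Build SRT text, shifting every cue by `offset` seconds (e.g. the intro length)."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg.start + offset)
        end = srt_timestamp(seg.end + offset)
        # Each cue: index line, time range line, text line; cues are separated by a blank line.
        cues.append(f"{index}\n{start} --> {end}\n{seg.speaker}: {seg.text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment("Justin", "Welcome to PaperCast!", 0.0, 2.5),
        Segment("Emma", "Today we talk about Gorilla.", 2.5, 5.25),
    ]
    print(to_srt(demo, offset=10.0))
```

With `offset=10.0`, a line spoken at 0.0 to 2.5 seconds appears in the subtitle file at `00:00:10,000 --> 00:00:12,500`, which lines it up with audio that begins with a 10-second intro.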
To generate a podcast for "Attention Is All You Need", simply run:
```bash
python run.py examples/run_attention.yaml
```
It should produce `1706.03762.json` in the transcript directory and `1706.03762.wav` in the audio directory.
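The output files are named after the arXiv identifier (`1706.03762`). The following is a minimal sketch of how a `url` value could map to those names. It assumes that the directories are literally called `transcript` and `audio`, that local PDFs are named by their file stem, and that version suffixes such as `v7` are dropped. The helpers `paper_id` and `output_paths` are hypothetical and are not taken from `run.py`.

```python
import re
from pathlib import Path

# Matches new-style arXiv IDs (e.g. 1706.03762, optionally with v7 / .pdf) in abs or pdf URLs.
ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?/?$")


def paper_id(url: str) -> str:
    """Return the arXiv ID (version dropped) for an arXiv URL, else the file stem of a local PDF."""
    match = ARXIV_URL.search(url)
    if match:
        return match.group(1)
    return Path(url).stem


def output_paths(url: str, transcript_dir: str = "transcript",
                 audio_dir: str = "audio") -> tuple[Path, Path]:
    """Hypothetical naming scheme: <id>.json for the transcript, <id>.wav for the audio."""
    pid = paper_id(url)
    return Path(transcript_dir) / f"{pid}.json", Path(audio_dir) / f"{pid}.wav"


if __name__ == "__main__":
    for url in ("https://arxiv.org/abs/1706.03762", "pdfs/cognitive.pdf"):
        print(url, "->", *output_paths(url))
```

Because abs and pdf links to the same paper both reduce to one ID, reruns with either link would share the same cached transcript.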
You can also watch a few example videos on YouTube. The playlist link is here.
Set up the OpenAI API key:
```bash
export OPENAI_API_KEY=sk-xxxx
```
Check out the repo and clone ChatTTS into its directory:
```bash
git clone https://github.com/phunterlau/papercast
cd papercast/
git clone https://github.com/2noise/ChatTTS
cd ChatTTS
pip install -r requirements.txt
cd ..
pip install -r requirements.txt
```
Please note that ChatTTS is still very experimental. Refer to its repo for issues and help.
Use `examples/run_attention.yaml` as an example. It contains a few keys:
```yaml
url: "https://arxiv.org/abs/1706.03762"
use_cache: true
episode: 3
prompt: "dialogue_prompt"
background_knowledge: |
  Current year is 2024. Attention is all you need is known as the transformer paper published in 2017 by Google.
  It is the foundation paper of the current large language model research.
```
- `url`: an arXiv URL (abs or pdf) or a local file path to a PDF file.
- `use_cache`: whether to load the cached LLM-generated transcript or start over.
- `episode`: the episode number.
- `prompt`: the podcast style (dialogue, monologue, etc.); see `prompt.yaml` for the available prompts.
- (optional) `background_knowledge`: additional knowledge for better context understanding. Use "None" if not available.
- (optional) `additional_questions`: additional research questions to include.
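Here is one way to check these keys and fill in the optional ones. It is a sketch that assumes the YAML has already been parsed into a dict, for example with PyYAML's `safe_load`. The function name `normalize_config` and the choice to accept a single string for `additional_questions` are illustrative assumptions, not PaperCast's actual loader.

```python
from typing import Any

# Keys the example config always provides; the README marks only the last two as optional.
REQUIRED_KEYS = ("url", "use_cache", "episode", "prompt")


def normalize_config(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate a run config (hypothetical helper) and fill in the optional keys."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")

    config = dict(raw)
    config["use_cache"] = bool(raw["use_cache"])
    config["episode"] = int(raw["episode"])

    # The README asks for the string "None" when there is no background knowledge;
    # a YAML null (or a missing key) is treated the same way here.
    background = raw.get("background_knowledge")
    if background is None or str(background).strip() == "None":
        background = None
    config["background_knowledge"] = background

    # additional_questions is optional; accept a single string or a list of strings.
    questions = raw.get("additional_questions") or []
    if isinstance(questions, str):
        questions = [questions]
    config["additional_questions"] = [str(q).strip() for q in questions]
    return config
```

Rejecting a config that is missing required keys before any LLM call avoids spending API credits on a run that would fail later.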
I prefer podcasts in a question-answering style, so the transcript should include a smooth conversation giving a general overview, a few interesting questions, and a discussion of them. The process has 3 steps:
- Predict the research field of the given article.
- Have the LLM role-play as a senior researcher in that field and ask a few questions.
- Generate a podcast that addresses these questions.
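The three steps can be sketched as a chain of LLM calls. This illustration is not PaperCast's implementation. The `LLM` callable stands in for whatever chat-completion client is used (this repo uses the OpenAI API), and the prompt wording and function names are hypothetical. Questions are generated from the title and abstract only, and the full text is used only when writing the dialogue.

```python
from typing import Callable

# Any function that takes a prompt and returns the model's text reply,
# e.g. a thin wrapper around a chat-completion API (assumed, not PaperCast's client).
LLM = Callable[[str], str]


def predict_field(llm: LLM, title: str, abstract: str) -> str:
    """Step 1: ask the model for the paper's research field."""
    prompt = f"Name the research field of this paper in a few words.\nTitle: {title}\nAbstract: {abstract}"
    return llm(prompt).strip()


def ask_questions(llm: LLM, field: str, title: str, abstract: str, n: int = 3) -> list[str]:
    """Step 2: role-play a senior researcher in `field` and collect up to `n` questions."""
    prompt = (
        f"You are a senior researcher in {field}. After reading the title and abstract below, "
        f"ask {n} insightful questions, one per line.\nTitle: {title}\nAbstract: {abstract}"
    )
    # Strip bullet markers and drop blank lines from the reply.
    lines = [line.strip("-• ").strip() for line in llm(prompt).splitlines()]
    return [line for line in lines if line][:n]


def write_podcast(llm: LLM, title: str, paper_text: str, questions: list[str]) -> str:
    """Step 3: generate a dialogue transcript that opens with an overview and addresses the questions."""
    bullet_list = "\n".join(f"- {q}" for q in questions)
    prompt = (
        f"Write a podcast dialogue about '{title}'. Start with a general overview, "
        f"then discuss these questions:\n{bullet_list}\n\nPaper:\n{paper_text}"
    )
    return llm(prompt)


def generate_transcript(llm: LLM, title: str, abstract: str, paper_text: str) -> str:
    """Run the three steps in order."""
    field = predict_field(llm, title, abstract)
    questions = ask_questions(llm, field, title, abstract)
    return write_podcast(llm, title, paper_text, questions)
```

Passing the model in as a plain callable keeps each step testable with a fake LLM and makes it easy to swap providers.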
Limitations:
- Question generation uses only the article's title and abstract. A tree-level question generation over the full text might produce deeper and better questions.
- Audio generation depends on ChatTTS https://github.com/2noise/ChatTTS, whose features are still very experimental; picking speaker voices is a lottery and can be tricky.
TODO:
- More article readers beyond the arXiv loader
- A good PDF loader that parses article metadata and sections
- Add Chinese voices
- Better question generation using the full text
- Support multi-person discussions with an agentic workflow
- Support different interview modes, e.g. host vs. author
This repo uses the MIT License. It relies on ChatTTS for audio generation, and ChatTTS does not allow commercial use. The music in the podcast is generated by Suno.AI.
- Jina.ai has a good reader API https://jina.ai/reader/
- ChatTTS https://github.com/2noise/ChatTTS

