AI News Link Scraper extracts all URLs from the most recent AI News issue (from news.smol.ai) and prepares them for import into Google's NotebookLM. It saves each issue's sources in a dedicated folder, writes the non-social URLs to sources.txt, and creates a separate markdown file for each quoted tweet.
- Folder Generation: Creates a timestamped folder for each issue’s sources.
- sources.txt: Lists all URLs from the issue, excluding those on twitter.com, x.com, and discord.com.
- Tweet Markdown: Saves the full text of each quoted tweet as a separate markdown file.
- WebSync Ready: The contents of sources.txt can be pasted directly into the WebSync for NotebookLM Chrome extension, which imports them into NotebookLM automatically.
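The project's own filtering code is not reproduced here. As a rough illustration of the sources.txt rule, the sketch below assumes the issue page is available as HTML with links in ordinary <a href> tags, and it treats subdomains such as www.twitter.com as social too (an assumption). LinkCollector, is_social, and extract_source_urls are hypothetical names, not functions from build_issue.py; only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Domains excluded from sources.txt, per the feature list above.
SOCIAL_DOMAINS = {"twitter.com", "x.com", "discord.com"}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, keeping first-seen order without duplicates."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value not in self.links:
                self.links.append(value)


def is_social(url: str) -> bool:
    """True if the URL's host is a social domain or a subdomain of one (assumption)."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS)


def extract_source_urls(html: str) -> list[str]:
    """Return absolute http(s) links from an issue page, minus social links."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        url
        for url in collector.links
        if urlparse(url).scheme in ("http", "https") and not is_social(url)
    ]
```

Matching on the parsed hostname rather than a substring search keeps unrelated domains such as box.com from being dropped just because they end in "x.com".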
To install, clone the repository, create a virtual environment, and install the dependencies:

git clone https://github.com/ThomsenDrake/ainews-source-extractor.git
cd ainews-source-extractor
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Simply run the main scraper:

python build_issue.py

This will:
- Generate a folder named with the current date for the latest AI News issue.
- Create sources.txt inside that folder, containing all non-social URLs.
- Produce an individual .md file for each tweet quoted in the issue.
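The output layout described above can be sketched as follows. This is an illustration, not the script's actual code: write_issue_folder is a hypothetical helper, the tweets argument (a mapping of tweet URL to full text) is an assumed input shape, the tweet_001.md naming scheme is invented for the example, and writing one URL per line is assumed to suit pasting into WebSync.

```python
from datetime import date
from pathlib import Path


def write_issue_folder(base_dir, urls, tweets, day=None):
    """Create <base_dir>/<YYYY-MM-DD>/ holding sources.txt and one .md file per tweet.

    urls:   non-social URLs, written one per line to sources.txt
    tweets: hypothetical mapping of tweet URL -> full tweet text
    """
    day = day or date.today()
    folder = Path(base_dir) / day.isoformat()  # folder named with the date
    folder.mkdir(parents=True, exist_ok=True)

    # One URL per line (assumed to be the format WebSync accepts when pasted).
    (folder / "sources.txt").write_text("\n".join(urls) + "\n", encoding="utf-8")

    # Hypothetical naming scheme: tweet_001.md, tweet_002.md, ...
    for i, (tweet_url, text) in enumerate(tweets.items(), start=1):
        md = f"# Tweet {i}\n\nSource: {tweet_url}\n\n{text}\n"
        (folder / f"tweet_{i:03d}.md").write_text(md, encoding="utf-8")
    return folder
```

Passing day explicitly makes the helper easy to test; left as None, it falls back to today's date, matching the "current date" folder name.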
Planned improvements:

- Improve the URL-filtering logic to separate twitter.com, x.com, and discord.com links.
- Build discord_scraper.py to fetch and save referenced Discord messages as markdown.
- Parameterize the output folder path and issue source URL for greater flexibility.
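One possible shape for the parameterization item is a small argparse front end. The --issue-url and --out-dir flags and both defaults are hypothetical; build_issue.py does not currently accept these options, and the default URL only reflects the README's statement that issues come from news.smol.ai.

```python
import argparse

# Hypothetical defaults; the README only says issues come from news.smol.ai.
DEFAULT_ISSUE_URL = "https://news.smol.ai/"
DEFAULT_OUT_DIR = "."


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for a parameterized build_issue.py (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Extract source URLs from an AI News issue."
    )
    parser.add_argument(
        "--issue-url",
        default=DEFAULT_ISSUE_URL,
        help="page to scrape (default: %(default)s)",
    )
    parser.add_argument(
        "--out-dir",
        default=DEFAULT_OUT_DIR,
        help="parent directory for the dated issue folder (default: %(default)s)",
    )
    return parser
```

The script would then call build_parser().parse_args() and pass args.issue_url and args.out_dir to its scraping and folder-writing steps, so running it with no flags would keep the current behavior.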
Contributions welcome! Fork, branch, and submit a pull request.