AI News Source Extractor

Description

AI News Source Extractor extracts all URLs from the most recent AI News issue (from news.smol.ai) and prepares them for import into Google's NotebookLM. It organizes the sources into a dedicated folder, writes the non-social URLs to sources.txt, and generates an individual markdown file for each quoted tweet.

Features

Folder Generation: Creates a timestamped folder for each issue’s sources.
sources.txt: Lists all URLs from the issue, excluding twitter.com, x.com, and discord.com.
Tweet Markdown: Saves the full text of each quoted tweet as a separate markdown file.
WebSync Ready: sources.txt can be pasted directly into the WebSync for NotebookLM Chrome extension to auto-import into NotebookLM.
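
The filtering behind sources.txt can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the names EXCLUDED_DOMAINS, is_social, and filter_sources are hypothetical, and treating subdomains as excluded and dropping duplicate URLs are assumptions.

```python
from urllib.parse import urlparse

# Domains the feature list says are left out of sources.txt.
EXCLUDED_DOMAINS = ("twitter.com", "x.com", "discord.com")


def is_social(url: str) -> bool:
    """Return True if the URL's host is an excluded domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match on the full host or a dotted suffix, so "netflix.com" is not
    # mistaken for "x.com".
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def filter_sources(urls):
    """Keep non-social URLs, dropping duplicates while preserving order (assumed behavior)."""
    seen = set()
    kept = []
    for url in urls:
        if url not in seen and not is_social(url):
            seen.add(url)
            kept.append(url)
    return kept
```

Matching on the parsed hostname rather than on a substring of the whole URL avoids excluding links that merely mention one of these domains in their path or query string.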

Installation

git clone https://github.com/ThomsenDrake/ainews-source-extractor.git
cd ainews-source-extractor
python3 -m venv venv
source venv/bin/activate    # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Usage

Simply run the main scraper:

python build_issue.py

This will:

Generate a folder named with the current date for the latest AI News issue.
Create sources.txt inside that folder, containing all non-social URLs.
Produce individual .md files for each tweet quoted in the issue.
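
As a rough picture of the output described above, the sketch below builds a dated folder, writes sources.txt, and saves one markdown file per tweet. It is an illustration under assumptions, not what build_issue.py does internally: the function name write_issue_folder, the ISO date folder name, and the tweet_001.md file naming are all hypothetical.

```python
from datetime import date
from pathlib import Path


def write_issue_folder(base_dir, urls, tweets, issue_date=None):
    """Write sources.txt and one .md file per tweet into a dated folder."""
    issue_date = issue_date or date.today()
    # Folder name format is an assumption, e.g. 2025-01-31.
    folder = Path(base_dir) / issue_date.isoformat()
    folder.mkdir(parents=True, exist_ok=True)

    # One URL per line, so the file can be pasted into an importer as-is.
    (folder / "sources.txt").write_text("\n".join(urls) + "\n", encoding="utf-8")

    # Hypothetical naming scheme: tweet_001.md, tweet_002.md, ...
    for index, text in enumerate(tweets, start=1):
        tweet_path = folder / f"tweet_{index:03d}.md"
        tweet_path.write_text(text.strip() + "\n", encoding="utf-8")

    return folder
```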

Roadmap

Improve URL-filtering logic to separate twitter.com, x.com, and discord.com links.
Build discord_scraper.py to fetch and save referenced Discord messages as markdown.
Parameterize the output folder path and issue source URL for greater flexibility.
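
One possible shape for the parameterization item, sketched with argparse from the standard library. The flags --output-dir and --issue-url do not exist in the current scripts; they and their defaults are assumptions made for illustration only.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options for the scraper."""
    parser = argparse.ArgumentParser(
        description="Extract sources from an AI News issue."
    )
    # Hypothetical flag: where the dated issue folder is created.
    parser.add_argument(
        "--output-dir",
        default=".",
        help="Directory in which the issue folder is created.",
    )
    # Hypothetical flag: defaults to the site named in the description.
    parser.add_argument(
        "--issue-url",
        default="https://news.smol.ai",
        help="URL to read the issue from.",
    )
    return parser.parse_args(argv)
```

With this in place, a call such as parse_args(["--output-dir", "issues"]) would yield an object whose output_dir is "issues" and whose issue_url keeps its default.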

Contributing

Contributions welcome! Fork, branch, and submit a pull request.

