A modular Python system for capturing, summarizing, and delivering web content in a frictionless "read-later" pipeline. Designed to support personal knowledge workflows, bucket connects Discord, RSS feeds, and local databases to generate clean, readable briefings, ultimately served to a reMarkable tablet or accessed on demand via API.
- Web Content Capture: Fetch and clean articles from URLs or RSS feeds
- AI Summarization: Generate concise summaries using Ollama or OpenAI
- PDF Briefings: Create beautiful, formatted PDF reports for offline reading
- Discord Integration: Add articles via Discord bot commands
- REST API: Full API for automation and integration
- Obsidian Export: Sync articles to Obsidian vault with Johnny.Decimal schema
- CLI Interface: Easy command-line control with rich output
- Scheduled Tasks: Automatic RSS fetching and briefing generation
- reMarkable Ready: PDFs optimized for tablet reading
# Clone the repository
git clone https://github.com/yourusername/bucket.git
cd bucket
# Install dependencies
pip install -e .
# Initialize the system
bucket init

# Add a URL to your bucket
bucket add "https://example.com/article"
# Add an RSS feed
bucket feed "Tech News" "https://example.com/feed.xml" --tag tech
# Generate a daily briefing
bucket briefing --title "Morning Briefing" --days 7
# Start the API server
bucket serve --port 8000
# Run the full system
bucket run --discord YOUR_TOKEN --obsidian /path/to/vault

For Hugo site integration (RSS to read_later reports):
# Auto-detect Hugo site (if running from hugo directory or parent)
python -m bucket.cli serve
# Specify Hugo site path explicitly
python -m bucket.cli --hugo-site /path/to/hugo/site serve
# Process RSS feeds and generate daily reports
python -m bucket.cli process --max-articles 5 --build
# Build Hugo site only
python -m bucket.cli build
# Show current configuration
python -m bucket.cli config

Bucket uses environment variables for configuration with smart defaults:
| Variable | Default | Description |
|---|---|---|
| `BUCKET_DB_PATH` | `bucket.db` | Database file path |
| `BUCKET_API_HOST` | `0.0.0.0` | API server host |
| `BUCKET_API_PORT` | `8000` | API server port |
| `BUCKET_HUGO_SITE_PATH` | Auto-detect | Path to Hugo site |
| `BUCKET_OUTPUT_DIR` | `output` | Output directory |
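Under the hood these variables resolve to plain environment lookups; the sketch below illustrates the defaults from the table (the actual resolution lives in bucket's own code and may differ, e.g. `BUCKET_HUGO_SITE_PATH` falls back to auto-detection):

```python
import os

# Illustrative only; bucket's real config code may resolve these differently.
DB_PATH = os.environ.get("BUCKET_DB_PATH", "bucket.db")
API_HOST = os.environ.get("BUCKET_API_HOST", "0.0.0.0")
API_PORT = int(os.environ.get("BUCKET_API_PORT", "8000"))
OUTPUT_DIR = os.environ.get("BUCKET_OUTPUT_DIR", "output")
```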
Bucket automatically detects Hugo sites by looking for:
- `config.toml`, `hugo.toml`, `config.yaml`, or `config.yml` in the current directory
- Common subdirectories: `blog`, `site`, `hugo`, `spillyourgutsonline-blog`
- The parent directory and its subdirectories
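The detection order above can be approximated as follows; this is a hypothetical sketch (the real logic lives in `bucket/cli.py` and may differ in detail):

```python
from pathlib import Path
from typing import Optional

# Names taken from the detection rules above.
CONFIG_NAMES = ("config.toml", "hugo.toml", "config.yaml", "config.yml")
SUBDIRS = ("blog", "site", "hugo", "spillyourgutsonline-blog")

def find_hugo_site(start: Path) -> Optional[Path]:
    """Return the first directory containing a Hugo config file, or None."""
    candidates = [
        start,
        *(start / d for d in SUBDIRS),
        start.parent,
        *(start.parent / d for d in SUBDIRS),
    ]
    for directory in candidates:
        if any((directory / name).is_file() for name in CONFIG_NAMES):
            return directory
    return None
```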
# Environment variables
export BUCKET_HUGO_SITE_PATH=/home/user/my-blog
export BUCKET_DB_PATH=/home/user/bucket.db
# Command line options
python -m bucket.cli --hugo-site /path/to/blog --db-path ./data.db serve
# Copy example config
cp env.example .env
# Edit .env with your paths

- Python 3.9+
- SQLite (included)
- Ollama (for local LLM summarization) or OpenAI API key
- Discord bot token (optional)
- Obsidian vault (optional)
bucket/
├── core.py           # Main orchestrator
├── models.py         # Data models
├── database.py       # Database management
├── fetcher.py        # Web content fetching
├── summarizer.py     # AI summarization
├── pdf_generator.py  # PDF generation
├── discord_bot.py    # Discord integration
├── api.py            # REST API
└── cli.py            # Command-line interface
Create a `.env` file:
# Discord Bot (optional)
DISCORD_TOKEN=your_discord_bot_token
# OpenAI (optional, for summarization)
OPENAI_API_KEY=your_openai_api_key
# Database
BUCKET_DB_PATH=bucket.db
# Output
BUCKET_OUTPUT_DIR=output
# Obsidian
OBSIDIAN_VAULT_PATH=/path/to/obsidian/vault

- Create a Discord application at https://discord.com/developers/applications
- Create a bot and copy the token
- Invite the bot to your server with appropriate permissions
- Use the token in your `.env` file or the `--discord` flag
For local LLM summarization:
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull llama2
# Start Ollama service
ollama serve

# Add a single article
bucket add "https://example.com/article" --priority high --tag tech
# Add an RSS feed
bucket feed "Hacker News" "https://news.ycombinator.com/rss" --tag news
# Fetch from all RSS feeds
bucket fetch

# Daily briefing
bucket briefing --title "Daily Briefing" --days 7
# Filtered briefing
bucket briefing --tag tech --priority high --title "Tech Briefing"
# Custom output
bucket briefing --output ./briefings --title "Weekly Summary"

# Start API server
bucket serve --port 8000
# Add article via API
curl -X POST "http://localhost:8000/articles" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/article", "priority": "high"}'
# Generate briefing via API
curl -X POST "http://localhost:8000/briefings/generate" \
-H "Content-Type: application/json" \
  -d '{"title": "Daily Briefing", "days_back": 7}'

!add https://example.com/article
!feeds add "Tech News" https://example.com/feed.xml
!rss refresh
!rss briefing 7
!brief 7 discord
!brief 7 pdf
!status
!help
Available Commands:
- `!add <url>` - Add an article or webpage to your reading bucket
- `!feeds [add|remove|toggle|list]` - Unified RSS feed management
  - `!feeds add "name" url` - Add an RSS feed
  - `!feeds remove <id>` - Remove a feed by ID
  - `!feeds toggle <id>` - Enable/disable a feed
  - `!feeds list` - List all feeds (default)
- `!rss [show|refresh|briefing|stats]` - Unified RSS operations
  - `!rss` or `!rss show 3` - Show recent unseen RSS items
  - `!rss refresh` - Update all RSS feeds
  - `!rss briefing 7` - Generate a comprehensive RSS briefing
  - `!rss stats` - Show RSS feed statistics
- `!brief [days] [format]` - Generate a quick briefing of recent articles and RSS feeds
  - Formats: `discord` (embed), `pdf` (downloadable PDF)
  - Usage: `!brief 7 discord` (default: 7 days, discord format)
- `!status` - Show current bucket system status
- `!help` - Show detailed help information
- `GET /` - API status
- `GET /health` - Health check
- `POST /articles` - Add article
- `GET /articles` - List articles
- `GET /articles/{id}` - Get article
- `POST /feeds` - Add RSS feed
- `GET /feeds` - List feeds
- `POST /briefings/generate` - Generate briefing
- `GET /briefings` - List briefings
- `GET /briefings/(unknown)` - Download briefing
- `GET /stats` - System statistics
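These endpoints can be driven from any HTTP client; here is a minimal standard-library sketch that mirrors the earlier curl example. It only builds the request object (pass it to `urllib.request.urlopen` once the server is running; the `localhost:8000` base URL assumes `bucket serve --port 8000`):

```python
import json
from urllib import request

API = "http://localhost:8000"  # assumes a local `bucket serve --port 8000`

def build_add_article_request(url: str, priority: str = "normal") -> request.Request:
    """Build the POST /articles call; send it with urllib.request.urlopen."""
    body = json.dumps({"url": url, "priority": priority}).encode()
    return request.Request(
        f"{API}/articles",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```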
The system uses SQLite with the following tables:
- `articles` - Article content and metadata
- `summaries` - AI-generated summaries
- `feeds` - RSS feed configurations
- `deliveries` - Delivery tracking
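An illustrative in-memory version of that layout is shown below; the column names are guesses for illustration only, and the authoritative definitions live in `bucket/database.py`:

```python
import sqlite3

# Columns are illustrative; see bucket/database.py for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles   (id INTEGER PRIMARY KEY, url TEXT, title TEXT, content TEXT);
CREATE TABLE summaries  (id INTEGER PRIMARY KEY, article_id INTEGER REFERENCES articles(id), text TEXT);
CREATE TABLE feeds      (id INTEGER PRIMARY KEY, name TEXT, url TEXT, enabled INTEGER);
CREATE TABLE deliveries (id INTEGER PRIMARY KEY, briefing_path TEXT, delivered_at TEXT);
""")
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```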
from bucket.summarizer import SummarizerFactory
# Use OpenAI
summarizer = SummarizerFactory.create_summarizer(
summarizer_type="openai",
api_key="your_key",
model_name="gpt-3.5-turbo"
)
# Use custom Ollama model
summarizer = SummarizerFactory.create_summarizer(
summarizer_type="ollama",
model_name="codellama",
base_url="http://localhost:11434"
)

# Export articles to Obsidian
bucket run --obsidian /path/to/vault
# Articles are organized with Johnny.Decimal schema:
# 10.00/20231201_Article_Title.md

The system automatically:
- Fetches RSS feeds every 4 hours
- Generates daily briefings at 8 AM
- Summarizes pending articles every hour
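The Johnny.Decimal note path shown earlier (`10.00/20231201_Article_Title.md`) can be sketched as below; the `10.00` area code and the exact slug rules are assumptions for illustration, not bucket's actual implementation:

```python
from datetime import date
from pathlib import PurePosixPath

def obsidian_note_path(title: str, published: date, area: str = "10.00") -> PurePosixPath:
    """Build a vault-relative path like 10.00/20231201_Article_Title.md (sketch)."""
    slug = "_".join(title.split())  # assumed slug rule: spaces become underscores
    return PurePosixPath(area) / f"{published:%Y%m%d}_{slug}.md"
```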
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black bucket/
isort bucket/
# Type checking
mypy bucket/

- Create a feature branch
- Add tests in `tests/`
- Update documentation
- Submit a pull request
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - see LICENSE file for details.
- Ollama for local LLM inference
- FastAPI for the REST API
- WeasyPrint for PDF generation
- Typer for the CLI
- Rich for beautiful terminal output
Discord bot not responding
- Check bot token and permissions
- Ensure bot is invited to server
Ollama connection failed
- Verify Ollama is running: `ollama serve`
- Check model is installed: `ollama list`
PDF generation fails
- Install system dependencies for WeasyPrint
- Check output directory permissions
Database errors
- Delete `bucket.db` and reinitialize
- Check SQLite installation
- Check the Issues page
- Create a new issue with detailed information
- Join our Discord server for support
Bucket - Because knowledge should flow like water, not pile up like clutter. 🪣