An end-to-end RSS article summarization and publishing system that continuously monitors RSS feeds, intelligently summarizes articles using AI, and publishes summaries to Discord channels with comprehensive monitoring and observability.
Information Broker solves the challenge of staying informed in today's information-heavy world by automatically processing multiple RSS feeds, generating concise AI-powered summaries, and delivering them directly to your team's Discord channels. The system guarantees reliable, sequential processing with built-in error handling, metrics collection, and real-time monitoring to ensure no important content is missed while preventing API rate limiting and system overload.
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   RSS Feeds     │    │   RSS Monitor    │    │  Summarization  │
│   (External)    │───▶│  - Feed Scraper  │───▶│   Scheduler     │
│                 │    │  - Deduplication │    │ - Single Worker │
└─────────────────┘    │  - Content Hash  │    │ - Thread-Safe   │
                       └──────────────────┘    │ - Queue Manager │
                                               └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐             │
│   Discord       │    │   AI Summary     │             │
│   Webhook       │◀───│  (Ollama LLM)    │◀────────────┘
│                 │    │  - Local API     │
└─────────────────┘    │                  │
                       └──────────────────┘
                                │
┌─────────────────┐    ┌──────────────────┐
│   PostgreSQL    │    │   Prometheus     │
│   Database      │    │   Metrics        │
│  - Articles     │    │  - Pipeline      │
│  - Summaries    │◀───│  - Latency       │
│  - Logs         │    │  - Errors        │
│  - Webhooks     │    │  - Queue Depth   │
└─────────────────┘    └──────────────────┘
                                │
                       ┌──────────────────┐
                       │    Grafana       │
                       │   Dashboards     │
                       │  - Health        │
                       │  - Performance   │
                       │  - Alerts        │
                       └──────────────────┘
- RSS Monitor: Continuously scrapes configured RSS feeds, performs content deduplication using SHA-256 hashing
- Summarization Scheduler: Thread-safe single-worker queue ensuring only one AI request is in-flight at any time
- Ollama Integration: Local LLM API for article summarization with configurable models and retry logic
- Discord Publisher: Webhook-based notification system with formatted summaries and article links
- PostgreSQL Database: Persistent storage for articles, summaries, logs, and operational metrics
- Prometheus Metrics: Real-time monitoring of all pipeline stages with custom metrics
- Grafana Dashboards: Pre-configured visualizations for system health and performance analysis
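
The SHA-256 deduplication performed by the RSS Monitor could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the names `contentHash`, `seenSet`, and `markIfNew` are hypothetical, the whitespace normalization is an assumption, and the in-memory map stands in for whatever lookup the real system performs against PostgreSQL.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
	"sync"
)

// contentHash returns the hex-encoded SHA-256 digest of an article's title
// and body. Trimming surrounding whitespace first is an assumption made here
// so that trivially different copies of the same article hash identically.
func contentHash(title, body string) string {
	normalized := strings.TrimSpace(title) + "\n" + strings.TrimSpace(body)
	sum := sha256.Sum256([]byte(normalized))
	return hex.EncodeToString(sum[:])
}

// seenSet is a hypothetical in-memory stand-in for a database lookup of
// previously stored content hashes.
type seenSet struct {
	mu     sync.Mutex
	hashes map[string]struct{}
}

func newSeenSet() *seenSet {
	return &seenSet{hashes: make(map[string]struct{})}
}

// markIfNew records the hash and reports true only the first time it is seen.
func (s *seenSet) markIfNew(hash string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.hashes[hash]; ok {
		return false
	}
	s.hashes[hash] = struct{}{}
	return true
}
```

A caller would compute `contentHash` for each scraped article and continue to summarization only when `markIfNew` returns true.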
- Sequential Summarization: Single-worker architecture prevents API rate limiting and ensures predictable load
- Centralized Queueing: Thread-safe queue management with priority handling and backpressure protection
- Robust Error Handling: Circuit breakers, exponential backoff, and comprehensive retry logic
- Real-time Metrics: Prometheus integration tracking feed processing, summarization success/failure, queue depth, and latency
- Complete Observability: Pre-built Grafana dashboards for monitoring errors, throughput, and system health
- Zero-Configuration Setup: Fully containerized with Docker Compose and environment-based configuration
- Content Deduplication: SHA-256 content hashing prevents duplicate article processing
- Health Monitoring: Built-in health checks for all services with dependency validation
- Graceful Shutdown: Proper signal handling and resource cleanup on termination
- Article Filtering: Configurable initiation date to ignore articles published before system start
- Chronological Processing: Articles are processed in chronological order to maintain temporal consistency
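
As a rough illustration of the sequential summarization and backpressure ideas listed above, the following sketch runs exactly one worker goroutine behind a bounded channel. The type and function names (`scheduler`, `newScheduler`, `enqueue`, `shutdown`, `errQueueFull`) are hypothetical, and rejecting jobs when the queue is full is only one possible backpressure policy; the real scheduler may behave differently.

```go
package main

import (
	"errors"
	"sync"
)

// errQueueFull is returned when the bounded queue cannot accept more work.
var errQueueFull = errors.New("summarization queue is full")

// scheduler owns a bounded job queue drained by a single worker goroutine,
// so at most one summarize call is in flight at any time.
type scheduler struct {
	jobs chan string
	wg   sync.WaitGroup
}

// newScheduler starts the single worker. The summarize callback stands in
// for the call to the LLM API.
func newScheduler(capacity int, summarize func(article string)) *scheduler {
	s := &scheduler{jobs: make(chan string, capacity)}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		for article := range s.jobs {
			summarize(article) // jobs run strictly one after another, in FIFO order
		}
	}()
	return s
}

// enqueue never blocks: when the queue is full it reports errQueueFull so
// the caller can retry later instead of piling up work.
func (s *scheduler) enqueue(article string) error {
	select {
	case s.jobs <- article:
		return nil
	default:
		return errQueueFull
	}
}

// shutdown stops accepting jobs and waits until the worker has drained the
// queue. enqueue must not be called after shutdown.
func (s *scheduler) shutdown() {
	close(s.jobs)
	s.wg.Wait()
}
```

Because the channel preserves insertion order, enqueueing articles oldest first also gives chronological processing without extra coordination.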
- Docker: Version 20.10+ with Compose V2 support
- Docker Compose: V2.0+ (usually bundled with Docker)
- Git: For cloning the repository
- Ollama Server: Local or remote instance with desired model (e.g., llama3)
- Clone the Repository
  git clone https://github.com/PureCypher/Information-Broker
  cd Information-Broker
- Generate Secure Passwords
  # Generate secure passwords for database and Grafana
  ./scripts/generate-passwords.sh
  This script will:
  - Create a backup of your existing .env file
  - Generate secure random passwords for database and Grafana
  - Create a 32-character secret key for Grafana
  - Update your .env file with the new credentials
  - Display the generated credentials for your records
- Essential Configuration
  After password generation, update these critical values in .env:
  # Set your Ollama server URL and model
  OLLAMA_URL=http://your-ollama-server:11434
  OLLAMA_MODEL=llama3
  # Configure Discord webhook (optional but recommended)
  DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/your/webhook/url
  # Passwords are already generated - no manual entry needed!
  # DB_PASSWORD=<automatically_generated>
  # GRAFANA_ADMIN_PASSWORD=<automatically_generated>
  # GRAFANA_SECRET_KEY=<automatically_generated>
- Start the System
  # Start all services
  docker compose up -d
  # Or use the Makefile for convenience
  make docker-run
- Verify Deployment
  # Check service health
  make status
  # View logs
  make logs
After successful deployment, access these services:
| Service | URL | Purpose |
|---|---|---|
| Application API | http://localhost:8080 | Health checks, statistics, manual triggers |
| Health Check | http://localhost:8080/health | Service health status |
| Prometheus | http://localhost:9090 | Metrics collection and querying |
| Grafana | http://localhost:3001 | Dashboards and visualization (admin/admin by default, or your configured password) |
| PostgreSQL | localhost:5432 | Database access (user postgres, password from DB_PASSWORD in .env) |
Information Broker includes automated scripts for generating secure passwords and secret keys:
# Generate all required passwords and keys
./scripts/generate-passwords.sh

This script generates:
- Database Password: 32-character secure password for PostgreSQL
- Grafana Admin Password: 24-character password for Grafana admin login
- Grafana Secret Key: 32-character alphanumeric key for internal encryption
If you prefer to set passwords manually, edit the .env file:
# Database credentials
DB_PASSWORD=your_secure_32_character_password_here
# Grafana credentials
GRAFANA_ADMIN_PASSWORD=your_secure_grafana_password
GRAFANA_SECRET_KEY=your_32_character_secret_key_here

Security Best Practices:
- Use the automated scripts for maximum security
- Store generated passwords in a secure password manager
- Never commit .env files to version control
- Regenerate passwords periodically in production
- Use different passwords for each environment (dev/staging/prod)
DB_HOST=postgres # Database hostname
DB_PORT=5432 # Database port
DB_USER=postgres # Database username
DB_PASSWORD=secure_password # Database password
DB_NAME=information_broker # Database name

APP_PORT=8080 # API server port
RSS_FETCH_INTERVAL=5m # Feed polling interval
RSS_FEEDS_FILE=/app/feeds.txt # RSS feeds configuration file
LOG_LEVEL=info # Logging level (debug/info/warn/error)
APP_INITIATION_DATE=2020-01-01 # Articles published before this date will be ignored
# Format: YYYY-MM-DD, YYYY-MM-DDTHH:MM:SSZ, or YYYY-MM-DD HH:MM:SS
ARTICLE_CUTOFF_DATE=2025-05-31T00:00:00Z # Only articles published on/after this date are processed
# Timezone-agnostic UTC comparison

OLLAMA_URL=http://ollama:11434 # Ollama server endpoint
OLLAMA_MODEL=llama3 # Model for summarization
OLLAMA_TIMEOUT=60s # Request timeout
OLLAMA_MAX_RETRIES=3 # Maximum retry attempts

# Option 1: Single webhook URL (backward compatibility)
DISCORD_WEBHOOK_URL= # Single Discord webhook URL (optional)
# Option 2: Multiple webhook URLs for multi-cast notifications
DISCORD_WEBHOOK_URLS= # Comma-separated webhook URLs (optional)
# Example: https://discord.com/api/webhooks/1/token1,https://discord.com/api/webhooks/2/token2
DISCORD_MAX_RETRIES=2 # Discord publish retry attempts
DISCORD_TIMEOUT=30s # Discord request timeout

MAX_CONCURRENT_FEEDS=10 # Concurrent feed processing limit
MAX_ARTICLE_CONTENT_LENGTH=10000 # Content length limit (characters)
MAX_SUMMARY_LENGTH=200 # Summary length limit (characters)
HTTP_READ_TIMEOUT=15s # HTTP client read timeout
HTTP_WRITE_TIMEOUT=15s # HTTP client write timeout

PROMETHEUS_PORT=9090 # Prometheus server port
PROMETHEUS_RETENTION=90d # Metrics retention period
GRAFANA_PORT=3001 # Grafana dashboard port
GRAFANA_ADMIN_PASSWORD=admin # Grafana admin password

Edit the feeds.txt file to add or remove RSS feeds:
# Edit the feeds file
nano feeds.txt
# Add new feed URLs (one per line)
https://your-new-feed.com/rss
https://another-feed.com/feed.xml
# Restart to apply changes
docker compose restart rss-monitor

The system includes 47 pre-configured cybersecurity and technology feeds covering major news sources, security publications, and threat intelligence feeds.
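
For illustration, a one-URL-per-line feeds file could be loaded along the lines of the sketch below. The function name `parseFeedList` is hypothetical, and skipping `#` comment lines, rejecting non-HTTP(S) entries, and dropping duplicates are assumptions rather than documented behavior of the real loader.

```go
package main

import (
	"bufio"
	"net/url"
	"strings"
)

// parseFeedList extracts feed URLs from the contents of a feeds file that
// lists one URL per line. Blank lines, lines starting with '#', entries that
// are not absolute http(s) URLs, and duplicates are skipped (all assumptions).
func parseFeedList(content string) []string {
	var feeds []string
	seen := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		u, err := url.Parse(line)
		if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			continue
		}
		if !seen[line] {
			seen[line] = true
			feeds = append(feeds, line)
		}
	}
	return feeds
}
```

Validating entries at load time keeps a single malformed line from surfacing later as a repeated fetch error.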
The system includes intelligent article filtering with two complementary mechanisms to control which articles are processed:
The primary filtering mechanism uses a strict UTC-based cutoff date to ensure only articles published on or after a specific date and time are processed:
# Set the article cutoff date in .env file
ARTICLE_CUTOFF_DATE=2025-05-31T00:00:00Z # Only articles from this date/time forward
# Supported date formats:
ARTICLE_CUTOFF_DATE=2025-05-31 # Date only (assumes 00:00:00Z)
ARTICLE_CUTOFF_DATE=2025-05-31T00:00:00Z # Full ISO 8601 UTC format (recommended)
ARTICLE_CUTOFF_DATE=2025-05-31T10:30:00+05:00 # With timezone (converted to UTC)

Key Features:
- Thread-Safe: Efficient date parsing with normalized UTC comparison
- Timezone-Agnostic: All dates are converted to UTC for consistent comparison
- RSS Format Compatible: Supports all common RSS date formats automatically
- No Missing Dates: Articles without publish dates are skipped (safer default)
- Prometheus Metrics: Tracks filtered vs processed articles separately
# Set the initiation date in .env file (for backward compatibility)
APP_INITIATION_DATE=2024-01-01 # Articles before this date will be ignored
# Supported date formats:
APP_INITIATION_DATE=2024-01-01 # Date only
APP_INITIATION_DATE=2024-01-01T10:30:00Z # ISO 8601 with timezone
APP_INITIATION_DATE=2024-01-01 10:30:00 # Date and time

- Date Validation: Articles without publish dates are immediately rejected
- Cutoff Date Check: Articles before ARTICLE_CUTOFF_DATE are filtered (tracked as articles_filtered_pre_cutoff_total)
- Initiation Date Check: Articles before APP_INITIATION_DATE are filtered (tracked as skipped_before_initiation)
- Processing: Only articles passing both filters are processed (tracked as articles_processed_post_cutoff_total)
- Chronological Processing: Approved articles are sorted by publication date and processed in chronological order (oldest first)
- No Database Storage: Articles filtered by cutoff date are never inserted into the database
- No Processing: Filtered articles bypass all summarization and Discord notification steps
- Clean Logs: Filtered articles are logged but don't clutter processing metrics
- Resource Efficient: Filtering happens early in the pipeline to minimize resource usage
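
The filtering order described above (reject undated articles, apply both date checks, then sort oldest first) can be sketched as follows. This is a hedged illustration only: `parseCutoff`, `article`, and `filterAndSort` are hypothetical names, the layout list covers just the formats documented for the two environment variables, and treating zone-less values as UTC is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// cutoffLayouts covers the documented formats for ARTICLE_CUTOFF_DATE and
// APP_INITIATION_DATE. Values without a zone are assumed to be UTC.
var cutoffLayouts = []string{
	time.RFC3339,          // 2025-05-31T00:00:00Z or 2025-05-31T10:30:00+05:00
	"2006-01-02 15:04:05", // 2024-01-01 10:30:00
	"2006-01-02",          // 2025-05-31 (midnight UTC)
}

// parseCutoff tries each layout in turn and normalizes the result to UTC.
func parseCutoff(value string) (time.Time, error) {
	for _, layout := range cutoffLayouts {
		if t, err := time.Parse(layout, value); err == nil {
			return t.UTC(), nil
		}
	}
	return time.Time{}, fmt.Errorf("unsupported date format: %q", value)
}

// article is a hypothetical, minimal view of a feed item.
type article struct {
	Title     string
	Published *time.Time // nil when the feed item carries no publish date
}

// filterAndSort drops articles without a publish date or published before
// either threshold, then returns the rest oldest first.
func filterAndSort(items []article, cutoff, initiation time.Time) []article {
	var kept []article
	for _, a := range items {
		if a.Published == nil {
			continue // missing dates are skipped
		}
		published := a.Published.UTC()
		if published.Before(cutoff) || published.Before(initiation) {
			continue
		}
		kept = append(kept, a)
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].Published.Before(*kept[j].Published)
	})
	return kept
}
```

Since `time.Time` comparisons are made on the underlying instant, an article stamped `2025-05-30T20:00:00-05:00` compares as `2025-05-31T01:00:00Z` and passes a cutoff of `2025-05-31T00:00:00Z`.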
# New metrics for cutoff date filtering
articles_filtered_pre_cutoff_total{feed_url} # Articles filtered before cutoff date
articles_processed_post_cutoff_total{feed_url} # Articles that passed cutoff filter
# Example queries for Grafana
rate(articles_filtered_pre_cutoff_total[5m]) # Rate of filtered articles
rate(articles_processed_post_cutoff_total[5m]) # Rate of processed articles

Production Deployment (Recommended)
# Process only articles from May 31, 2025 onward
ARTICLE_CUTOFF_DATE=2025-05-31T00:00:00Z
APP_INITIATION_DATE=2020-01-01 # Keep as fallback

Testing Environment
# Process only today's articles for testing
ARTICLE_CUTOFF_DATE=2025-05-31T00:00:00Z
APP_INITIATION_DATE=2025-05-31

Migration from Old System
# Resume processing from last known good date
ARTICLE_CUTOFF_DATE=2025-05-30T15:30:00Z # Exact time when old system stopped
APP_INITIATION_DATE=2025-01-01 # Broader fallback

Development with Timezone Handling
# Handle articles from different timezones correctly
ARTICLE_CUTOFF_DATE=2025-05-31T00:00:00Z # 12AM UTC = 7PM EST (8PM EDT) previous day
# This ensures consistent filtering regardless of article's original timezone

The system automatically normalizes all dates to UTC for comparison:
# Article published at 8PM EST on May 30, 2025
# Converts to: 2025-05-31T01:00:00Z (passes cutoff)
# Article published at 6PM PST on May 30, 2025
# Converts to: 2025-05-31T02:00:00Z (passes cutoff)
# Article published at 11PM UTC on May 30, 2025
# Stays as: 2025-05-30T23:00:00Z (filtered out)

# Test the filtering logic
go test -v -run TestArticleDateFiltering
# Test timezone handling specifically
go test -v -run TestTimezoneHandling
# Run demonstration script
go run test_cutoff_date.go

# System health check
curl http://localhost:8080/health
# Processing statistics
curl http://localhost:8080/stats
# Feed status
curl http://localhost:8080/feeds
# Summarization queue status
curl http://localhost:8080/summarization/stats

# Check overall system status
make status
# Test all API endpoints
make api-test
# View application health
make health

- Access Grafana: Navigate to http://localhost:3001
- Login: Use admin/admin (or your configured password)
- View Dashboards: Pre-configured dashboards are available in the "Information Broker" folder
# Graceful shutdown with data preservation
docker compose down
# Or using Makefile (this also removes volumes, so stored data is deleted)
make docker-clean # Includes volume cleanup

The application handles SIGTERM and SIGINT signals gracefully, ensuring:
- In-flight summarization requests complete
- Database connections close properly
- Temporary files are cleaned up
- Queue state is preserved
The system provides comprehensive real-time and historical visibility into content volume through:
- Total Articles in Database: Real-time count of all articles stored in the database
- Daily Article Ingestion Rate: Number of articles successfully processed each day over the past 30 days
- Article Processing Success/Failure: Breakdown of processing status for troubleshooting
- Total Articles in Database - Single stat panel showing current article count with configurable thresholds
- Daily Article Ingestion Rate - Bar chart displaying daily processing trends for content volume analysis
- Real-time Processing Rate - Time series showing articles processed per minute/hour
- Low Daily Article Count: Fires when daily article processing falls below 5 articles (configurable threshold)
- No Articles Processed: Critical alert when no articles are processed for 24 hours
- Database Growth Stalled: Alert when no new articles are added for 2 days
The system includes five pre-configured dashboards:
- Information Broker Overview (information-broker-dashboard.json)
  - System health indicators
  - Processing rates and queue status
  - Error rate monitoring
  - NEW: Total Articles in Database panel
  - NEW: Daily Article Ingestion Rate chart
- Feed Processing Rates (feed-processing-rates-dashboard.json)
  - Per-feed processing statistics
  - Success/failure rates
  - Processing time analysis
- Pipeline Latency (pipeline-latency-dashboard.json)
  - End-to-end processing times
  - Queue wait times
  - API response latencies
- Summarization Success/Failure (summarization-success-failure-dashboard.json)
  - AI summarization metrics
  - Model performance tracking
  - Error categorization
- Discord Latency & Errors (discord-latency-errors-dashboard.json)
  - Webhook delivery performance
  - Discord API error tracking
  - Message formatting metrics
- rss_feeds_processed_total: Total feeds processed with status labels
- rss_articles_found_total: Articles discovered per feed
- rss_new_articles_total: New articles added to database
- rss_fetch_duration_seconds: Feed fetching latency

- articles_processed_total: Counter incremented each time an article is processed and written to the database
- articles_in_database: Gauge reflecting the current total article count in the database (updated every 5 minutes)

- articles_filtered_pre_cutoff_total: Articles filtered due to publication before cutoff date
- articles_processed_post_cutoff_total: Articles that passed cutoff date filter and were processed

- summarization_requests_total: Summarization requests by status
- summarization_queue_depth: Current queue size
- summarization_duration_seconds: Time to generate summaries
- ollama_api_requests_total: Ollama API call statistics

- discord_webhook_requests_total: Webhook delivery attempts
- discord_publish_duration_seconds: Discord API latency
- discord_errors_total: Publishing error counts

- database_connections_active: PostgreSQL connection pool status
- http_requests_total: API endpoint usage
- application_uptime_seconds: Service availability
# Current total articles in database
articles_in_database
# Daily article ingestion rate (last 30 days)
increase(articles_processed_total{status="success"}[1d])
# Article processing success rate
rate(articles_processed_total{status="success"}[5m]) / rate(articles_processed_total[5m]) * 100
# Weekly content volume trend
increase(articles_processed_total{status="success"}[7d])
# Daily article processing below threshold
increase(articles_processed_total{status="success"}[1d]) < 5
# No articles processed in 24 hours
increase(articles_processed_total{status="success"}[1d]) == 0
# Database growth stalled (no new articles in 2 days)
increase(articles_processed_total{status="success"}[2d]) == 0
- Application Health: GET /health
  {
    "status": "healthy",
    "timestamp": "2025-01-15T10:30:00Z",
    "database": "connected",
    "ollama": "available",
    "queue_depth": 5
  }
- Prometheus Metrics: GET /metrics
  - Standard Prometheus exposition format
  - All custom application metrics
  - Go runtime metrics
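
A handler producing the /health payload shown above might be structured like this sketch. The JSON field names come from the sample response; everything else is an assumption, including the names `healthStatus`, `buildHealth`, `encodeHealth`, and `healthHandler`, the "disconnected", "unavailable", and "unhealthy" values, and the choice of HTTP 503 for an unhealthy state.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// healthStatus mirrors the fields of the sample /health response.
type healthStatus struct {
	Status     string `json:"status"`
	Timestamp  string `json:"timestamp"`
	Database   string `json:"database"`
	Ollama     string `json:"ollama"`
	QueueDepth int    `json:"queue_depth"`
}

// buildHealth derives the overall status from the dependency checks.
// The non-healthy string values are assumptions, not documented output.
func buildHealth(dbOK, ollamaOK bool, queueDepth int, timestamp string) healthStatus {
	h := healthStatus{
		Status:     "healthy",
		Timestamp:  timestamp,
		Database:   "connected",
		Ollama:     "available",
		QueueDepth: queueDepth,
	}
	if !dbOK {
		h.Database = "disconnected"
		h.Status = "unhealthy"
	}
	if !ollamaOK {
		h.Ollama = "unavailable"
		h.Status = "unhealthy"
	}
	return h
}

// encodeHealth serializes the status, falling back to a fixed body if
// marshaling fails.
func encodeHealth(h healthStatus) string {
	data, err := json.Marshal(h)
	if err != nil {
		return `{"status":"unhealthy"}`
	}
	return string(data)
}

// healthHandler wraps a check function in an HTTP handler and answers 503
// when any dependency is down (an assumed convention).
func healthHandler(check func() healthStatus) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		h := check()
		w.Header().Set("Content-Type", "application/json")
		if h.Status != "healthy" {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		_, _ = w.Write([]byte(encodeHealth(h)))
	}
}
```

Returning a non-200 status for degraded states lets container health checks and load balancers react without parsing the body.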
The system includes comprehensive testing through the Makefile:
# Run all API endpoint tests
make api-test
# Performance testing
make perf-test
# Load testing (requires 'hey' tool)
make load-test
# Health verification
make health

# Clear database for fresh start
make clear-db
# Force clear without confirmation
make clear-db-force
# Reset database and restart application
make reset-db
# Backup database
make backup-db

# Start development environment
make setup-dev
# Build and run locally
make dev
# Run with development database
make dev-db
make run
make dev-db-stop

Symptoms: Summarization requests failing, queue building up
# Check Ollama connectivity
curl http://your-ollama-server:11434/api/tags
# Verify model availability
curl http://your-ollama-server:11434/api/show -d '{"name": "llama3"}'
# Check logs for specific errors
make logs | grep -i ollama

Solutions:
- Verify OLLAMA_URL in .env file
- Ensure Ollama server is running and accessible
- Check if the specified model is pulled: ollama pull llama3
- Increase OLLAMA_TIMEOUT for slower responses
Symptoms: Application failing to start, health checks failing
# Check database status
docker compose exec postgres pg_isready
# View database logs
docker compose logs postgres
# Test connection manually
docker compose exec postgres psql -U postgres -d information_broker -c "\dt"

Solutions:
- Verify database credentials in .env
- Ensure PostgreSQL container is healthy
- Check for port conflicts on 5432
- Restart database: docker compose restart postgres
Symptoms: Growing queue depth, delayed processing
# Check queue status
curl http://localhost:8080/summarization/stats
# Monitor queue metrics in Grafana
# Check summarization processing times

Solutions:
- Increase RSS_FETCH_INTERVAL (poll less often) to slow article intake
- Increase OLLAMA_TIMEOUT if requests are timing out
- Check Ollama server performance and scaling
- Consider reducing MAX_CONCURRENT_FEEDS
Symptoms: Summaries not appearing in Discord, webhook errors
# Test webhook manually
curl -X POST "YOUR_DISCORD_WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d '{"content": "Test message"}'
# Check Discord error metrics
curl http://localhost:8080/metrics | grep discord

Solutions:
- Verify DISCORD_WEBHOOK_URL is correct and active
- Check webhook permissions in Discord server
- Increase DISCORD_TIMEOUT for slow connections
- Review Discord rate limiting documentation
Symptoms: Out of memory errors, container restarts
# Monitor resource usage
docker stats information-broker-app
# Check for memory leaks in metrics
curl http://localhost:8080/metrics | grep go_memstats

Solutions:
- Reduce MAX_ARTICLE_CONTENT_LENGTH
- Lower MAX_CONCURRENT_FEEDS
- Increase Docker container memory limits
- Review feed content for unusually large articles
# View real-time logs
make logs
# Filter for specific components
docker compose logs rss-monitor | grep "ERROR"
docker compose logs rss-monitor | grep "summarization"
# Export logs for analysis
docker compose logs --since 24h rss-monitor > debug.log

# Monitor performance metrics
make api-test
curl http://localhost:8080/metrics | grep duration
# Database performance
docker compose exec postgres psql -U postgres -d information_broker -c "
SELECT schemaname,tablename,attname,avg_width,n_distinct,correlation
FROM pg_stats
WHERE schemaname='public';"

- Update OLLAMA_MODEL in .env
- Modify summarization prompts in summarizer.go
- Test with new model before production deployment

- Add URLs to feeds.txt
- Restart application: docker compose restart rss-monitor
- Monitor feed health in Grafana dashboards

- Add new metrics in metrics.go
- Instrument code with metric updates
- Create custom Grafana dashboard panels
Multi-Discord Webhook Support: The system now supports sending notifications to multiple Discord webhooks simultaneously:
# Configure multiple Discord webhooks for multi-cast notifications
DISCORD_WEBHOOK_URLS=https://discord.com/api/webhooks/123/token1,https://discord.com/api/webhooks/456/token2
# Or use single webhook for backward compatibility
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/123/token1

Features:
- Concurrent Delivery: All webhooks receive notifications simultaneously
- Individual Error Handling: Failed webhooks don't affect others
- Backward Compatibility: Single webhook configuration still works
- Automatic Fallback: Single webhook URL automatically converts to multi-webhook format
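
The multi-webhook behavior listed above (concurrent delivery, isolated failures, single-URL fallback) could be approximated as in the sketch below. All identifiers (`webhookTargets`, `broadcast`, `postWebhook`, `errWebhookRejected`) are hypothetical, and merging both variables into one deduplicated list is an assumption about how the fallback works. The `{"content": ...}` payload matches the manual curl test in the troubleshooting section.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"net/http"
	"strings"
	"sync"
)

// errWebhookRejected signals a non-2xx response from a webhook endpoint.
var errWebhookRejected = errors.New("webhook returned a non-2xx status")

// webhookTargets combines the comma-separated DISCORD_WEBHOOK_URLS value
// with the single DISCORD_WEBHOOK_URL value into one deduplicated list.
func webhookTargets(multi, single string) []string {
	var targets []string
	seen := make(map[string]bool)
	add := func(u string) {
		u = strings.TrimSpace(u)
		if u != "" && !seen[u] {
			seen[u] = true
			targets = append(targets, u)
		}
	}
	for _, u := range strings.Split(multi, ",") {
		add(u)
	}
	add(single)
	return targets
}

// postWebhook sends a simple content message to one webhook URL.
func postWebhook(url, message string) error {
	body, err := json.Marshal(map[string]string{"content": message})
	if err != nil {
		return err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return errWebhookRejected
	}
	return nil
}

// broadcast delivers the message to every target concurrently and returns
// the failures keyed by URL, so one bad webhook never blocks the others.
func broadcast(targets []string, message string, send func(url, message string) error) map[string]error {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	failures := make(map[string]error)
	for _, target := range targets {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			if err := send(url, message); err != nil {
				mu.Lock()
				failures[url] = err
				mu.Unlock()
			}
		}(target)
	}
	wg.Wait()
	return failures
}
```

In production use, `postWebhook` would be passed as the `send` argument; accepting it as a parameter keeps `broadcast` testable without network access. This sketch omits the retries and timeouts that `DISCORD_MAX_RETRIES` and `DISCORD_TIMEOUT` imply.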
Extending to Other Platforms:
- Extend webhook functionality in discord_webhook.go
- Add new webhook targets (Slack, Teams, etc.)
- Configure routing based on content type or source
The current single-worker design can be extended to multi-worker:
- Modify Scheduler: Update summarization_scheduler.go to support worker pools
- Add Worker Management: Implement worker lifecycle and load balancing
- Update Metrics: Track per-worker performance
- Configuration: Add SUMMARIZATION_WORKERS environment variable
Trade-offs: Higher throughput vs. increased API load and complexity
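
One possible shape for the multi-worker extension is sketched below. It is not part of the current system: `parseWorkers` and `runPool` are hypothetical names, and `SUMMARIZATION_WORKERS` is the proposed variable from the list above, whose raw value would be passed in from the environment.

```go
package main

import (
	"strconv"
	"sync"
)

// parseWorkers interprets the raw SUMMARIZATION_WORKERS value and falls
// back to 1 (today's single-worker behavior) when it is unset or invalid.
func parseWorkers(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// runPool summarizes the articles with a fixed number of workers and
// returns the summaries in input order.
func runPool(workers int, articles []string, summarize func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(articles))
	indexes := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indexes {
				// Each index is handed to exactly one worker, so the
				// writes to results never overlap.
				results[i] = summarize(articles[i])
			}
		}()
	}
	for i := range articles {
		indexes <- i
	}
	close(indexes)
	wg.Wait()
	return results
}
```

With `workers` set to 1 this behaves like the sequential design; higher values trade ordering of execution and predictable API load for throughput, as the trade-off note above describes.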
For high-volume deployments:
- Load Balancer: Add reverse proxy for multiple application instances
- Shared Queue: Implement Redis or RabbitMQ for distributed queue
- Database Clustering: Configure PostgreSQL replication
- Feed Distribution: Partition feeds across instances
# Increase concurrent feed processing
MAX_CONCURRENT_FEEDS=20
# Optimize batch processing
RSS_FETCH_INTERVAL=2m
# Tune database connections
# Add to docker-compose.yml postgres service
command: -c max_connections=200 -c shared_buffers=256MB

- Main Application: main.go - Service orchestration and lifecycle
- RSS Processing: monitor.go - Feed scraping and article extraction
- AI Integration: summarizer.go - Ollama API communication
- Queue Management: summarization_scheduler.go - Thread-safe job scheduling
- Metrics: metrics.go - Prometheus instrumentation
- Configuration: config/config.go - Environment management
# Unit tests for core components
go test ./...
# Integration tests with test database
make dev-db
go test -tags=integration ./...
make dev-db-stop
# Performance benchmarks
go test -bench=. -benchmem ./...

- Fork the repository
- Create feature branch: git checkout -b feature/new-feature
- Add tests for new functionality
- Update documentation and README
- Submit pull request with detailed description
This project is licensed under the MIT License. See LICENSE file for details.
- RSS Processing: Built with Go's standard library and gofeed parser
- AI Integration: Powered by Ollama local LLM infrastructure
- Monitoring: Prometheus and Grafana open-source observability stack
- Database: PostgreSQL for reliable data persistence
- Containerization: Docker and Docker Compose for consistent deployments
- github.com/lib/pq - PostgreSQL driver
- github.com/prometheus/client_golang - Prometheus metrics
- github.com/mmcdole/gofeed - RSS/Atom feed parsing
- Various Go standard library packages
- Use the provided password generation scripts for secure credentials
- Change all default passwords in production using ./scripts/generate-passwords.sh
- Use HTTPS with reverse proxy (nginx/Traefik)
- Implement network segmentation
- Regular security updates for all components
- Monitor access logs and implement rate limiting
- Secure Discord webhook URLs and rotate periodically
- Regenerate passwords periodically with ./scripts/regenerate-passwords.sh
- Never commit .env files containing real credentials to version control
Information Broker - Intelligent RSS Processing with AI-Powered Summarization