Save your inbox from Google Scholar alert overload.
Scholar Alerts Assistant automatically processes your Google Scholar alert emails, filters duplicates, highlights priority papers based on your keywords, and sends you a beautifully formatted digest email. No more cluttered inbox - just the papers that matter to you.
- High Performance: Batch Gmail API processing with 95% fewer API calls
- Smart Filtering: Keyword-based prioritization and blacklist filtering
- HTML Digest: Beautiful formatted emails with highlighted keywords
- Duplicate Detection: Persistent database prevents repeat notifications
- Fast Processing: Optimized algorithms for large email volumes
- Safe Processing: Mark as read or delete processed emails
- Automation Ready: Perfect for cron jobs and scheduled execution
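The duplicate-detection feature can be sketched in a few lines. This is a simplified illustration, assuming a flat JSON list of previously seen titles and papers represented as dicts with a `"title"` key; the real `db.json` schema used by the project may differ:

```python
import json
from pathlib import Path

def filter_new_papers(papers, db_path):
    """Return only papers not seen before, recording them in a persistent JSON db.

    Assumes `papers` is a list of dicts with a "title" key (an illustrative
    shape, not necessarily the project's internal representation).
    """
    seen = set()
    if db_path.exists():
        seen = set(json.loads(db_path.read_text()))
    new = [p for p in papers if p["title"] not in seen]
    seen.update(p["title"] for p in new)
    db_path.parent.mkdir(parents=True, exist_ok=True)
    db_path.write_text(json.dumps(sorted(seen)))
    return new
```

On the first run every paper is new; on later runs, previously recorded titles are silently dropped, which is what prevents repeat notifications.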
- Gmail Labels: Set up an automatic filter to label your Google Scholar alert emails (Gmail Filter Guide)
- Gmail API Credentials:
  - Enable the Gmail API
  - Download `credentials.json` to the `./data/` folder
```bash
# Clone the repository
git clone https://github.com/daoxusheng/scholar-alerts-assistant.git
cd scholar-alerts-assistant

# Install dependencies
pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client

# Create data directory structure
mkdir -p data

# Place your credentials.json in the data folder
```

Edit the configuration variables in `main.py`:
```python
# Gmail labels to filter for scholar alerts
SCHOLAR_LABEL = [
    "UNREAD",
    "Academia",  # Replace with your actual label
]

# Email address to receive digest
DIGEST_ADDRESS = "your-email@gmail.com"  # Replace with your email

# Processing mode
DELETE_MODE = False  # True to delete, False to mark as read
```

Then run:

```bash
python main.py
```

On first run, your browser will open for Gmail API authorization. This creates a `token.json` file for future runs.
```
scholar-alerts-assistant/
├── main.py               # Main script and configuration
├── utils/
│   ├── assistant.py      # Core processing logic
│   ├── gmail.py          # Gmail API integration
│   ├── formatter.py      # HTML email formatting
│   └── parser.py         # Email content parsing
├── data/
│   ├── credentials.json  # Gmail API credentials (you provide)
│   ├── token.json        # Auto-generated OAuth token
│   ├── db.json           # Paper database (auto-generated)
│   ├── keywords.json     # Priority keywords (optional)
│   └── blacklist.json    # Filtered terms (optional)
└── README.md
```
Create `data/keywords.json` to prioritize papers containing specific terms:

```json
[
  "machine learning",
  "neural networks",
  "deep learning",
  "artificial intelligence",
  "computer vision"
]
```

Papers matching these keywords will appear in the "Highly-related Papers" section of your digest.
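Conceptually, prioritization is a case-insensitive keyword match against each paper. A minimal sketch, assuming papers are dicts with a `"title"` key (the exact matching logic in `utils/assistant.py` may differ, e.g. it may also scan abstracts):

```python
import re

def split_by_priority(papers, keywords):
    """Partition papers into (highly_related, others) by keyword match.

    Assumes a non-empty keyword list and papers as dicts with a "title" key.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    priority, others = [], []
    for p in papers:
        (priority if pattern.search(p["title"]) else others).append(p)
    return priority, others
```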
Create `data/blacklist.json` to filter out unwanted papers:

```json
[
  "retracted",
  "withdrawn",
  "spam term",
  "unwanted topic"
]
```

Papers containing blacklisted terms will be automatically excluded from your digest.
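Blacklist filtering is the mirror image of prioritization: matching papers are dropped rather than promoted. A minimal sketch, assuming papers as dicts with a `"title"` key (the project's actual filtering code may differ):

```python
import re

def apply_blacklist(papers, blacklist):
    """Drop papers whose title contains any blacklisted term (case-insensitive).

    Assumes a non-empty blacklist and papers as dicts with a "title" key.
    """
    pattern = re.compile("|".join(re.escape(t) for t in blacklist), re.IGNORECASE)
    return [p for p in papers if not pattern.search(p["title"])]
```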
| Variable | Description | Default |
|---|---|---|
| `VERBOSE` | Logging level (0 = silent, 1 = info) | `1` |
| `DELETE_MODE` | Delete processed emails instead of marking as read | `False` |
| `USER_ID` | Gmail user ID | `"me"` |
| `SCHOLAR_LABEL` | Gmail labels to filter | `["UNREAD", "Academia"]` |
| `DIGEST_ADDRESS` | Destination email for the digest | Required |
| `DATA_PATH` | Directory for data files | `"./data/"` |
| `BATCH_SIZE` | Gmail API batch processing size | `10` |
- Configurable Batch Processing: Adjustable batch sizes (default: 10 messages per batch)
- Rate Limiting Protection: Exponential backoff with automatic retry on API limits
- Smart Caching: Compiled regex patterns and preloaded data
- Optimized Filtering: Single-pass algorithms for large datasets
- Early Termination: Skips processing when no emails found
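The rate-limiting protection follows the standard exponential-backoff pattern. A generic sketch, not the project's actual retry code in `utils/gmail.py` (here `RuntimeError` stands in for an HTTP 429 error from the Gmail API):

```python
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call`, doubling the wait after each rate-limit failure."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for an HTTP 429 from the Gmail API
            if attempt == max_retries - 1:
                raise  # give up after the final retry
            time.sleep(base_delay * 2 ** attempt)
```

With `base_delay=1.0`, failed attempts wait 1, 2, 4, then 8 seconds before retrying, which gives the API quota time to recover.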
Run daily at 9 AM:
```bash
# Edit crontab
crontab -e

# Add this line
0 9 * * * cd /path/to/scholar-alerts-assistant && python main.py
```

Create `/etc/systemd/system/scholar-alerts.service`:
```ini
[Unit]
Description=Scholar Alerts Processor

[Service]
Type=oneshot
User=yourusername
WorkingDirectory=/path/to/scholar-alerts-assistant
ExecStart=/usr/bin/python main.py
```

Create `/etc/systemd/system/scholar-alerts.timer`:
```ini
[Unit]
Description=Run Scholar Alerts daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Enable and start:

```bash
sudo systemctl enable scholar-alerts.timer
sudo systemctl start scholar-alerts.timer
```

"No module named 'google'"

```bash
pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client
```

"credentials.json not found"
- Ensure `credentials.json` is in the `./data/` folder
- Verify Gmail API is enabled in Google Cloud Console
"No emails found"
- Check your Gmail label name matches `SCHOLAR_LABEL`
- Verify Scholar alerts are being labeled correctly
- Run with `VERBOSE = 1` for debugging info
"Permission denied" errors
- Delete `data/token.json` and re-authenticate
- Check Gmail API scopes in Google Cloud Console
Gmail API rate limit errors (429)
- Reduce `BATCH_SIZE` to 5 or lower in `main.py`
- The tool automatically retries with exponential backoff
- For persistent issues, try `BATCH_SIZE = 1`
Slow processing of large email volumes
- Increase `BATCH_SIZE` to 20-50 if you are not hitting rate limits
- The default batch size (10) balances speed and reliability
- For 100+ emails, expect 30-60 seconds of processing time
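Batch-size tuning comes down to how message IDs are chunked before each batched API request. A minimal sketch of the chunking step (an illustration, not the project's actual code):

```python
def chunked(items, batch_size=10):
    """Yield successive batches of at most `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

With 100 emails and the default batch size of 10, this yields 10 batched requests instead of 100 individual ones; larger batches mean fewer round trips but a higher chance of tripping rate limits.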
Enable verbose logging by setting:

```python
VERBOSE = 1
```

This shows detailed processing information, including:
- Number of emails fetched
- Duplicate detection results
- Blacklist filtering stats
- Keyword prioritization counts
```
[INFO] 45 Emails with {['UNREAD', 'Academia']} labels found.
[INFO] 38 in 45 unique entries found.
[INFO] Removed 3 entries by blacklist: {'retracted', 'spam'}
[INFO] 12 entries prioritized.
[INFO] Digest Message formatted.
[INFO] Digest Message sent.
[INFO] 45 messages marked as read.
[LOG] Updated 12 paper entries.
```
- All processing happens locally on your machine
- Only Gmail API access required (read emails, send digest)
- No data sent to external services
- Credentials stored securely using OAuth2 standards
Contributions welcome! Please feel free to submit pull requests or open issues for:
- Performance improvements
- New features
- Bug fixes
- Documentation enhancements
This project is licensed under the MIT License - see the LICENSE file for details.