
πŸ“š Scholar Alerts Assistant


Save your inbox from Google Scholar alert overload.

Scholar Alerts Assistant automatically processes your Google Scholar alert emails, filters out duplicates, highlights priority papers based on your keywords, and sends you a cleanly formatted digest email. No more cluttered inbox: just the papers that matter to you.

✨ Features

  • High Performance: Batch Gmail API processing with 95% fewer API calls
  • Smart Filtering: Keyword-based prioritization and blacklist filtering
  • HTML Digest: Beautiful formatted emails with highlighted keywords
  • Duplicate Detection: Persistent database prevents repeat notifications
  • Fast Processing: Optimized algorithms for large email volumes
  • Safe Processing: Mark as read or delete processed emails
  • Automation Ready: Perfect for cron jobs and scheduled execution

πŸ”§ Quick Start

Prerequisites

  1. Gmail Labels: Set up an automatic filter to label your Google Scholar alert emails (Gmail Filter Guide)

  2. Gmail API Credentials:

    • Enable the Gmail API
    • Download credentials.json to the ./data/ folder

Installation

# Clone the repository
git clone https://github.com/daoxusheng/scholar-alerts-assistant.git
cd scholar-alerts-assistant

# Install dependencies
pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client

# Create data directory structure
mkdir -p data
# Place your credentials.json in the data folder

Configuration

Edit the configuration variables in main.py:

# Gmail labels to filter for scholar alerts
SCHOLAR_LABEL = [
    "UNREAD",
    "Academia",  # Replace with your actual label
]

# Email address to receive digest
DIGEST_ADDRESS = "your-email@gmail.com"  # Replace with your email

# Processing mode
DELETE_MODE = False  # True to delete, False to mark as read

First Run

python main.py

On first run, your browser will open for Gmail API authorization. This creates a token.json file for future runs.
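The authorization flow on first run can be pictured with the following sketch. The scopes and function below are assumptions based on what the README describes (read/modify alert mail, send the digest); the actual implementation lives in `utils/gmail.py` and may differ:

```python
import os

# Assumed scopes; check utils/gmail.py for the real list used by the tool.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/gmail.send",
]

def get_credentials(data_path="./data/"):
    """Load cached OAuth credentials, refreshing or re-authorizing as needed."""
    # Google client libraries are only needed at authorization time.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    token_file = os.path.join(data_path, "token.json")
    creds = None
    if os.path.exists(token_file):
        creds = Credentials.from_authorized_user_file(token_file, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # silent refresh; no browser needed
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                os.path.join(data_path, "credentials.json"), SCOPES
            )
            creds = flow.run_local_server(port=0)  # opens the browser once
        with open(token_file, "w") as fh:
            fh.write(creds.to_json())
    return creds
```

Subsequent runs reuse `data/token.json` and only re-open the browser if the token can no longer be refreshed.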

πŸ“ Project Structure

scholar-alerts-assistant/
β”œβ”€β”€ main.py                 # Main script and configuration
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ assistant.py        # Core processing logic
β”‚   β”œβ”€β”€ gmail.py           # Gmail API integration
β”‚   β”œβ”€β”€ formatter.py       # HTML email formatting
β”‚   └── parser.py          # Email content parsing
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ credentials.json   # Gmail API credentials (you provide)
β”‚   β”œβ”€β”€ token.json         # Auto-generated OAuth token
β”‚   β”œβ”€β”€ db.json           # Paper database (auto-generated)
β”‚   β”œβ”€β”€ keywords.json     # Priority keywords (optional)
β”‚   └── blacklist.json    # Filtered terms (optional)
└── README.md

βš™οΈ Advanced Configuration

Keyword Prioritization

Create data/keywords.json to prioritize papers containing specific terms:

[
  "machine learning",
  "neural networks",
  "deep learning",
  "artificial intelligence",
  "computer vision"
]

Papers matching these keywords will appear in the "Highly-related Papers" section of your digest.
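A sketch of what this prioritization step can look like. Case-insensitive matching on titles is an assumption here; the real logic in `utils/assistant.py` may differ, though a single compiled pattern fits the "smart caching" behavior the performance section describes:

```python
import json
import re

def load_keywords(path="./data/keywords.json"):
    """Read the priority keyword list from its JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def split_by_priority(papers, keywords):
    """Partition papers into (priority, other) by keyword hits in the title."""
    # One compiled, case-insensitive pattern instead of a loop per keyword.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    priority, other = [], []
    for paper in papers:
        (priority if pattern.search(paper["title"]) else other).append(paper)
    return priority, other
```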

Blacklist Filtering

Create data/blacklist.json to filter out unwanted papers:

[
  "retracted",
  "withdrawn",
  "spam term",
  "unwanted topic"
]

Papers containing blacklisted terms will be automatically excluded from your digest.
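Blacklist filtering can be sketched the same way, dropping any entry that contains a blocked term and reporting which terms fired (as in the example output's "Removed ... by blacklist" log line). Matching against both the title and a snippet field is an assumption:

```python
def apply_blacklist(papers, blacklist):
    """Drop papers containing any blacklisted term; report the terms that fired."""
    terms = [t.lower() for t in blacklist]
    kept, hits = [], set()
    for paper in papers:
        text = (paper["title"] + " " + paper.get("snippet", "")).lower()
        fired = {t for t in terms if t in text}
        if fired:
            hits |= fired  # excluded; remember why for the log output
        else:
            kept.append(paper)
    return kept, hits
```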

Configuration Options

| Variable | Description | Default |
|----------|-------------|---------|
| VERBOSE | Logging level (0=silent, 1=info) | 1 |
| DELETE_MODE | Delete processed emails instead of marking as read | False |
| USER_ID | Gmail user ID | "me" |
| SCHOLAR_LABEL | Gmail labels to filter | ["UNREAD", "Academia"] |
| DIGEST_ADDRESS | Destination email for digest | Required |
| DATA_PATH | Directory for data files | "./data/" |
| BATCH_SIZE | Gmail API batch processing size | 10 |

πŸš€ Performance Features

  • Configurable Batch Processing: Adjustable batch sizes (default: 10 messages per batch)
  • Rate Limiting Protection: Exponential backoff with automatic retry on API limits
  • Smart Caching: Compiled regex patterns and preloaded data
  • Optimized Filtering: Single-pass algorithms for large datasets
  • Early Termination: Skips processing when no emails found
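The retry behavior above can be pictured as the following sketch. The delay values, jitter, and retry count are illustrative assumptions, not the tool's actual settings:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from the Gmail API."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Exponential delay with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * 2 ** attempt * (1 + random.random()))
```

With `BATCH_SIZE = 10` and this kind of backoff, transient 429 errors resolve themselves without losing any emails from the batch.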

πŸ€– Automation

Cron Job Setup

Run daily at 9 AM:

# Edit crontab
crontab -e

# Add this line
0 9 * * * cd /path/to/scholar-alerts-assistant && python main.py

Systemd Timer (Linux)

Create /etc/systemd/system/scholar-alerts.service:

[Unit]
Description=Scholar Alerts Processor

[Service]
Type=oneshot
User=yourusername
WorkingDirectory=/path/to/scholar-alerts-assistant
ExecStart=/usr/bin/python main.py

Create /etc/systemd/system/scholar-alerts.timer:

[Unit]
Description=Run Scholar Alerts daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

Enable and start:

sudo systemctl enable scholar-alerts.timer
sudo systemctl start scholar-alerts.timer

πŸ› Troubleshooting

Common Issues

"No module named 'google'"

pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client

"credentials.json not found"

  • Ensure credentials.json is in the ./data/ folder
  • Verify Gmail API is enabled in Google Cloud Console

"No emails found"

  • Check your Gmail label name matches SCHOLAR_LABEL
  • Verify Scholar alerts are being labeled correctly
  • Run with VERBOSE = 1 for debugging info

"Permission denied" errors

  • Delete data/token.json and re-authenticate
  • Check Gmail API scopes in Google Cloud Console

Gmail API rate limit errors (429)

  • Reduce BATCH_SIZE to 5 or lower in main.py
  • The tool automatically retries with exponential backoff
  • For persistent issues, try BATCH_SIZE = 1

Slow processing with large email volumes

  • Increase BATCH_SIZE to 20-50 if no rate limit issues
  • Default batch size (10) balances speed and reliability
  • For 100+ emails, expect 30-60 seconds processing time

Debug Mode

Enable verbose logging by setting:

VERBOSE = 1

This shows detailed processing information including:

  • Number of emails fetched
  • Duplicate detection results
  • Blacklist filtering stats
  • Keyword prioritization counts

πŸ“Š Example Output

[INFO] 45 Emails with {['UNREAD', 'Academia']} labels found.
[INFO] 38 in 45 unique entries found.
[INFO] Removed 3 entries by blacklist: {'retracted', 'spam'}
[INFO] 12 entries prioritized.
[INFO] Digest Message formatted.
[INFO] Digest Message sent.
[INFO] 45 messages marked as read.
[LOG] Updated 12 paper entries.

πŸ”’ Privacy & Security

  • All processing happens locally on your machine
  • Only Gmail API access required (read emails, send digest)
  • No data sent to external services
  • Credentials stored securely using OAuth2 standards

🀝 Contributing

Contributions welcome! Please feel free to submit pull requests or open issues for:

  • Performance improvements
  • New features
  • Bug fixes
  • Documentation enhancements

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.
