An automated framework that detects, scores, and mitigates spammy pull requests on GitHub repositories in real time.
- 🎯 Automated Spam Detection: Analyzes PR metadata, file diffs, and text for spam indicators
- 🔍 Multi-Factor Analysis:
  - Trivial README edits
  - Minimal code changes
  - Generic AI-generated text patterns
  - Suspicious patterns in descriptions
- 🤖 Auto-Moderation: Automatically comments on and closes spam PRs
- 💾 Lightweight Storage: JSON-based tracking without database requirements
- ⚡ FastAPI Webhook: High-performance webhook listener for GitHub events
- 📊 Configurable Thresholds: Customizable spam detection scoring
PR-Sentinel consists of several key components:
- FastAPI Webhook Listener (main.py): Receives GitHub pull_request events
- Spam Detector (spam_detector.py): Analyzes PRs using multiple heuristics
- GitHub Client (github_client.py): Interacts with the GitHub API using PyGithub
- Storage (storage.py): Lightweight JSON-based PR tracking
- Configuration (config.py): Centralized settings management
- Python 3.8 or higher
- GitHub Personal Access Token with repo permissions
- (Optional) GitHub Webhook Secret for secure webhooks
- Clone the repository:
git clone https://github.com/Anorak001/PR-Sentinel.git
cd PR-Sentinel
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables:
cp .env.example .env
# Edit .env with your GitHub token and settings
Environment variables:
- GITHUB_TOKEN: (Required) Your GitHub Personal Access Token
- GITHUB_WEBHOOK_SECRET: (Optional) Secret for webhook verification
- SPAM_SCORE_THRESHOLD: (Optional) Score threshold for spam detection (default: 70)
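As an illustration of how these variables might be consumed, here is a minimal sketch that reads them into a settings object. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project's config.py, which may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    github_token: str
    webhook_secret: Optional[str]
    spam_score_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (the process environment by default)."""
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        # The token is the only variable documented as mandatory.
        raise RuntimeError("GITHUB_TOKEN is required")
    # An unset or empty secret means webhook verification is disabled.
    secret = env.get("GITHUB_WEBHOOK_SECRET") or None
    # Fall back to the documented default of 70.
    threshold = float(env.get("SPAM_SCORE_THRESHOLD", "70"))
    return Settings(token, secret, threshold)
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.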
Start the FastAPI webhook listener:
python main.py
Or using uvicorn directly:
uvicorn main:app --host 0.0.0.0 --port 8000
The server will start on http://0.0.0.0:8000
- Go to your repository settings → Webhooks → Add webhook
- Set Payload URL to: http://your-server:8000/webhook
- Set Content type to: application/json
- (Optional) Set Secret to match your GITHUB_WEBHOOK_SECRET
- Select "Let me select individual events" and choose:
  - ✅ Pull requests
- Click "Add webhook"
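When a secret is configured, GitHub signs each delivery with an HMAC-SHA256 of the raw request body and sends it in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The following is a minimal sketch of how a receiver could check that header; `signature_is_valid` is a hypothetical helper name, and the project's main.py may verify signatures differently.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(
        secret.encode("utf-8"), body, hashlib.sha256
    ).hexdigest()
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(
        expected.encode("utf-8"), (signature_header or "").encode("utf-8")
    )
```

The check must run against the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and invalidate the digest.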
- GET /: Service information
- GET /health: Health check
- GET /stats: Statistics about tracked PRs
- POST /webhook: GitHub webhook endpoint
PR-Sentinel uses a weighted scoring system to detect spam:
- Trivial README Edits (Weight: 30)
  - Only README files modified
  - Fewer than 10-50 lines changed
- Minimal Code Changes (Weight: 25)
  - Very small number of line changes (≤5 lines: full score)
  - Small changes (≤15 lines: partial score)
- Generic AI Text (Weight: 35)
  - Detects common AI-generated text patterns
  - Phrases like "as an AI", "it's worth noting", "delve into", etc.
- Suspicious Patterns (Weight: 10)
  - "typo fix", "minor fix" with minimal description
  - Very short PR descriptions
- Scores range from 0 to 100
- Default threshold: 70 (configurable)
- Scores above the threshold trigger auto-moderation
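The weighted scoring described above can be sketched roughly as follows. The weights and the 5 and 15 line cutoffs come from this README; everything else is an assumption made for illustration: the function name `score_pr`, the 0.5 partial credit, the 50-line README limit, the 20 and 60 character description cutoffs, and the short phrase list. The real spam_detector.py may compute these differently.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List

WEIGHTS = {
    "trivial_readme": 30,
    "minimal_changes": 25,
    "generic_ai_text": 35,
    "suspicious_patterns": 10,
}
# Only the three phrases quoted in the README; the real list is longer.
AI_PHRASES = ("as an ai", "it's worth noting", "delve into")


def score_pr(files: List[str], lines_changed: int, description: str,
             threshold: float = 70.0) -> Dict[str, Any]:
    """Combine per-heuristic fractions (0.0 to 1.0) into a 0-100 score."""
    text = description.lower().strip()
    readme_only = bool(files) and all(
        PurePosixPath(f).name.lower().startswith("readme") for f in files
    )
    stock_phrase = "typo fix" in text or "minor fix" in text
    fractions = {
        # Assumed cutoff: README-only edits under 50 changed lines.
        "trivial_readme": 1.0 if readme_only and lines_changed < 50 else 0.0,
        # Full score at <=5 lines; assumed half score at <=15 lines.
        "minimal_changes": (1.0 if lines_changed <= 5
                            else 0.5 if lines_changed <= 15 else 0.0),
        "generic_ai_text": 1.0 if any(p in text for p in AI_PHRASES) else 0.0,
        # Assumed cutoffs: very short text, or a stock phrase in a short text.
        "suspicious_patterns": (1.0 if len(text) < 20
                                or (stock_phrase and len(text) < 60) else 0.0),
    }
    score = sum(WEIGHTS[name] * frac for name, frac in fractions.items())
    # "Above the threshold" is read here as strictly greater than.
    return {"spam_score": float(score), "is_spam": score > threshold,
            "details": fractions}
```

Under these assumptions, a two-line README change described only as "typo fix" scores 30 + 25 + 10 = 65, which stays below the default threshold of 70; adding a matched AI phrase would push it to 100.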
When a PR exceeds the spam threshold:
- Comment: Posts an automated comment explaining the detection
- Close: Automatically closes the PR
- Track: Stores the PR data for future reference
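The three steps above could be wired together along these lines. This is a sketch, not the project's code: `moderate` and the `track` callback are hypothetical names, and the comment wording is invented. The `pr` argument is assumed to behave like a PyGithub `PullRequest`, whose `create_issue_comment(body)` and `edit(state="closed")` methods post a comment and close the PR.

```python
from typing import Any, Callable, Dict


def moderate(pr: Any, score: float, threshold: float,
             track: Callable[[Dict[str, Any]], None]) -> bool:
    """Comment on, close, and track a PR whose score exceeds the threshold."""
    if score <= threshold:
        return False
    # Step 1: explain the detection to the author.
    pr.create_issue_comment(
        f"This pull request was flagged as likely spam "
        f"(score {score:.0f}, threshold {threshold:.0f}) and has been closed. "
        f"If this is a mistake, please contact the maintainers."
    )
    # Step 2: close the PR.
    pr.edit(state="closed")
    # Step 3: hand a record to the storage layer.
    track({"pr_number": pr.number, "spam_score": score, "is_spam": True})
    return True
```

Taking the PR object and the tracking callback as parameters keeps the function testable with stand-ins, without a network connection or a real token.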
Edit config.py or use environment variables to customize:
# Spam Detection Thresholds
SPAM_SCORE_THRESHOLD = 70 # Adjust sensitivity
# Detection Weights
WEIGHTS = {
"trivial_readme": 30,
"minimal_changes": 25,
"generic_ai_text": 35,
"suspicious_patterns": 10
}
# Storage
MAX_TRACKED_PRS = 100 # Number of PRs to keep in memory
PR-Sentinel uses a simple JSON file (pr_tracking.json) to store recent PR data:
{
"prs": [
{
"repo": "owner/repo",
"pr_number": 123,
"user": "username",
"spam_score": 75.0,
"is_spam": true,
"details": {...},
"tracked_at": "2025-01-01T00:00:00"
}
]
}
The storage automatically:
- Keeps only the most recent PRs (default: 100)
- Tracks spam scores and detection reasons
- Records timestamps for all actions
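A minimal sketch of this kind of bounded JSON storage is shown below. The function name `track_pr` is hypothetical, and the real storage.py may handle locking, corruption, or timestamps differently; the file layout follows the `{"prs": [...]}` shape shown above.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict


def track_pr(path: Path, record: Dict[str, Any], max_tracked: int = 100) -> None:
    """Append a record to the JSON file, keeping only the newest entries."""
    if max_tracked < 1:
        raise ValueError("max_tracked must be at least 1")
    # Start from an empty list when the file does not exist yet.
    data = json.loads(path.read_text()) if path.exists() else {"prs": []}
    # Stamp the record with a naive local ISO timestamp, e.g. 2025-01-01T00:00:00.
    stamped = dict(record, tracked_at=datetime.now().isoformat(timespec="seconds"))
    # Keep only the most recent max_tracked records, oldest first.
    data["prs"] = (data["prs"] + [stamped])[-max_tracked:]
    path.write_text(json.dumps(data, indent=2))
```

Rewriting the whole file on every call is simple and adequate for around 100 records; it is not safe against concurrent writers, which a single-process webhook listener may not need.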
PR-Sentinel/
├── main.py # FastAPI webhook listener
├── spam_detector.py # Spam detection logic
├── github_client.py # GitHub API client
├── storage.py # JSON storage
├── config.py # Configuration
├── requirements.txt # Dependencies
├── .env.example # Environment template
├── .gitignore # Git ignore rules
└── README.md # Documentation
The system is designed to be lightweight and doesn't require extensive testing infrastructure. Manual testing can be done by:
- Starting the server
- Triggering test webhooks from GitHub
- Checking the /stats endpoint for results
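For local testing without going through GitHub, a hand-built request can stand in for a real delivery. The sketch below constructs one with the standard library; `build_test_webhook` is a hypothetical helper, and the payload fields shown are a small subset of GitHub's pull_request event. Which fields main.py actually reads is not documented here, so the payload may need more keys.

```python
import hashlib
import hmac
import json
import urllib.request
from typing import Any, Dict, Optional


def build_test_webhook(url: str, payload: Dict[str, Any],
                       secret: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request that resembles a GitHub pull_request delivery."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-GitHub-Event": "pull_request",
    }
    if secret:
        # Sign the exact bytes being sent, as GitHub does.
        digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
        headers["X-Hub-Signature-256"] = "sha256=" + digest
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Example payload; values are placeholders.
SAMPLE_PAYLOAD = {
    "action": "opened",
    "number": 123,
    "pull_request": {"title": "typo fix", "body": "", "user": {"login": "username"}},
    "repository": {"full_name": "owner/repo"},
}
```

With the server running, `urllib.request.urlopen(build_test_webhook("http://localhost:8000/webhook", SAMPLE_PAYLOAD))` would send it. Point the payload at a repository you control, since a spam verdict leads to a real comment and close through the GitHub API.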
- Webhook Verification: Always use GITHUB_WEBHOOK_SECRET in production
- Token Security: Never commit your GITHUB_TOKEN to version control
- Rate Limiting: GitHub API has rate limits; the system handles them gracefully
- False Positives: Monitor the /stats endpoint and adjust thresholds as needed
Contributions are welcome! Please feel free to submit issues or pull requests.
MIT License - see LICENSE file for details.
For issues, questions, or contributions, please open an issue on GitHub.