PR-Sentinel

An automated framework that detects, scores, and mitigates spammy pull requests on GitHub repositories in real time.

Features

🎯 Automated Spam Detection: Analyzes PR metadata, file diffs, and text for spam indicators
🔍 Multi-Factor Analysis:
- Trivial README edits
- Minimal code changes
- Generic AI-generated text patterns
- Suspicious patterns in descriptions
🤖 Auto-Moderation: Automatically comments on and closes spam PRs
💾 Lightweight Storage: JSON-based tracking without database requirements
⚡ FastAPI Webhook: High-performance webhook listener for GitHub events
📊 Configurable Thresholds: Customizable spam detection scoring

Architecture

PR-Sentinel consists of several key components:

FastAPI Webhook Listener (main.py): Receives GitHub pull_request events
Spam Detector (spam_detector.py): Analyzes PRs using multiple heuristics
GitHub Client (github_client.py): Interacts with GitHub API using PyGithub
Storage (storage.py): Lightweight JSON-based PR tracking
Configuration (config.py): Centralized settings management

Installation

Prerequisites

Python 3.8 or higher
GitHub Personal Access Token with repo permissions
(Optional) GitHub Webhook Secret for secure webhooks

Setup

Clone the repository:

git clone https://github.com/Anorak001/PR-Sentinel.git
cd PR-Sentinel

Install dependencies:

pip install -r requirements.txt

Configure environment variables:

cp .env.example .env
# Edit .env with your GitHub token and settings

Environment variables:

GITHUB_TOKEN: Your GitHub Personal Access Token
GITHUB_WEBHOOK_SECRET: (Optional) Secret for webhook verification
SPAM_SCORE_THRESHOLD: Score threshold for spam detection (default: 70)
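As an illustration of how these variables might be read at startup, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from config.py, and treating an empty GITHUB_WEBHOOK_SECRET as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    github_token: str
    webhook_secret: Optional[str]
    spam_score_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping."""
    token = env.get("GITHUB_TOKEN", "")
    if not token:
        raise RuntimeError("GITHUB_TOKEN must be set")
    return Settings(
        github_token=token,
        # Assumption: an empty secret means "no webhook verification".
        webhook_secret=env.get("GITHUB_WEBHOOK_SECRET") or None,
        # Documented default threshold is 70.
        spam_score_threshold=float(env.get("SPAM_SCORE_THRESHOLD", "70")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.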

Usage

Running the Server

Start the FastAPI webhook listener:

python main.py

Or using uvicorn directly:

uvicorn main:app --host 0.0.0.0 --port 8000

The server listens on all network interfaces at port 8000 (http://0.0.0.0:8000).

Setting Up GitHub Webhook

Go to your repository settings → Webhooks → Add webhook
Set Payload URL to: http://your-server:8000/webhook
Set Content type to: application/json
(Optional) Set Secret to match your GITHUB_WEBHOOK_SECRET
Select "Let me select individual events" and choose:
- ✅ Pull requests
Click "Add webhook"
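When a secret is set, GitHub signs each delivery with HMAC-SHA256 and sends the hex digest in the X-Hub-Signature-256 header, prefixed with "sha256=". The sketch below shows how such a signature can be checked using only the standard library. The function name `verify_signature` is hypothetical, and this is not necessarily how main.py performs the check.

```python
import hashlib
import hmac


def verify_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if signature_header matches the HMAC-SHA256 of the raw body."""
    if not signature_header:
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode("utf-8")
    # compare_digest runs in constant time to avoid leaking match position.
    return hmac.compare_digest(expected, signature_header.encode("utf-8"))
```

The check must run against the raw request bytes, before any JSON parsing, because re-serializing the payload can change the byte sequence and invalidate the signature.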

API Endpoints

GET / - Service information
GET /health - Health check
GET /stats - Statistics about tracked PRs
POST /webhook - GitHub webhook endpoint
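The read-only endpoints can be queried with any HTTP client. Below is a small standard-library sketch; the helper name `get_json`, the base URL http://localhost:8000, and the assumption that these endpoints return JSON are not confirmed by this README.

```python
import json
from typing import Any, Callable
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: server running locally on the default port


def get_json(path: str, base_url: str = BASE_URL,
             opener: Callable[..., Any] = urlopen) -> Any:
    """GET base_url + path and decode the response body as JSON."""
    with opener(base_url.rstrip("/") + path, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_json("/health")` or `get_json("/stats")` would fetch the corresponding endpoint once the server is running. The `opener` parameter exists only so the helper can be tried without a live server.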

Spam Detection Logic

PR-Sentinel uses a weighted scoring system to detect spam:

Detection Criteria

Trivial README Edits (Weight: 30)
- Only README files modified
- Only a small number of lines changed (cutoffs in the 10-50 line range)
Minimal Code Changes (Weight: 25)
- Very small number of line changes (≤5 lines: full score)
- Small changes (≤15 lines: partial score)
Generic AI Text (Weight: 35)
- Detects common AI-generated text patterns
- Phrases like "as an AI", "it's worth noting", "delve into", etc.
Suspicious Patterns (Weight: 10)
- "typo fix", "minor fix" with minimal description
- Very short PR descriptions
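The criteria above can be sketched as a weighted sum, where each factor yields a fraction between 0 and 1 that is multiplied by its weight. This is an illustrative sketch, not the code in spam_detector.py. The `PullRequest` dataclass, the function names, the 0.5 value for "partial" scores, the 10 and 50 line README cutoffs, and the 20-character "very short description" limit are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

WEIGHTS = {"trivial_readme": 30, "minimal_changes": 25,
           "generic_ai_text": 35, "suspicious_patterns": 10}
AI_PHRASES = ("as an ai", "it's worth noting", "delve into")
GENERIC_TITLES = ("typo fix", "minor fix")


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a PR."""
    title: str
    body: str
    files: List[str]
    lines_changed: int


def factor_fractions(pr: PullRequest) -> Dict[str, float]:
    """Return a 0..1 fraction for each detection factor."""
    text = f"{pr.title} {pr.body}".lower()
    only_readme = bool(pr.files) and all(
        f.rsplit("/", 1)[-1].lower().startswith("readme") for f in pr.files)
    # Assumed cutoffs: <=10 lines is a full hit, <=50 lines a partial hit.
    if only_readme and pr.lines_changed <= 10:
        readme = 1.0
    elif only_readme and pr.lines_changed <= 50:
        readme = 0.5
    else:
        readme = 0.0
    # Documented cutoffs: <=5 lines full score, <=15 lines partial score.
    if pr.lines_changed <= 5:
        minimal = 1.0
    elif pr.lines_changed <= 15:
        minimal = 0.5
    else:
        minimal = 0.0
    ai_text = 1.0 if any(p in text for p in AI_PHRASES) else 0.0
    short_body = len(pr.body.strip()) < 20  # assumed limit
    generic_title = any(p in pr.title.lower() for p in GENERIC_TITLES)
    if generic_title and short_body:
        suspicious = 1.0
    else:
        suspicious = 0.5 if short_body else 0.0
    return {"trivial_readme": readme, "minimal_changes": minimal,
            "generic_ai_text": ai_text, "suspicious_patterns": suspicious}


def spam_score(pr: PullRequest) -> float:
    """Weighted sum of the factor fractions, in the range 0..100."""
    fractions = factor_fractions(pr)
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)
```

Under these assumed cutoffs, a two-line README change titled "typo fix" with an empty description scores 30 + 25 + 10 = 65, which stays below the default threshold of 70 until an AI-text phrase is also matched.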

Scoring

Scores range from 0 to 100
Default threshold: 70 (configurable)
Scores above the threshold trigger auto-moderation

Auto-Moderation Actions

When a PR exceeds the spam threshold:

Comment: Posts an automated comment explaining the detection
Close: Automatically closes the PR
Track: Stores the PR data for future reference
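The three actions above can be sketched as follows. The method names `create_issue_comment` and `edit(state="closed")` mirror PyGithub's `PullRequest` object, but whether github_client.py calls them this way is an assumption. The `moderate` function, the `PullRequestLike` protocol, the comment wording, and the choice to track non-spam PRs as well are hypothetical.

```python
from typing import Any, Dict, List

try:
    from typing import Protocol
except ImportError:  # Python < 3.8
    Protocol = object  # type: ignore


class PullRequestLike(Protocol):
    """The two PR operations this sketch relies on."""
    def create_issue_comment(self, body: str) -> Any: ...
    def edit(self, state: str) -> Any: ...


def moderate(pr: PullRequestLike, score: float, record: Dict[str, Any],
             tracked: List[Dict[str, Any]], threshold: float = 70.0) -> bool:
    """Comment on and close the PR if score is above threshold; track it either way."""
    is_spam = score > threshold  # "above the threshold" is read as strictly greater
    if is_spam:
        pr.create_issue_comment(
            f"This pull request was flagged as likely spam "
            f"(score {score:.0f}, threshold {threshold:.0f}) and has been closed.")
        pr.edit(state="closed")
    # Assumption: every analyzed PR is tracked, not only spam.
    tracked.append({**record, "spam_score": score, "is_spam": is_spam})
    return is_spam
```

Commenting before closing means the explanation is still posted if the close call fails, so a maintainer can see why the PR was flagged.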

Configuration

Edit config.py or use environment variables to customize:

# Spam Detection Thresholds
SPAM_SCORE_THRESHOLD = 70  # Adjust sensitivity

# Detection Weights
WEIGHTS = {
    "trivial_readme": 30,
    "minimal_changes": 25,
    "generic_ai_text": 35,
    "suspicious_patterns": 10
}

# Storage
MAX_TRACKED_PRS = 100  # Maximum number of recent PRs to keep
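The default weights sum to 100, which matches the documented 0 to 100 score range. If you change them, a small check like the one below can catch weights that no longer add up. `validate_weights` is a hypothetical helper and is not part of config.py.

```python
from typing import Mapping


def validate_weights(weights: Mapping[str, float], total: float = 100.0) -> None:
    """Raise ValueError if any weight is negative or the weights do not sum to total."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    actual = sum(weights.values())
    if abs(actual - total) > 1e-9:
        raise ValueError(f"weights sum to {actual}, expected {total}")


# The default weights from this section pass the check.
validate_weights({
    "trivial_readme": 30,
    "minimal_changes": 25,
    "generic_ai_text": 35,
    "suspicious_patterns": 10,
})
```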

Storage

PR-Sentinel uses a simple JSON file (pr_tracking.json) to store recent PR data:

{
  "prs": [
    {
      "repo": "owner/repo",
      "pr_number": 123,
      "user": "username",
      "spam_score": 75.0,
      "is_spam": true,
      "details": {...},
      "tracked_at": "2025-01-01T00:00:00"
    }
  ]
}

The storage automatically:

Keeps only the most recent PRs (default: 100)
Tracks spam scores and detection reasons
Records timestamps for all actions
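A minimal sketch of this kind of bounded JSON storage is shown below. The function name `track_pr`, the temp-file write, and the timezone-aware timestamp are assumptions and may differ from storage.py (the example record above shows a timestamp without a UTC offset).

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def track_pr(path: Path, record: Dict[str, Any], max_tracked: int = 100) -> None:
    """Append record to the JSON file at path, keeping only the newest max_tracked entries."""
    if max_tracked < 1:
        raise ValueError("max_tracked must be at least 1")
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"prs": []}
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    data["prs"].append(dict(record, tracked_at=stamp))
    # Drop the oldest entries once the cap is exceeded.
    data["prs"] = data["prs"][-max_tracked:]
    # Write to a temporary file first so a crash cannot leave a half-written file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    tmp.replace(path)
```

For example, `track_pr(Path("pr_tracking.json"), {"repo": "owner/repo", "pr_number": 123, "spam_score": 75.0, "is_spam": True})` would add one entry and trim the file to the 100 most recent PRs.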

Development

Project Structure

PR-Sentinel/
├── main.py              # FastAPI webhook listener
├── spam_detector.py     # Spam detection logic
├── github_client.py     # GitHub API client
├── storage.py           # JSON storage
├── config.py            # Configuration
├── requirements.txt     # Dependencies
├── .env.example         # Environment template
├── .gitignore          # Git ignore rules
└── README.md           # Documentation

Running Tests

The system is designed to be lightweight and doesn't require extensive testing infrastructure. Manual testing can be done by:

Starting the server
Triggering test webhooks from GitHub
Checking the /stats endpoint for results
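If you prefer not to trigger deliveries from GitHub, a signed test request can also be built locally. The sketch below assembles a minimal pull_request payload and the headers GitHub sends with it. The field names follow GitHub's pull_request webhook event, but the function name `build_test_delivery` is hypothetical, and whether main.py accepts such a reduced payload is an assumption.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, Tuple


def build_test_delivery(secret: str, repo: str = "owner/repo",
                        number: int = 1) -> Tuple[bytes, Dict[str, str]]:
    """Return (body, headers) for a minimal signed pull_request 'opened' event."""
    payload: Dict[str, Any] = {
        "action": "opened",
        "number": number,
        "pull_request": {
            "number": number,
            "title": "typo fix",
            "body": "",
            "user": {"login": "test-user"},  # placeholder account name
        },
        "repository": {"full_name": repo},
    }
    body = json.dumps(payload).encode("utf-8")
    # Sign the exact bytes that will be sent.
    signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    headers = {
        "Content-Type": "application/json",
        "X-GitHub-Event": "pull_request",
        "X-Hub-Signature-256": f"sha256={signature}",
    }
    return body, headers
```

The result can be posted with the standard library, for example by passing `urllib.request.Request("http://localhost:8000/webhook", data=body, headers=headers, method="POST")` to `urllib.request.urlopen`. Note that a real server would then try to reach the named repository through the GitHub API, so point the payload at a repository and PR you control.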

Security Considerations

Webhook Verification: Always use GITHUB_WEBHOOK_SECRET in production
Token Security: Never commit your GITHUB_TOKEN to version control
Rate Limiting: The GitHub API enforces rate limits; the system handles them gracefully
False Positives: Monitor the /stats endpoint and adjust thresholds as needed

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - see LICENSE file for details.

Support

For issues, questions, or contributions, please open an issue on GitHub.
