PR-Sentinel

An automated framework that detects, scores, and mitigates spammy pull requests on GitHub repositories in real time.

Features

  • 🎯 Automated Spam Detection: Analyzes PR metadata, file diffs, and text for spam indicators
  • 🔍 Multi-Factor Analysis:
    • Trivial README edits
    • Minimal code changes
    • Generic AI-generated text patterns
    • Suspicious patterns in descriptions
  • 🤖 Auto-Moderation: Automatically comments on and closes spam PRs
  • 💾 Lightweight Storage: JSON-based tracking without database requirements
  • FastAPI Webhook: High-performance webhook listener for GitHub events
  • 📊 Configurable Thresholds: Customizable spam detection scoring

Architecture

PR-Sentinel consists of several key components:

  1. FastAPI Webhook Listener (main.py): Receives GitHub pull_request events
  2. Spam Detector (spam_detector.py): Analyzes PRs using multiple heuristics
  3. GitHub Client (github_client.py): Interacts with GitHub API using PyGithub
  4. Storage (storage.py): Lightweight JSON-based PR tracking
  5. Configuration (config.py): Centralized settings management

Installation

Prerequisites

  • Python 3.8 or higher
  • GitHub Personal Access Token with repo permissions
  • (Optional) GitHub Webhook Secret for secure webhooks

Setup

  1. Clone the repository:
git clone https://github.com/Anorak001/PR-Sentinel.git
cd PR-Sentinel
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure environment variables:
cp .env.example .env
# Edit .env with your GitHub token and settings

Environment variables (only GITHUB_TOKEN is required):

  • GITHUB_TOKEN: Your GitHub Personal Access Token
  • GITHUB_WEBHOOK_SECRET: (Optional) Secret for webhook verification
  • SPAM_SCORE_THRESHOLD: Score threshold for spam detection (default: 70)

Usage

Running the Server

Start the FastAPI webhook listener:

python main.py

Or using uvicorn directly:

uvicorn main:app --host 0.0.0.0 --port 8000

The server then listens on port 8000 on all network interfaces (http://0.0.0.0:8000).

Setting Up GitHub Webhook

  1. Go to your repository settings → Webhooks → Add webhook
  2. Set Payload URL to: http://your-server:8000/webhook
  3. Set Content type to: application/json
  4. (Optional) Set Secret to match your GITHUB_WEBHOOK_SECRET
  5. Select "Let me select individual events" and choose:
    • ✅ Pull requests
  6. Click "Add webhook"
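
When a secret is set, GitHub signs each delivery and sends the result in the `X-Hub-Signature-256` header as `sha256=` followed by the hex HMAC-SHA256 of the raw request body. The sketch below shows one way that check could look; the function name `is_valid_signature` is an assumption and may not match what main.py does.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if signature_header matches the HMAC-SHA256 of body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(
        expected.encode("utf-8"), (signature_header or "").encode("utf-8")
    )
```

The check has to run against the raw bytes of the request, before any JSON parsing, because re-serializing the payload can change the byte sequence and therefore the digest.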

API Endpoints

  • GET / - Service information
  • GET /health - Health check
  • GET /stats - Statistics about tracked PRs
  • POST /webhook - GitHub webhook endpoint

Spam Detection Logic

PR-Sentinel uses a weighted scoring system to detect spam:

Detection Criteria

  1. Trivial README Edits (Weight: 30)

    • Only README files modified
    • Fewer than 10-50 lines changed
  2. Minimal Code Changes (Weight: 25)

    • Very small number of line changes (≤5 lines: full score)
    • Small changes (≤15 lines: partial score)
  3. Generic AI Text (Weight: 35)

    • Detects common AI-generated text patterns
    • Phrases like "as an AI", "it's worth noting", "delve into", etc.
  4. Suspicious Patterns (Weight: 10)

    • "typo fix", "minor fix" with minimal description
    • Very short PR descriptions

Scoring

  • Scores range from 0-100
  • Default threshold: 70 (configurable)
  • Scores above the threshold trigger auto-moderation
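
The criteria and threshold above can be sketched as a small scoring function. The weights, phrases, and line-count tiers come from this README; the README cutoff of 10 lines, the 50% partial score, and the description-length cutoffs are assumptions made for illustration, and spam_detector.py may use different values.

```python
from typing import Dict, List

# Weights as documented; the cutoffs further down are assumptions.
WEIGHTS = {
    "trivial_readme": 30,
    "minimal_changes": 25,
    "generic_ai_text": 35,
    "suspicious_patterns": 10,
}
AI_PHRASES = ("as an ai", "it's worth noting", "delve into")
GENERIC_TITLES = ("typo fix", "minor fix")
README_MAX_LINES = 10  # assumption: lower end of the documented 10-50 range
VERY_SHORT_BODY = 20   # assumption: description length in characters
MINIMAL_BODY = 60      # assumption: description length in characters


def score_pr(files: List[str], lines_changed: int, title: str, body: str) -> Dict[str, float]:
    """Return per-criterion scores plus their sum under the key 'total'."""
    text = f"{title}\n{body}".lower()
    body_len = len(body.strip())
    only_readme = bool(files) and all(
        name.rsplit("/", 1)[-1].lower().startswith("readme") for name in files
    )

    scores: Dict[str, float] = dict.fromkeys(WEIGHTS, 0.0)
    if only_readme and lines_changed < README_MAX_LINES:
        scores["trivial_readme"] = WEIGHTS["trivial_readme"]
    if lines_changed <= 5:
        scores["minimal_changes"] = WEIGHTS["minimal_changes"]
    elif lines_changed <= 15:
        # Assumed: "partial score" means half the weight.
        scores["minimal_changes"] = WEIGHTS["minimal_changes"] * 0.5
    if any(phrase in text for phrase in AI_PHRASES):
        scores["generic_ai_text"] = WEIGHTS["generic_ai_text"]
    generic_title = any(phrase in title.lower() for phrase in GENERIC_TITLES)
    if body_len < VERY_SHORT_BODY or (generic_title and body_len < MINIMAL_BODY):
        scores["suspicious_patterns"] = WEIGHTS["suspicious_patterns"]

    scores["total"] = sum(scores.values())
    return scores


def is_spam(total: float, threshold: float = 70.0) -> bool:
    """Scores strictly above the threshold count as spam."""
    return total > threshold
```

Because the weights sum to 100, no single criterion can cross the default threshold of 70 by itself; at least three must fire together.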

Auto-Moderation Actions

When a PR exceeds the spam threshold:

  1. Comment: Posts an automated comment explaining the detection
  2. Close: Automatically closes the PR
  3. Track: Stores the PR data for future reference
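
The three steps can be sketched as follows. `ModerationClient` and `moderate` are hypothetical names, and the client is passed in as a plain interface so the flow can be tried without network access; the actual project performs these calls through PyGithub in github_client.py.

```python
from typing import Any, Dict, List, Protocol


class ModerationClient(Protocol):
    """Hypothetical interface; stands in for the real GitHub client."""

    def comment(self, repo: str, pr_number: int, body: str) -> None: ...

    def close(self, repo: str, pr_number: int) -> None: ...


def moderate(
    client: ModerationClient,
    tracked: List[Dict[str, Any]],
    repo: str,
    pr_number: int,
    score: float,
    threshold: float = 70.0,
) -> bool:
    """Comment on, close, and track a PR whose score exceeds the threshold."""
    if score <= threshold:
        return False
    # 1. Comment: explain why the PR was flagged.
    client.comment(
        repo,
        pr_number,
        f"This PR was flagged as likely spam (score {score:.0f}, threshold {threshold:.0f}).",
    )
    # 2. Close the PR.
    client.close(repo, pr_number)
    # 3. Track: keep a record for later review.
    tracked.append(
        {"repo": repo, "pr_number": pr_number, "spam_score": score, "is_spam": True}
    )
    return True
```

Commenting before closing means the author sees the explanation attached to the closure, which helps when a false positive needs to be reopened.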

Configuration

Edit config.py or use environment variables to customize:

# Spam Detection Thresholds
SPAM_SCORE_THRESHOLD = 70  # Adjust sensitivity

# Detection Weights
WEIGHTS = {
    "trivial_readme": 30,
    "minimal_changes": 25,
    "generic_ai_text": 35,
    "suspicious_patterns": 10
}

# Storage
MAX_TRACKED_PRS = 100  # Maximum number of recent PRs to keep

Storage

PR-Sentinel uses a simple JSON file (pr_tracking.json) to store recent PR data:

{
  "prs": [
    {
      "repo": "owner/repo",
      "pr_number": 123,
      "user": "username",
      "spam_score": 75.0,
      "is_spam": true,
      "details": {...},
      "tracked_at": "2025-01-01T00:00:00"
    }
  ]
}

The storage automatically:

  • Keeps only the most recent PRs (default: 100)
  • Tracks spam scores and detection reasons
  • Records timestamps for all actions
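
A capped JSON store like this can be written in a few lines. The sketch below assumes the file layout shown above; the `track_pr` function and its signature are hypothetical and may not match storage.py.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List


def track_pr(path: Path, record: Dict[str, Any], max_tracked: int = 100) -> List[Dict[str, Any]]:
    """Append record to the JSON file at path, keeping only the newest entries."""
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"prs": []}
    # Stamp the entry in the same format as the example above.
    entry = dict(record, tracked_at=datetime.now().isoformat(timespec="seconds"))
    prs = data["prs"] + [entry]
    # Drop the oldest entries once the cap is exceeded.
    data["prs"] = prs[-max_tracked:] if max_tracked > 0 else []
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data["prs"]
```

Rewriting the whole file on each call is acceptable at this scale (100 records by default), but it is not safe under concurrent writers without a lock.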

Development

Project Structure

PR-Sentinel/
├── main.py              # FastAPI webhook listener
├── spam_detector.py     # Spam detection logic
├── github_client.py     # GitHub API client
├── storage.py           # JSON storage
├── config.py            # Configuration
├── requirements.txt     # Dependencies
├── .env.example         # Environment template
├── .gitignore          # Git ignore rules
└── README.md           # Documentation

Running Tests

The system is designed to be lightweight and doesn't require extensive testing infrastructure. To test it manually:

  1. Starting the server
  2. Triggering test webhooks from GitHub
  3. Checking the /stats endpoint for results
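
If triggering a real delivery from GitHub is inconvenient, a hand-built request can stand in for step 2. The sketch below constructs a signed `pull_request` event with the standard library. The payload is a heavily trimmed assumption (real GitHub payloads carry many more fields, and which ones the server reads is not documented here), and `build_webhook_request`, the localhost URL, and the `change-me` secret are illustrative.

```python
import hashlib
import hmac
import json
import urllib.request
from typing import Any, Dict, Optional


def build_webhook_request(
    url: str, payload: Dict[str, Any], secret: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST request that mimics a GitHub pull_request delivery."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-GitHub-Event": "pull_request",
    }
    if secret:
        # Same signing scheme GitHub uses for webhook secrets.
        digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
        headers["X-Hub-Signature-256"] = f"sha256={digest}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Trimmed, assumed payload shape; real deliveries contain far more fields.
sample_payload = {
    "action": "opened",
    "number": 123,
    "pull_request": {"title": "typo fix", "body": "", "user": {"login": "username"}},
    "repository": {"full_name": "owner/repo"},
}
request = build_webhook_request(
    "http://localhost:8000/webhook", sample_payload, secret="change-me"
)
# With the server running, send it with:
# urllib.request.urlopen(request)
```

Note that the server's auto-moderation step would still call the GitHub API for the repository and PR number in the payload, so point test payloads at a repository you control.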

Security Considerations

  • Webhook Verification: Always use GITHUB_WEBHOOK_SECRET in production
  • Token Security: Never commit your GITHUB_TOKEN to version control
  • Rate Limiting: GitHub API has rate limits; the system handles them gracefully
  • False Positives: Monitor the /stats endpoint and adjust thresholds as needed

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - see LICENSE file for details.

Support

For issues, questions, or contributions, please open an issue on GitHub.
