ladebw/hackathon-hunter

Hackathon Hunter Agent

A production‑ready AI agent that continuously discovers, extracts, structures, analyzes, and reports on hackathons and competitions across multiple platforms.

🎯 Project Overview

Hackathon Hunter scans popular hackathon platforms, extracts detailed information from each listing, and uses AI (DeepSeek) to generate strategic insights. It produces daily intelligence reports that help you identify the best opportunities, understand hidden constraints, and choose winning strategies.

In simple terms:

  1. Scrapes hackathon listings from multiple sources
  2. Extracts clean, structured data from each hackathon's detail page
  3. Sends structured data to DeepSeek API for analysis
  4. Generates daily reports (console, JSON, markdown)
  5. Provides a clean web UI to browse the AI‑generated insights

✨ Features

  • Multi‑source scraping – Supports DoraHacks, ETHGlobal, Devpost, Gitcoin, Replit, Hugging Face, BuidlBox (modular architecture)
  • Detail‑page extraction – Opens each hackathon's detail page, extracts clean content, strips UI boilerplate
  • Section detection – Identifies logical sections (requirements, rules, prizes, timeline, eligibility, etc.)
  • Data normalization – Maps extracted content to a unified schema
  • AI analysis – Sends structured JSON to DeepSeek API for strategic insights (summary, hidden constraints, opportunity score, best strategy, etc.)
  • Daily intelligence reports – Generates console, JSON, and markdown reports, sorted by opportunity score
  • Change detection – Only processes new or updated hackathons (content‑hash based)
  • Local web dashboard – Clean, read‑only UI to browse the latest AI insights
  • Production‑ready – Logging, error handling, retries, timeouts, deduplication, SQLite storage

🚀 Quick Start

1. Clone the repository

git clone https://github.com/your-org/hackathon-hunter.git
cd hackathon-hunter

2. Create a virtual environment (recommended)

Linux/macOS:

python -m venv venv
source venv/bin/activate

Windows (Command Prompt):

python -m venv venv
venv\Scripts\activate.bat

3. Install dependencies

pip install -r requirements.txt
playwright install chromium

4. Set up your DeepSeek API key

  1. Copy the example environment file:
    cp .env.example .env
  2. Edit .env and add your DeepSeek API key:
    DEEPSEEK_API_KEY=your_api_key_here
    

Note: The API key is used in ai/deepseek.py. Without it, AI analysis will be disabled.

5. Run the pipeline (generate your first report)

python run.py

This will:

  • Scrape enabled platforms
  • Extract and normalize hackathon details
  • Send data to DeepSeek for analysis
  • Store results in SQLite
  • Generate reports in data/reports/

6. Launch the web dashboard

uvicorn ui.app:app --reload

Open your browser to:

http://127.0.0.1:8000

The dashboard shows the latest report, sorted by opportunity score, with clean cards for each hackathon.

📁 Project Structure

hackathon-hunter/
├── scrapers/          # Platform‑specific list scrapers (DoraHacks, ETHGlobal, etc.)
├── extractor/         # Section extraction & normalization
├── ai/                # DeepSeek API integration and prompts
├── pipeline/          # Main pipeline & scheduler (APScheduler)
├── storage/           # SQLite persistence & models
├── reports/           # Report generation (console, JSON, markdown)
├── ui/                # Web dashboard (FastAPI + Jinja2)
│   ├── app.py         # FastAPI backend
│   ├── templates/     # HTML templates
│   └── static/        # CSS & assets
├── logs/              # Structured logs
├── data/              # Raw HTML, normalized JSON, generated reports
├── config.py          # Central configuration
└── run.py             # Entry point for one‑time execution

🧠 How It Works (Simple Flow)

  1. Scraping – Each enabled platform’s scraper fetches a list of current hackathons.
  2. Detail extraction – For each hackathon, the detail page is downloaded and cleaned.
  3. Section detection – The cleaned text is split into logical sections (requirements, prizes, timeline, etc.).
  4. Normalization – Sections are mapped to a unified schema.
  5. AI analysis – The structured data is sent to DeepSeek with a custom prompt that asks for strategic insights (opportunity score, difficulty, best strategy, risks, etc.).
  6. Storage – Results are saved to SQLite (hackathon details + AI analysis).
  7. Reporting – A daily report is generated in three formats: console (human‑readable), JSON (machine‑readable), and markdown (for sharing).
  8. Dashboard – The latest JSON report is loaded by the FastAPI UI and displayed as a clean, sortable card‑based dashboard.

Important: No raw HTML is sent to the AI. Only structured, normalized data is used for analysis.

⚙️ Configuration

Edit config.py to adjust:

  • Timeouts and retry counts
  • Output paths
  • Logging levels
  • Enabled platforms (ENABLED_PLATFORMS)

Environment Variables

  • DEEPSEEK_API_KEY – DeepSeek Chat API key (example: sk‑...)
  • HTTP_TIMEOUT – Request timeout in seconds (example: 30)
  • MAX_RETRIES – Retry count for failed requests (example: 3)
  • LOG_LEVEL – Logging level: DEBUG, INFO, or WARNING (example: INFO)
  • DB_PATH – Path to the SQLite database (example: ./data/hackathons.db)
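A common pattern for reading such settings with fallbacks — a sketch only; the helper name is an assumption, though the variable names and defaults match the table above:

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    raw = os.getenv(name)
    return int(raw) if raw and raw.isdigit() else default

HTTP_TIMEOUT = env_int("HTTP_TIMEOUT", 30)   # seconds
MAX_RETRIES = env_int("MAX_RETRIES", 3)
```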

🖥️ Web Dashboard

The UI dashboard (ui/) is a read‑only, independent web interface that displays the latest AI‑generated report.

Features:

  • Dark theme with clean card layout
  • Sorts hackathons by opportunity score (highest first)
  • Shows top 5 opportunities (configurable in code)
  • Displays key fields: name, score, difficulty, summary, idea‑to‑win, best strategy, risks
  • Auto‑refresh every 60 seconds (optional)
  • Health endpoint (/health)
  • No framework bloat – just FastAPI + Jinja2 + vanilla CSS/JS

Run the dashboard:

uvicorn ui.app:app --reload

📊 Output

Each pipeline run creates:

  1. Console report – Printed to stdout, sorted by opportunity score
  2. JSON report – data/reports/report_YYYY‑MM‑DD.json (used by the dashboard)
  3. Markdown report – data/reports/report_YYYY‑MM‑DD.md (easy to share)

Example JSON entry:

{
  "name": "ETHGlobal Istanbul 2024",
  "platform": "ethglobal",
  "url": "https://ethglobal.com/events/istanbul",
  "opportunity_score": 8.5,
  "difficulty": "medium",
  "relevance_to_ai_agents": 7,
  "summary": "A web3‑focused hackathon with $200k in prizes...",
  "best_strategy": "Build a cross‑chain interoperability tool...",
  "idea_to_win": "A privacy‑preserving identity solution...",
  "risks": "High competition, tight deadline"
}
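Ranking entries like the one above by opportunity score — as both the reports and the dashboard do — is straightforward. A minimal sketch (the function name and top-5 default are illustrative; the field names come from the example entry):

```python
def top_opportunities(entries: list[dict], n: int = 5) -> list[dict]:
    """Sort report entries by opportunity_score, highest first, keep the top n."""
    ranked = sorted(entries, key=lambda e: e.get("opportunity_score", 0), reverse=True)
    return ranked[:n]
```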

🧩 Extending the Agent

Adding a new platform

  1. Create scrapers/newplatform.py:
    from .base import BaseScraper
    
    class NewPlatformScraper(BaseScraper):
        PLATFORM = "newplatform"
        LIST_URL = "https://newplatform.com/hackathons"
    
        def fetch_list(self):
            # Return list of {title, url, short_description, platform}
            ...
  2. Register it in scrapers/__init__.py.
  3. The pipeline automatically picks it up.

Customizing section detection

Edit extractor/sections.py to adjust keyword‑to‑section mapping.
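The keyword‑to‑section mapping might look roughly like this — the dictionary contents and function name are illustrative assumptions, not the actual contents of extractor/sections.py:

```python
SECTION_KEYWORDS = {
    "prizes": ["prize", "reward", "bounty"],
    "timeline": ["deadline", "schedule", "dates"],
    "eligibility": ["eligible", "who can participate"],
}

def detect_section(heading: str) -> str:
    """Map a heading to a logical section via keyword matching; 'other' if none hit."""
    lowered = heading.lower()
    for section, keywords in SECTION_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return section
    return "other"
```

Adding a new section is then just a new dictionary entry with its trigger keywords.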

Modifying AI prompts

Edit ai/prompts.py to change the system prompt sent to DeepSeek.

📝 Notes

  • The system is designed for daily automation (use pipeline.scheduler).
  • All scrapers respect robots.txt and implement polite delays.
  • The UI dashboard is completely independent; it only reads the JSON reports and does not modify any backend logic.
  • Reports are stored in data/reports/ – you can archive or share them as needed.

🤝 Collaboration & Contact

For collaboration, research, or partnerships:

Email: walid@vuneum.com


License: MIT

Happy hacking – may your opportunity score be high!
