Skip to content

A tool by Antigravity for scraping, analyzing, and classifying YC companies from any batch. The tool automatically categorizes companies into core AI themes and maintains a structured database of company information.

License

Notifications You must be signed in to change notification settings

tmcleod3/yc-processor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

YC Company Scraper and Analyzer

A Python-based tool for scraping, analyzing, and classifying YC companies from any batch. The tool automatically categorizes companies into core AI themes and maintains a structured database of company information.

Features

  • Scrapes company data from Y Combinator's company directory
  • Extracts detailed information including:
    • Company name and description
    • Location
    • Company website
    • LinkedIn profiles (company and founders)
    • YC batch information
  • Uses AI to analyze and classify companies into core themes:
    • AI Agents
    • Industry Transformation
    • AI Infrastructure
  • Syncs data to Notion database for easy viewing and analysis
  • Exports data to CSV format

Prerequisites

  • Python 3.9+
  • Notion API Key (for Notion integration)
  • OpenAI API Key (for AI analysis)

Installation

  1. Clone the repository:
git clone https://github.com/gadiborovich/yc-processor.git
cd yc-processor
  1. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  1. Install dependencies:
pip install -r requirements.txt
  1. Create a .env file in the root directory with your API keys:
NOTION_API_KEY=your_notion_api_key
NOTION_DATABASE_ID=your_notion_database_id
OPENAI_API_KEY=your_openai_api_key
CLASSIFIER_PROMPT=your_classifier_prompt

Usage

  1. Run the main script:
python src/main.py

Additional flags:

  • --batch: Specify YC batch (e.g., "W25" or "S25")
  • --sync-notion: Sync data to Notion
  • --export: Export data to CSV
  • --export-file: Specify custom export filename
  1. Export to CSV:
python src/main.py --export
  1. Sync to Notion:
python src/main.py --sync-notion

Project Structure

├── src/
│   ├── analyzer/        # AI analysis and classification
│   │   ├── __init__.py
│   │   └── llm_analyzer.py
│   ├── config/         # Configuration management
│   │   ├── __init__.py
│   │   ├── config.yaml
│   │   └── config_manager.py
│   ├── exporter/       # CSV export functionality
│   │   ├── __init__.py
│   │   └── csv_exporter.py
│   ├── notion/         # Notion integration
│   │   ├── __init__.py
│   │   └── notion_sync.py
│   ├── scraper/        # Web scraping functionality
│   │   ├── __init__.py
│   │   ├── company_details.py
│   │   ├── company_scraper.py
│   │   └── url_discovery.py
│   ├── storage/        # Database models and management
│   │   ├── __init__.py
│   │   └── models.py
│   └── main.py         # Main entry point
├── .gitignore         # Git ignore rules
├── LICENSE           # MIT License
├── README.md         # This documentation
└── requirements.txt  # Python dependencies

Notion Database Setup

  1. Create a new Notion database with the following properties:
  • Name (title)
  • Description (text)
  • Core Themes (select)
  • Tags (multi-select)
  • Website (url)
  • Deck (files)
  • Company LinkedIn (url)
  • Founder LinkedIn (text)
  • Location (text)
  • Analysis Rationale (text)
  1. Share the database with your Notion integration
  2. Copy the database ID and add it to your .env file

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This tool is for educational purposes only. Please respect Y Combinator's terms of service and rate limiting when using this tool.

About

A tool by Antigravity for scraping, analyzing, and classifying YC companies from any batch. The tool automatically categorizes companies into core AI themes and maintains a structured database of company information.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages