mainly-ai/model-meta
AI Model Meta Information Scraper

Scrapes model information from AI providers (currently: OpenAI).

Installation

git clone <repository-url>
cd model-meta
uv venv
source .venv/bin/activate
uv pip install -e .
python -m playwright install chromium

Usage

python scrape_models.py --provider openai

Options:

  • --openai-api-key <key> or set the OPENAI_API_KEY environment variable
  • --timeout <ms> (set the timeout in milliseconds)
  • --debug (enable verbose logging)
  • --use-cache (reuse previously cached HTML instead of re-fetching)
  • --dry-run (skip API calls)

Output: meta/openai.json with scraped model info.

How It Works

  1. Automates a browser with Playwright
  2. Extracts model documentation links and HTML
  3. Uses GPT-4.1 (OpenAI API) to parse content
  4. Validates data with Pydantic models (model_type.py)
  5. Saves as JSON
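The validation step (4) can be sketched roughly as follows. The real project uses Pydantic models defined in model_type.py; this stdlib-only stand-in borrows its field names from the output example below, and the validate_model helper is a hypothetical illustration, not the project's actual API.

```python
# Minimal stand-in for the Pydantic validation in model_type.py:
# checks that a scraped model record has the expected fields and types
# before it is written out as JSON.

REQUIRED_FIELDS = {
    "id": str,
    "friendly_name": str,
    "pricing": dict,
    "context_length": int,
}

def validate_model(record: dict) -> dict:
    """Raise ValueError if a scraped record is missing or mistypes a field."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} should be {expected_type.__name__}")
    return record

record = validate_model({
    "id": "gpt-4",
    "friendly_name": "GPT-4",
    "pricing": {"unit": "usd/1m_tokens", "input": 30.0, "output": 60.0},
    "context_length": 8192,
})
```

A record that fails validation raises ValueError rather than silently producing malformed JSON.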

Output Example

{
  "id": "openai",
  "friendly_name": "OpenAI",
  "models": [
    {
      "id": "gpt-4",
      "friendly_name": "GPT-4",
      "pricing": { "unit": "usd/1m_tokens", "input": 30.0, "output": 60.0 },
      "context_length": 8192,
      "max_output_tokens": 4096,
      "knowledge_cutoff_date": "2023-04-01",
      "capabilities": ["text_input", "text_output", "reasoning"]
    }
  ]
}
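Consumers can read meta/openai.json with the standard json module. As an illustration of what the pricing unit implies (the request_cost helper below is a hypothetical example, not part of the project), a trimmed copy of the record above is parsed inline:

```python
import json

# Trimmed copy of the example record above, parsed inline so the
# sketch does not depend on meta/openai.json existing on disk.
provider = json.loads("""
{
  "id": "openai",
  "models": [
    {"id": "gpt-4",
     "pricing": {"unit": "usd/1m_tokens", "input": 30.0, "output": 60.0}}
  ]
}
""")

def request_cost(pricing: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD, assuming prices are quoted per 1M tokens as the unit field states."""
    assert pricing["unit"] == "usd/1m_tokens"
    return (input_tokens * pricing["input"]
            + output_tokens * pricing["output"]) / 1_000_000

gpt4 = provider["models"][0]
cost = request_cost(gpt4["pricing"], 1000, 500)  # 0.03 input + 0.03 output
```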

Adding Providers

  1. Add a new scraper class in scrapers/ (inherits ModelScraper)
  2. Implement scrape_models()
  3. Register in scrapers/__init__.py and scrape_models.py

Example:

from scrapers.base import ModelScraper

class NewProviderScraper(ModelScraper):
    def scrape_models(self):
        return {
            "id": "new-provider",
            "friendly_name": "New Provider",
            "models": [],  # ...
        }
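Registration (step 3) presumably amounts to mapping the --provider value to the scraper class. The sketch below is self-contained and hypothetical: the SCRAPERS dict, the run helper, and the stubbed ModelScraper base class are assumptions for illustration, not the project's actual code in scrapers/__init__.py or scrape_models.py.

```python
# Hypothetical registry mapping --provider values to scraper classes.
# ModelScraper is stubbed here so the sketch runs on its own.

class ModelScraper:
    def scrape_models(self):
        raise NotImplementedError

class OpenAIScraper(ModelScraper):
    def scrape_models(self):
        return {"id": "openai", "friendly_name": "OpenAI", "models": []}

class NewProviderScraper(ModelScraper):
    def scrape_models(self):
        return {"id": "new-provider", "friendly_name": "New Provider", "models": []}

SCRAPERS = {
    "openai": OpenAIScraper,
    "new-provider": NewProviderScraper,
}

def run(provider: str) -> dict:
    """Instantiate the registered scraper and return its provider record."""
    return SCRAPERS[provider]().scrape_models()
```

With a registry like this, supporting a new provider is one class plus one dict entry.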
