Scrapes model information from AI providers (currently: OpenAI).
```bash
git clone <repository-url>
cd model-meta
uv venv
source .venv/bin/activate
uv pip install -e .
python -m playwright install chromium
```

Usage:

```bash
python scrape_models.py --provider openai
```

Options:

- `--openai-api-key <key>` or set `OPENAI_API_KEY`
- `--timeout <ms>` (increase timeout)
- `--debug` (verbose logging)
- `--use-cache` (use cached HTML)
- `--dry-run` (no API calls)
Output: `meta/openai.json` with the scraped model info.
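The output is plain JSON, so downstream tools can consume it with the standard library alone. A minimal sketch, using an inline snippet in place of the real `meta/openai.json` file:

```python
import json

# Inline stand-in for (a subset of) the contents of meta/openai.json
raw = '{"id": "openai", "friendly_name": "OpenAI", "models": [{"id": "gpt-4", "context_length": 8192}]}'

data = json.loads(raw)

# Index models by id for quick lookups
by_id = {m["id"]: m for m in data["models"]}
print(by_id["gpt-4"]["context_length"])  # → 8192
```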
- Automates browser with Playwright
- Extracts model documentation links and HTML
- Uses GPT-4.1 (OpenAI API) to parse content
- Validates data with Pydantic models (`model_type.py`)
- Saves as JSON
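The validation step checks the parsed payload against Pydantic models. The real schema lives in `model_type.py`; the following is only a hypothetical sketch of what it might look like, inferred from the example output (field names and types are assumptions):

```python
from datetime import date
from pydantic import BaseModel

# Hypothetical schema sketch; the actual definitions are in model_type.py.
class Pricing(BaseModel):
    unit: str
    input: float
    output: float

class Model(BaseModel):
    id: str
    friendly_name: str
    pricing: Pricing
    context_length: int
    max_output_tokens: int
    knowledge_cutoff_date: date
    capabilities: list[str]

class Provider(BaseModel):
    id: str
    friendly_name: str
    models: list[Model]

# Validation coerces nested dicts and date strings into typed objects
provider = Provider(**{
    "id": "openai",
    "friendly_name": "OpenAI",
    "models": [{
        "id": "gpt-4",
        "friendly_name": "GPT-4",
        "pricing": {"unit": "usd/1m_tokens", "input": 30.0, "output": 60.0},
        "context_length": 8192,
        "max_output_tokens": 4096,
        "knowledge_cutoff_date": "2023-04-01",
        "capabilities": ["text_input", "text_output", "reasoning"],
    }],
})
print(provider.models[0].pricing.input)  # → 30.0
```

Validation fails loudly (a `ValidationError`) if the LLM-parsed content is missing fields or has the wrong types, which is the point of the step.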
Example output:

```json
{
  "id": "openai",
  "friendly_name": "OpenAI",
  "models": [
    {
      "id": "gpt-4",
      "friendly_name": "GPT-4",
      "pricing": { "unit": "usd/1m_tokens", "input": 30.0, "output": 60.0 },
      "context_length": 8192,
      "max_output_tokens": 4096,
      "knowledge_cutoff_date": "2023-04-01",
      "capabilities": ["text_input", "text_output", "reasoning"]
    }
  ]
}
```

To add a new provider:

- Add a new scraper class in `scrapers/` (inherits `ModelScraper`)
- Implement `scrape_models()`
- Register in `scrapers/__init__.py` and `scrape_models.py`
Example:

```python
from scrapers.base import ModelScraper

class NewProviderScraper(ModelScraper):
    def scrape_models(self):
        return {
            "id": "new-provider",
            "friendly_name": "New Provider",
            "models": [],  # ...
        }
```
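The registration step might be as simple as a dict mapping `--provider` values to scraper classes. This is a self-contained sketch of that idea; the actual wiring in `scrapers/__init__.py` and `scrape_models.py` may differ:

```python
# Minimal stand-in for scrapers.base.ModelScraper, so the sketch runs on its own
class ModelScraper:
    def scrape_models(self):
        raise NotImplementedError

class NewProviderScraper(ModelScraper):
    def scrape_models(self):
        return {"id": "new-provider", "friendly_name": "New Provider", "models": []}

# Hypothetical registry: --provider value -> scraper class
SCRAPERS = {"new-provider": NewProviderScraper}

def run(provider: str) -> dict:
    """Look up the scraper for a provider id, instantiate it, and run it."""
    return SCRAPERS[provider]().scrape_models()

print(run("new-provider")["friendly_name"])  # → New Provider
```

A dict registry keeps `scrape_models.py` provider-agnostic: the CLI only needs to look up the requested id and call `scrape_models()` on whatever class it finds.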