General coding guidelines (style, testing, error handling, etc.) are in ~/.claude/CLAUDE.md.
This file covers repo-specific context only.
- Python: 3.11+ (see `pyproject.toml`)
- Formatter: Ruff with 120-char line length (not Black)
- Testing: Pytest + pytest-asyncio; all tests are self-contained (no API keys needed in CI)
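A sketch of what "self-contained, no API keys" can look like with pytest-asyncio: the LLM call is mocked, so nothing hits the network. The function and test names below are illustrative, not taken from this repo's actual suite.

```python
from unittest.mock import AsyncMock

import pytest


async def forecast_binary(llm, question_text: str) -> float:
    """Toy stand-in for a bot method that asks an LLM for a probability."""
    reply = await llm.invoke(f"Probability that: {question_text}")
    return float(reply)


@pytest.mark.asyncio
async def test_forecast_binary_uses_mocked_llm() -> None:
    llm = AsyncMock()  # mocked model interface; no API key required
    llm.invoke.return_value = "0.42"
    prob = await forecast_binary(llm, "It rains tomorrow")
    assert abs(prob - 0.42) < 1e-9
    llm.invoke.assert_awaited_once()
```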
This is a Metaculus forecasting bot forked from the Metaculus starter template. It uses model ensembling, plus research integration through AskNews, native search (Grok), and fallback providers.
- `main.py`: Primary bot implementation using the `forecasting-tools` framework
- `backtest.py`: Primary benchmarking system — scores bot predictions against actual resolutions
- `community_benchmark.py`: DEPRECATED benchmarking CLI (community prediction baseline broken)
- `metaculus_bot/`: Core utilities including LLM configs, prompts, and research providers
- `REFERENCE_COPY_OF_forecasting_tools*/`: Read-only reference copy of the forecasting-tools framework source. Edits here won't affect the installed package. Path varies by machine — search for `REFERENCE_COPY_OF_forecasting_tools*` in the repo root or workspace if not found.
- A reference copy of the Q2 2025 competition winner (panchul) may exist in the workspace (`REFERENCE_COPY_OF_panchul*`). Good ideas for comparison.
- A Metaculus API doc (`metaculus_api_doc_LARGE_FILE.yml`) may exist in `scratch_docs_and_planning/`. Large file — use offset/limit when reading.
The bot architecture is organized around these key components:
- Model Ensembling: Multiple LLMs configured in `metaculus_bot/llm_configs.py` with aggregation strategies
- Research Integration: AskNews, native search (Grok), and fallback providers through `research_providers.py`
- Forecasting Pipeline: Question ingestion → research → reasoning → prediction extraction → aggregation
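The final aggregation stage can be sketched as follows. The repo's actual strategies live in `metaculus_bot/llm_configs.py` and may differ; median-of-probabilities here is just one common ensembling choice, and the clamping bounds mirror Metaculus' open interval for binary forecasts.

```python
from statistics import median


def aggregate_binary(probs: list[float]) -> float:
    """Combine per-model probabilities into one forecast, clamped away
    from 0 and 1 as Metaculus requires for binary predictions."""
    if not probs:
        raise ValueError("need at least one model prediction")
    return min(max(median(probs), 0.001), 0.999)
```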
- `tests/`: Pytest suite (`tests/test_*.py`)
- `.github/workflows/`: CI (lint + test on PRs) and scheduled bot runs
- `.env.template`: Reference for required environment variables
- Copy `.env.template` to `.env` for local development
- See `.env.template` for required API keys. Never commit secrets to the repository.
- Conda environment: `metaculus-bot`
- Python binary: `~/miniconda3/envs/metaculus-bot/bin/python`
- Direct execution: Use the full python path when conda commands fail
  - Example: `~/miniconda3/envs/metaculus-bot/bin/python script.py` instead of `conda run -n metaculus-bot python script.py`
- NEVER use pip directly — dependencies are managed by conda + poetry. Use `make install` or `poetry install` within the conda env.
The project heavily uses the forecasting-tools framework:
- `GeneralLlm` for model interfaces
- `MetaculusApi` for platform integration
- Question types: `BinaryQuestion`, `NumericQuestion`, `MultipleChoiceQuestion`
- Prediction types: `ReasonedPrediction`, `BinaryPrediction`, etc.
- Research: `AskNewsSearcher`, `SmartSearcher` for information gathering
LLM ensemble configured in `metaculus_bot/llm_configs.py` — see that file for current models.
Models rotate frequently; do not hardcode model names outside of `llm_configs.py`.
Provider: OpenRouter (with automatic key fallback).
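The "automatic key fallback" idea follows the usual pattern below. The bot's real fallback logic lives in the LLM config layer and is not shown here; `AuthError` and `call_with_fallback` are illustrative names, not this repo's API.

```python
class AuthError(Exception):
    """Stand-in for an auth/quota failure (e.g. HTTP 401/402 from OpenRouter)."""


def call_with_fallback(keys: list[str], call) -> str:
    """Try `call(key)` with each API key in order, moving to the next key
    on auth/quota failures; raise only when every key is exhausted."""
    last_exc: Exception | None = None
    for key in keys:
        try:
            return call(key)
        except AuthError as exc:
            last_exc = exc  # remember the failure and fall through to next key
    raise RuntimeError("all OpenRouter keys exhausted") from last_exc
```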
- Install: `conda run -n metaculus-bot poetry install` (or `make install`)
- Activate environment: `conda activate metaculus-bot`
- Run bot: `conda run -n metaculus-bot poetry run python main.py` (or `make run`)
- Run tests: `conda run -n metaculus-bot poetry run pytest` (or `make test`)
Primary approach — resolved-question backtest (backtest.py):
Scores bot predictions against actual question resolutions. This is the preferred benchmarking method.
- Smoke test (4 questions):
make backtest_smoke_test - Small (12 questions):
make backtest_small - Medium (32 questions):
make backtest_medium - Large (100 questions):
make backtest_large
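Scoring against resolved questions typically reduces to a proper scoring rule. A minimal sketch, assuming binary questions and 0/1 resolutions — `backtest.py`'s actual metric may differ (e.g. Metaculus-style log scores):

```python
def mean_brier(preds: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 resolutions.
    Lower is better; an always-50% forecaster scores 0.25."""
    if len(preds) != len(outcomes) or not preds:
        raise ValueError("need equal-length, non-empty prediction/outcome lists")
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)
```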
DEPRECATED — community benchmark (`community_benchmark.py`): Baseline scoring broken (Metaculus removed aggregations from the list API). `make benchmark_display` still works for viewing old results.
- Lint: `make lint` (Ruff check)
- Format: `make format` (Ruff format + autofix)
- Pre-commit: `make precommit_install`, then `make precommit` or `make precommit_all`
- Test single file: `conda run -n metaculus-bot PYTHONPATH=. poetry run pytest tests/test_specific.py`
The Makefile has most commands — e.g. `make test`, `make format`, `make run`. In agentic CLIs you may need to use the full python path (`~/miniconda3/envs/metaculus-bot/bin/python`) since conda activation can be unreliable.
- Commits: concise, imperative subject (e.g., "fix test cmd", "add conda to make"). Add a short body when context helps.
- PRs: clear description, link issues, include config/docs updates, and screenshots/logs for behavior changes.
- CI: all checks pass; code formatted and imports sorted.
- API docs: https://www.metaculus.com/api/ (Swagger UI)
- Backend source: https://github.com/Metaculus/metaculus (open-source; validation lives in `questions/serializers/common.py`)
- CDF constraints (server-side, for `continuous_cdf` submissions):
  - Length: `inbound_outcome_count + 1` (default 201)
  - Min step per bin: `round(0.01 / N, 9)` (default 5e-5) — no flat segments allowed
  - Max step per bin: `0.2 * 200 / N` (default 0.2) — spikiness cap
  - Closed bounds: `cdf[0] == 0.0`, `cdf[-1] == 1.0`
  - Open bounds: `cdf[0] >= 0.001`, `cdf[-1] <= 0.999`
  - Strictly increasing (implied by min step > 0)
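The CDF constraints above can be checked client-side before submission. This is a sketch mirroring the documented rules, with `N = inbound_outcome_count` (default 200); Metaculus' canonical validation is the serializer code linked above, which this does not replace.

```python
def validate_cdf(
    cdf: list[float],
    n: int = 200,
    open_lower: bool = False,
    open_upper: bool = False,
) -> None:
    """Raise ValueError if `cdf` would fail the documented server-side checks."""
    if len(cdf) != n + 1:
        raise ValueError(f"expected {n + 1} points, got {len(cdf)}")
    min_step = round(0.01 / n, 9)  # default 5e-5; forbids flat segments
    max_step = 0.2 * 200 / n       # default 0.2; spikiness cap
    for a, b in zip(cdf, cdf[1:]):
        step = b - a
        if step < min_step:
            raise ValueError(f"step {step} below minimum {min_step}")
        if step > max_step:
            raise ValueError(f"step {step} above maximum {max_step}")
    if open_lower:
        if cdf[0] < 0.001:
            raise ValueError("open lower bound requires cdf[0] >= 0.001")
    elif cdf[0] != 0.0:
        raise ValueError("closed lower bound requires cdf[0] == 0.0")
    if open_upper:
        if cdf[-1] > 0.999:
            raise ValueError("open upper bound requires cdf[-1] <= 0.999")
    elif cdf[-1] != 1.0:
        raise ValueError("closed upper bound requires cdf[-1] == 1.0")
```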
- Copy `.env.template` to `.env`; never commit secrets.
- Use GitHub Actions secrets for `METACULUS_TOKEN` and API keys (AskNews, Perplexity, Exa, etc.).
- Limit changes to workflow files unless CI behavior is intended to change.