Date: 2026-03-01 Purpose: Freeze scope, environment, credentials, verification, and agent guardrails so that an AI coding agent can begin autonomous execution of the Implementation Plan without thrashing on unresolved decisions.
The Implementation Plan describes a large surface area of tools. We maintain two test profiles:
- Agent-local (primary): The coding agent runs on the developer's machine with all system dependencies installed, all API keys populated, and network access enabled. The agent should build and test real integrations — not stubs. Skipping tests is a last resort, not the default.
- CI-minimal (fallback): GitHub Actions with no special system packages. Tests that require external services are skipped gracefully so CI stays green even when keys/tools aren't available.
The skip markers in tests/conftest.py exist as a safety net for CI, not as permission for the agent to skip work. When running locally, the agent should verify that integration tests actually execute (not skip).
| Component | Phase | Rationale |
|---|---|---|
Repository structure + pip install -e ".[dev]" |
0.1 | Build must install cleanly |
| CLAUDE.md, AGENTS.md, CONTRIBUTING.md | 0.2 | Agent context |
| Canonical docs index + core architecture doc | 0.3 | Living documentation |
| PERSONALITY.md + trust model | 0.4 | Persona and boundary definitions |
| POLICY.json (already exists) | 0.5 | Machine-enforced invariants |
| Model routing layer (with mock LLM calls) | 1.1 | Core abstraction — no real API keys needed for unit tests |
| Tool bus (MCP + CLI+JSON) with mock tools | 1.2 | Core abstraction — test with in-process mock MCP server |
| Persistence layer (SQLite) | 1.3 | Pure Python, no external deps |
| Tracing subsystem | 1.4 | Pure Python decorator + SQLite |
| Eval framework scaffolding | 1.5 | Pure Python runner + SQLite |
| Control plane primitives | 1.6 | Pure Python state machine |
| Campaign Registry MCP server | 1.7 | Wraps persistence; test with in-process MCP |
| Security subsystem (approvals, secrets, policy) | 2.1 | Pure Python |
| Hook system | 2.2 | Pure Python |
| Skill registry + base agent abstractions | 2.3 | Uses mock LLM (PydanticAI test mode) |
| Orchestrator agent (with stub subagents) | 2.4 | Mock LLM, mock tools |
| Computation agent (with mock CAS responses) | 2.5 | No real CAS needed |
| Literature agent (with mock API responses) | 2.6 | No real API calls needed |
| Verification agent (with mock data) | 2.7 | No real data needed |
| SymPy MCP server | 3.1 | Pure Python — always available |
| CAS template infrastructure | 3.4 | Pure Python template parsing |
| Parameter Scanner (NumPy/SciPy) | 3.5 | Already in core deps |
| Sandboxed Python runner | 3.6 | Process isolation, no Docker required for tests |
| Campaign config loader | 5.1 | Pure Python + Pydantic |
| Campaign state machine | 5.2 | Pure Python + SQLite |
| Gate pipeline runner (with mocks) | 5.3 | Mock CAS/tool responses |
| Adaptive sampling engine | 5.4 | Pure Python heuristics |
| CLI (all commands wired) | 6.1 | Typer — pure Python |
| Report generation | 6.2 | Pure Python |
These components have full protocol/interface definitions and mock implementations in tests, but integration tests are marked @pytest.mark.skipif when the system dependency isn't installed.
| Component | Phase | System Dependency | Skip Condition |
|---|---|---|---|
| Mathematica MCP server | 3.2 | Wolfram Engine + xAct | shutil.which("wolframscript") is None |
| Cadabra MCP server | 3.3 | cadabra2 runtime (cadabra2 CLI) |
importlib.util.find_spec("cadabra2") is None and shutil.which("cadabra2") is None |
| arXiv MCP server (real API) | 4.1 | Node.js + npx | shutil.which("npx") is None or network unavailable |
| Semantic Scholar MCP | 4.2 | S2_API_KEY env var | os.getenv("S2_API_KEY") is None |
| INSPIRE-HEP connector | 4.3 | Network access | OPENEINSTEIN_SKIP_NETWORK_TESTS env var |
| NASA ADS connector | 4.4 | ADS_API_KEY env var | os.getenv("ADS_API_KEY") is None |
| CrossRef MCP | 4.5 | Network access | OPENEINSTEIN_SKIP_NETWORK_TESTS env var |
| Zotero integration | 4.6 | ZOTERO_API_KEY env var | os.getenv("ZOTERO_API_KEY") is None |
| GROBID PDF ingestion | 4.7 | Docker + GROBID container | shutil.which("docker") is None |
| LaTeX publishing toolchain | 4.8 | latexmk + TeX distribution | shutil.which("latexmk") is None |
| PhysBERT embeddings | Optional | torch + transformers | importlib.util.find_spec("torch") is None |
| pgvector | Optional | PostgreSQL + pgvector ext | Only installed via [pgvector] extra |
| JAX integration | Optional | jax + jaxlib | Only installed via [jax] extra |
| LangGraph orchestration | Optional | langgraph package | Only installed via [langgraph] extra |
| Docker sandboxing | Optional | Docker daemon | shutil.which("docker") is None; fall back to process isolation |
import pytest
import shutil
requires_wolfram = pytest.mark.skipif(
shutil.which("wolframscript") is None,
reason="Wolfram Engine not installed"
)
requires_npx = pytest.mark.skipif(
shutil.which("npx") is None,
reason="Node.js/npx not installed"
)
requires_network = pytest.mark.skipif(
os.getenv("OPENEINSTEIN_SKIP_NETWORK_TESTS", "0") == "1",
reason="Network tests skipped"
)
requires_docker = pytest.mark.skipif(
shutil.which("docker") is None,
reason="Docker not installed"
)
requires_latex = pytest.mark.skipif(
shutil.which("latexmk") is None,
reason="LaTeX not installed"
)Put these in tests/conftest.py so every test file can import them.
CI passes when: pytest exits 0, all non-skipped tests pass, and pip install -e ".[dev]" succeeds in a clean Python 3.12 environment with no special system packages.
The arXiv MCP server is listed as required: true in the example config, but for CI purposes, the arXiv integration test is skipped if npx is absent. The platform should handle missing optional MCP servers gracefully at runtime (log a warning, disable the tool, continue).
| Dependency | Version | How Provided | Notes |
|---|---|---|---|
| Python | 3.12+ | System install | Primary language |
| pip | Latest | Comes with Python | pip install -e ".[dev]" |
| uv | Latest | pip install uv |
Optional fast installer; pip is the baseline |
| pytest | ≥8.0 | [dev] extra |
Test runner |
| mypy | ≥1.10 | [dev] extra |
Type checking |
| ruff | ≥0.4 | [dev] extra |
Linting + formatting |
| git | System | Pre-installed | Version control |
| SQLite | 3.x | Bundled with Python | Persistence layer |
All dependencies in pyproject.toml |
As specified | pip install -e . |
NumPy, SciPy, SymPy, Pydantic, etc. |
| Dependency | Required For | Install Method | CI Available? |
|---|---|---|---|
| Node.js 18+ / npx | arXiv MCP server | System install | Add to CI if desired; otherwise skip |
| Docker | GROBID, sandbox hardening | System install | Not in default CI |
| LaTeX (texlive + latexmk) | LaTeX publishing | System install | Not in default CI |
| Wolfram Engine | Mathematica MCP server | Wolfram installer + license | Not in CI |
| xAct | Tensor algebra in Mathematica | Mathematica package manager | Not in CI |
| Cadabra | Optional CAS | System install (cadabra2 in PATH) |
Not in default CI |
The CI workflow (ci.yml) should be updated to:
name: ci
on:
push:
branches: ["**"]
pull_request:
env:
OPENEINSTEIN_SKIP_NETWORK_TESTS: "1"
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.12"]
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install
run: |
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
- name: Lint
run: ruff check src/ tests/
- name: Type check
run: mypy src/openeinstein/ --ignore-missing-imports
- name: Run tests
run: pytest --tb=short -qFor full integration testing locally, the developer should have:
- Everything in CI above
- Node.js 18+ (for arXiv MCP via npx)
- Wolfram Engine (free for developers) + xAct (if running Mathematica tests)
- Docker (for GROBID and hardened sandboxing)
- LaTeX distribution (texlive-full or BasicTeX + latexmk)
- API keys in
.env(see §3)
# === LLM Provider Keys (at least one required) ===
ANTHROPIC_API_KEY= # For Claude models (reasoning, generation, fast)
OPENAI_API_KEY= # For GPT/o-series models (fallback or primary)
GOOGLE_API_KEY= # For Gemini models (optional additional provider)
# === Wolfram / Mathematica ===
WOLFRAM_APP_ID= # Wolfram Alpha API (optional, for quick lookups)
# Note: Wolfram Engine uses a local license, not an API key.
# Activate with: wolframscript -activate
# xAct is a free Mathematica package — no key needed, install via:
# git clone https://github.com/xAct-contrib/xAct ~/.Mathematica/Applications/xAct
# === Literature Tool Keys ===
S2_API_KEY= # Semantic Scholar API (free tier available at semanticscholar.org/product/api)
ADS_API_KEY= # NASA ADS API (free at ui.adsabs.harvard.edu/user/settings/token)
INSPIRE_API_KEY= # INSPIRE-HEP (optional — most endpoints are keyless)
ZOTERO_API_KEY= # Zotero Web API (from zotero.org/settings/keys)
ZOTERO_USER_ID= # Zotero numeric user ID
# === Optional Service Keys ===
CROSSREF_MAILTO= # CrossRef polite pool (your email, not a secret but improves rate limits)
# === Build/Test Flags ===
OPENEINSTEIN_SKIP_NETWORK_TESTS=0 # Set to 1 to skip tests requiring network
OPENEINSTEIN_LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR| Phase | Keys Needed | Can Tests Run Without? |
|---|---|---|
| 0 (Bootstrap) | None | Yes — no LLM calls |
| 1 (Core Framework) | None | Yes — all tests use mocks |
| 2 (Agents + Security) | None for unit tests; ANTHROPIC_API_KEY or OPENAI_API_KEY for smoke tests | Unit tests: yes. Smoke tests: need 1 key |
| 3 (CAS + Numerical) | Wolfram Engine license (local) for Mathematica tests | SymPy tests: yes. Mathematica tests: skip if absent |
| 4 (Literature) | S2_API_KEY, ADS_API_KEY for integration tests | Skipped if keys absent |
| 5 (Campaign Engine) | Same as Phase 2 | Unit tests: yes |
| 6 (CLI + Reports) | Same as Phase 2 | Yes for structure tests |
| 7 (Integration) | All keys for full E2E | Partial run possible with subset |
- arXiv API: No key needed (rate-limited by default)
- INSPIRE-HEP REST API: No key for most endpoints
- CrossRef: No key, but a
mailtoparameter gets you into the polite pool - SymPy: Free, pure Python
- Cadabra: Free, open source
- xAct: Free Mathematica package
- Wolfram Engine: Free for developers (requires account + activation)
The implementation plan depends on the agent being able to run verification after every task. Here is exactly what must work:
# After every task:
pytest tests/unit/test_<component>.py # Unit tests for the component
pytest tests/ # Full test suite (must not regress)
mypy src/openeinstein/<module>/ # Type checking per module
ruff check src/ tests/ # Lint check
# After every phase boundary:
pytest tests/integration/ # Integration tests
pip install -e ".[dev]" && pytest # Clean install + full suite
# For specific verification:
python -c "from openeinstein.<module> import <Class>" # Import smoke test
openeinstein --help # CLI smoke test (after Phase 6)The agent commits after each task passes its acceptance criteria:
# Pattern for each task:
git add <specific files>
git commit -m "Task X.Y: <description>
- Acceptance criteria: <what was verified>
- Tests: pytest tests/unit/test_<component>.py passes
- Type check: mypy src/openeinstein/<module>/ passes"pytestexit code 0- All non-skipped tests pass
- Skipped tests are explicitly marked with the skip markers from
conftest.py - No new warnings in
mypyoutput (use--ignore-missing-importsfor third-party libs without stubs) ruff checkpasses (auto-fixable issues can be fixed withruff check --fix)
At plan steps 3, 4, and 6 (per the 15-step dev loop in §17.2), the agent should do a fresh-context check. In practice for an AI coding agent, this means:
- Re-read CLAUDE.md and the relevant canonical doc
- Re-read the task's acceptance criteria
- Run the full test suite
- Verify the integration contract (imports work, interfaces match)
The following instructions should be included at the top of the agent's session context (in CLAUDE.md or as a system prompt prefix) when kicking off the build.
# OpenEinstein Build Agent Instructions
You are executing the OpenEinstein Implementation Plan (BUILD-READY.md +
OpenEinstein-Implementation-Plan.md). Follow these rules strictly:
## Execution Order
1. Execute tasks in the order specified in §19 (Sequential Build Order).
2. Do NOT jump ahead to a later phase while earlier tasks have failing tests.
3. Within a phase, tasks marked ∥ (parallel) may be done in any order,
but all tasks in a phase must pass before starting the next phase.
## Commit Discipline
4. Commit after each task passes its acceptance criteria.
5. Each commit message references the task ID (e.g., "Task 1.3: Implement
persistence layer").
6. Run `pytest` before every commit. Do not commit if tests fail.
7. Run `ruff check src/ tests/` and `mypy src/openeinstein/` before
committing. Fix issues before committing.
## Dependency Handling
8. The local development environment should have all dependencies installed
(run scripts/setup-dev-environment.sh first). The agent should:
a. Implement the interface/protocol fully.
b. Write unit tests with mock/stub implementations.
c. Write integration tests that call the REAL service/tool.
d. Run integration tests locally — they should PASS, not skip.
e. Integration tests also have skip markers (conftest.py) so CI stays
green, but locally they must execute.
f. If a dependency is genuinely missing locally, STOP and report it
rather than silently skipping. The developer should install it.
## Testing Requirements
9. Every new module gets unit tests in tests/unit/.
10. Every cross-module interaction gets integration tests in tests/integration/.
11. Use pytest fixtures and conftest.py for shared test infrastructure.
12. Mock all external services (LLM APIs, MCP servers, network calls) in
unit tests. Never make real API calls in unit tests.
13. Integration tests that require real services use skip markers.
## Architecture Guards
14. Never hardcode model names or provider names in business logic.
Use logical roles ("reasoning", "generation", "fast", "embeddings").
15. Never call MCP servers or CLI tools directly from agent/campaign code.
Route everything through ToolBus.
16. Never put physics-subfield-specific logic in core platform modules.
That belongs in campaign-packs/.
17. All Pydantic models go at module boundaries. Use typed signatures.
## Quality Gates
18. Before starting each phase, re-read the relevant sections of
CLAUDE.md and the implementation plan.
19. After completing each phase, run the full test suite and verify
zero regressions.
20. If you encounter a bug in a previously-completed task while working
on a new one, fix it immediately, add a regression test, and update
the Living Error Log in CLAUDE.md.
## Stop Conditions — When to STOP and ask for help
21. If more than 3 consecutive test runs fail on the same issue, STOP
and report the problem.
22. If a task's acceptance criteria are ambiguous or contradictory, STOP
and ask for clarification.
23. If you discover that the architecture needs a fundamental change
(not just a bug fix), STOP and propose the change before implementing.
24. If `pip install -e ".[dev]"` fails in a clean environment, STOP
immediately — the dependency chain is broken.
25. If total test count drops (tests are being deleted rather than fixed),
STOP and investigate.The existing CLAUDE.md should be amended with a new section:
## Build Execution Rules
See BUILD-READY.md for the complete pre-development readiness document, including:
- Core vs. plugin scope (what must pass CI vs. what can be skipped)
- Build environment contract (what tools are available)
- Credentials and secrets (.env structure)
- Verification loop (what commands to run after every task)
- Stop conditions (when to halt and ask for help)
The agent MUST read BUILD-READY.md before beginning any implementation work.Before handing the repo to the coding agent, verify:
- CLAUDE.md updated with pointer to BUILD-READY.md
- BUILD-READY.md (this file) committed to repo root
- .env.example updated with all key placeholders (see §3 above)
- .env created locally with at least
ANTHROPIC_API_KEYorOPENAI_API_KEYfilled in - ci.yml updated to skip network tests and include lint/type-check steps
- tests/conftest.py created with skip markers for optional dependencies
- Python 3.12 confirmed installed:
python3 --version - pip install -e ".[dev]" succeeds in the repo
- pytest runs and exits 0 (currently 0 tests, 0 failures)
- git initialized with clean working tree
- Node.js 18+ installed:
node --version - Wolfram Engine activated:
wolframscript -code '1+1'returns2 - xAct installed:
wolframscript -code 'Needs["xActxTensor"]; Print["ok"]' - Docker running:
docker ps - LaTeX installed:
latexmk --version - API keys populated in
.env