This file defines required behavior for any AI agent operating in this repository.
Produce independent, reproducible AI governance research artifacts that can be verified by third parties and cited publicly.
Primary outputs:
- OpenClaw case-study report (`openclaw-2026`)
- AI tool sprawl flagship report (`ai-tool-sprawl-q1-2026`)
- Reproducibility over rhetoric.
  - Every headline claim must map to artifact + deterministic query in `claims/`.
- Independent research tone.
  - No hype language.
  - No claims without evidence.
  - No product CTA language in report body sections.
  - Explicit limitations and threats-to-validity sections are mandatory.
- Deterministic first.
  - Treat deterministic baseline as canonical.
  - Enrich-derived claims require explicit provenance (`as_of`, `source`) and separate labeling.
- No production risk.
  - OpenClaw execution must occur in isolated containerized lab conditions only.
  - Never use production credentials, customer data, or unrestricted side effects.
- Publish gates are hard gates.
  - A report does not publish unless validation and claim/threshold checks pass.
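The provenance rule for enrich-derived claims lends itself to a mechanical check. The following is a minimal sketch, not the repository's validator: it assumes each claim is a dict with a hypothetical `lane` key (`"deterministic"` or `"enrich"`) and a hypothetical `provenance` object. Only the `as_of` and `source` field names come from the rule above; the real `claims.json` schema may differ.

```python
REQUIRED_PROVENANCE = ("as_of", "source")  # field names taken from the rule above


def missing_provenance(claim: dict) -> list[str]:
    """Return the provenance fields an enrich-derived claim is missing.

    Assumption: a hypothetical "lane" key marks the claim as "deterministic"
    or "enrich", and provenance lives under a hypothetical "provenance" key.
    """
    if claim.get("lane") != "enrich":
        return []  # deterministic claims are the canonical baseline
    provenance = claim.get("provenance") or {}
    return [field for field in REQUIRED_PROVENANCE if not provenance.get(field)]


if __name__ == "__main__":
    # Illustrative claim only; the ID and date are made up for the example.
    example = {"id": "c1", "lane": "enrich", "provenance": {"as_of": "2026-01-15"}}
    print(missing_provenance(example))  # ['source']
```

A non-empty result would be a reason to fail the claim check rather than publish the enriched value alongside deterministic ones.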
- Strategic context: `internal/WHY.md`
- OpenClaw definitions: `reports/openclaw-2026/definitions.md`
- OpenClaw protocol: `reports/openclaw-2026/study-protocol.md`
- OpenClaw preregistration: `reports/openclaw-2026/preregistration.md`
- Sprawl definitions: `reports/ai-tool-sprawl-q1-2026/definitions.md`
- Sprawl protocol: `reports/ai-tool-sprawl-q1-2026/study-protocol.md`
- Sprawl preregistration: `reports/ai-tool-sprawl-q1-2026/preregistration.md`
- Headline rubric: `internal/headline_rubric.md`
- Claim ledgers: `claims/*/claims.json`
- Threshold policy: `pipelines/config/publish-thresholds.json`
- Citation logs: `citations/*.md`
If these control files conflict with draft notes, the control files win.
- Confirm definitions and protocol version.
- Confirm preregistration file exists and lock fields are set for the planned run.
- Confirm citation logs exist for timeline/regulatory assertions.
- Preflight run scaffold with `pipelines/*/run.sh --run-id <id> --dry-run`.
- Create immutable run scaffold with `pipelines/*/run.sh --run-id <id>` (or `--resume` only for existing IDs).
- Keep run-script options explicit in manifests (execution mode, workload mode, guardrails).
- Write artifacts to:
  - `runs/openclaw/<run_id>/...`
  - `runs/tool-sprawl/<run_id>/...`
- Keep raw and derived outputs separated.
- Preserve exact policy/config snapshots used in run.
- Update claim values in the report claim ledger:
  - `claims/openclaw-2026/claims.json` or `claims/ai-tool-sprawl-q1-2026/claims.json`
- Validate:
  - `pipelines/openclaw/validate.sh` or `pipelines/sprawl/validate.sh`
- For publish readiness:
  - `pipelines/openclaw/validate.sh --run-id <id> --strict` or `pipelines/sprawl/validate.sh --run-id <id> --strict`
- Assemble package:
  - `pipelines/openclaw/publish_pack.sh --run-id <id>` or `pipelines/sprawl/publish_pack.sh --run-id <id>`
- Ensure bundle hash manifest exists (`bundle.sha256`).
- Confirm all links and artifact references are real and resolvable.
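To make the bundle hash manifest step concrete, here is a minimal sketch of building and re-checking such a manifest. It assumes `bundle.sha256` uses the same `<hex digest>  <relative path>` line layout as `sha256sum` output; the format that `publish_pack.sh` actually writes is not specified here and may differ. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(bundle_dir: Path, manifest_name: str = "bundle.sha256") -> str:
    """Return manifest text with one '<sha256>  <relative path>' line per file.

    Assumption: sha256sum-style lines, sorted by path for determinism.
    """
    lines = []
    for path in sorted(p for p in bundle_dir.rglob("*") if p.is_file()):
        if path.name == manifest_name:
            continue  # the manifest does not hash itself
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bundle_dir).as_posix()}")
    return "".join(line + "\n" for line in lines)


def verify_manifest(bundle_dir: Path, manifest_name: str = "bundle.sha256") -> bool:
    """True when the stored manifest matches a fresh hash of the bundle."""
    manifest = bundle_dir / manifest_name
    if not manifest.is_file():
        return False  # a missing manifest fails the check
    return manifest.read_text() == build_manifest(bundle_dir, manifest_name)
```

Sorting the paths keeps the manifest byte-identical across reruns on the same bundle, which is what lets a third party compare it directly.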
- OpenClaw Section 1 is facts-only timeline (max three paragraphs).
- OpenClaw Section 3 is brand-neutral data section (no product messaging).
- Sprawl report follows 10-section canonical structure.
- Gait deep analysis belongs in OpenClaw report; Sprawl references Gait only in recommendations context.
- No install commands, pricing language, or product CTA statements in report body sections.
- News/social incident references may be used only as context unless directly measured in run artifacts.
- Every report must include a headline integrity block with:
  - headline number
  - denominator
  - run ID
  - artifact path
  - deterministic query
- Headline selection must follow `internal/headline_rubric.md` (minimum score threshold).
- Every report must include fixed methodological disclosure headings:
  - `Limitations`
  - `Threats to Validity`
  - `Residual Risk`
  - `Reproducibility Notes`
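The headline integrity block and the fixed disclosure headings can both be checked before a report goes to validation. This is a sketch under stated assumptions: the integrity block is represented as a dict whose key spellings (`headline_number`, `run_id`, and so on) are hypothetical, and the disclosure headings are assumed to appear as ATX-style Markdown headings in the report body.

```python
import re

# Field list follows the integrity-block rule; the key spellings are hypothetical.
REQUIRED_FIELDS = (
    "headline_number",
    "denominator",
    "run_id",
    "artifact_path",
    "deterministic_query",
)
REQUIRED_HEADINGS = (
    "Limitations",
    "Threats to Validity",
    "Residual Risk",
    "Reproducibility Notes",
)

# Matches '# Title' through '###### Title', ignoring optional closing hashes.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def missing_fields(block: dict) -> list[str]:
    """List integrity-block fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if block.get(name) in (None, "")]


def missing_headings(report_markdown: str) -> list[str]:
    """List required disclosure headings not present as Markdown headings."""
    found = {match.group(1).strip() for match in HEADING_RE.finditer(report_markdown)}
    return [heading for heading in REQUIRED_HEADINGS if heading not in found]
```

Both functions return the missing items rather than a boolean, so a failing check can say exactly which field or heading to add.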
- Do not silently change metric definitions, threshold policy, or schemas.
- If changed:
  - update version marker in definitions/protocol/schema docs
  - rerun affected metrics
  - update claims and cite change in commit message
- Backfilling narrative claims without data proof.
- Mixing enriched and deterministic metrics without clear labeling.
- Publishing production-write percentages without configured production-target policy.
- Treating non-`allow` outcomes as executable in governed lane logic.
- Omitting limitations to make results look stronger.
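The rule about non-`allow` outcomes amounts to a fail-closed check. A minimal sketch, assuming outcomes arrive as plain strings; the function name and every outcome value other than `allow` are hypothetical illustrations, not the repository's vocabulary.

```python
def is_executable(outcome: str) -> bool:
    """Only an exact `allow` outcome counts as executable in the governed lane.

    Anything else (a hypothetical "deny", an empty string, an unknown or
    differently cased value) is treated as not executable.
    """
    return outcome == "allow"


if __name__ == "__main__":
    for outcome in ("allow", "deny", "", "ALLOW"):
        print(repr(outcome), is_executable(outcome))
```

Matching on the exact string rather than excluding known denial values means a new or misspelled outcome is never counted as executable by accident.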