Harness Architecture

The AI harness (the surrounding infrastructure, integrations, and memory management) often matters more than the underlying model. The same model can score 78% with one harness and 42% with another.

Source: Kapoor (CORE-Bench/HAL); Baitch, The Model vs. the Harness.

Philosophy

Collaboration model: The AI has access to the local machine, manages state through structured artifacts, and works incrementally.

Harness > model: Models converge; harnesses determine effectiveness.

Known Failure Modes

Inverted U-shaped performance: Agent performance is often adequate on common cases and worst at the extremes (rare domains, clinical extremes, adversarial inputs). Validate agent behavior at those extremes before relying on it for high-stakes decisions.
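
One way to check for this pattern is to score evaluation results per stratum rather than in aggregate, so a weak edge is not hidden by a strong middle. The sketch below assumes each result is tagged with a stratum label; the function names, the labels, and the 0.9 threshold are illustrative and not part of OpenHarness.

```python
from collections import defaultdict

def accuracy_by_stratum(results):
    """Map each stratum to its pass rate.

    results: iterable of (stratum, passed) pairs, e.g. ("common", True).
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for stratum, passed in results:
        totals[stratum] += 1
        passes[stratum] += int(passed)
    return {s: passes[s] / totals[s] for s in totals}

def weak_strata(results, threshold=0.9):
    """Return strata whose pass rate falls below the (hypothetical) threshold."""
    scores = accuracy_by_stratum(results)
    return sorted(s for s, acc in scores.items() if acc < threshold)

# Synthetic data for illustration only.
synthetic = [
    ("common", True), ("common", True), ("common", True), ("common", True),
    ("rare_domain", True), ("rare_domain", False),
    ("adversarial", False), ("adversarial", False),
]
print(weak_strata(synthetic))
```

An aggregate score over the synthetic data would look moderate, while the per-stratum view shows that every failure sits in the edge strata.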

Identify but advise wrong: Models can correctly identify a risk in their reasoning and still advise the wrong action (e.g., triage to "wait" instead of emergency). Human gates are required for high-stakes outputs.
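
A human gate can be as small as a predicate that decides whether an output may be used without review. The sketch below is a minimal illustration, assuming the agent's output has already been parsed into a risk flag and an advised action; the `AgentOutput` type and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AgentOutput:
    # Hypothetical fields; the harness does not prescribe this shape.
    reasoning_flags_risk: bool  # the reasoning identified a risk
    advised_action: str         # e.g. "wait" or "emergency"

def needs_human_review(output: AgentOutput, high_stakes: bool) -> bool:
    """Return True when a human must approve before the advice is used."""
    # High-stakes outputs always go through a human gate.
    if high_stakes:
        return True
    # Otherwise, catch the "identify but advise wrong" mismatch:
    # risk was flagged, yet the advice is not the urgent action.
    return output.reasoning_flags_risk and output.advised_action != "emergency"
```

The mismatch check is a secondary net; the primary rule is that anything marked high-stakes is reviewed regardless of what the model advised.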

What the Harness Is

Component	Role
Rules	Operating principles, tool limits, skill routing
Skills	Per-task instructions; JIT-loaded
state/	Institutional memory: handoff, decision-log, known-issues, preferences
Handoff	"Document then continue" — Done/Next, archive, continue prompt
MCP	Model Context Protocol servers for tools and resources

External benchmarks and sims (implementation-side)

Multi-agent sims, prompt-eval harnesses, and benchmark runners do not live in OpenHarness core. They belong in implementation repos (research notes, optional automation, MCP wrappers) per DELINEATION.md. The harness still points to them so operators know where runbooks and provenance live.

Examples (sibling-repo layout; adjust paths if your clone differs):

Topic	Canonical write-up
DECIDE-SIM stack placement (OpenGrimoire vs research vs local-proto)	portfolio-harness brainstorm
Paper note + meditation backlog	software research note
Promptfoo vs DECIDE-SIM (gap analysis)	promptfoo_vs_DECIDE_SIM_gap_analysis.md

Convention: Consume summaries, hashes, and SCP on excerpts in core docs, not raw API dumps. An optional operator UI (e.g. OpenGrimoire) reads pre-computed JSON only; execution stays outside the Next app. See OPENGRIMOIRE_SYSTEMS_INVENTORY.md in a sibling portfolio-harness checkout.
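
As an illustration of the "summary plus hash" half of this convention, the sketch below builds a small JSON-serializable record that a core doc or operator UI could consume in place of a raw dump. The field names, the choice of SHA-256, and the example path are assumptions; the source does not specify a hash algorithm or record schema.

```python
import hashlib
import json

def provenance_record(summary: str, raw_excerpt: str, source_path: str) -> dict:
    """Build a record that references a raw excerpt by hash instead of embedding it."""
    # SHA-256 is an assumed choice; any stable digest would serve the same purpose.
    digest = hashlib.sha256(raw_excerpt.encode("utf-8")).hexdigest()
    return {"summary": summary, "sha256": digest, "source": source_path}

record = provenance_record(
    summary="Synthetic example: run completed, 3 of 4 scenarios passed.",
    raw_excerpt="...raw output kept in the implementation repo...",
    source_path="runs/example.json",  # hypothetical path
)
print(json.dumps(record, indent=2))
```

The raw excerpt stays in the implementation repo; the hash lets an operator confirm that the summary refers to the same bytes.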

Memory

Load order (new session):

1. intent_surface.md (if exists)
2. session_brief.md (if exists)
3. handoff_latest.md
4. preferences.md
5. rejection_log.md (if proposing similar work)
6. decision-log.md (if exists)
7. known-issues.md (if exists)
8. daily/YYYY-MM-DD.md (optional)

For async / multi-session work: also load ASYNC_HITL_SCOPE.md and state/async_tasks.yaml after the handoff when latency_tolerance: async_ok is set or parallel sessions are possible.

See SESSION_BOOTSTRAP.md for the same sequence with links. See OPENHARNESS_CONTEXT_MAP.md for checklist → path mapping.
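
The load order above can be sketched as a small resolver that returns the files a new session should read, in sequence. This is a minimal illustration rather than the harness's own loader: it assumes all files sit under one state directory, and it treats handoff_latest.md and preferences.md as required because the list does not mark them "if exists". The function name and parameters are hypothetical.

```python
from datetime import date
from pathlib import Path

def session_load_order(state_dir, today, proposing_similar_work=False):
    """Return existing state files in load order; raise if a required one is missing."""
    state = Path(state_dir)
    # (relative path, required?, include?) in the documented order.
    candidates = [
        ("intent_surface.md", False, True),
        ("session_brief.md", False, True),
        ("handoff_latest.md", True, True),    # assumed required
        ("preferences.md", True, True),       # assumed required
        ("rejection_log.md", False, proposing_similar_work),
        ("decision-log.md", False, True),
        ("known-issues.md", False, True),
        (f"daily/{today.isoformat()}.md", False, True),
    ]
    ordered = []
    for rel, required, include in candidates:
        if not include:
            continue
        path = state / rel
        if path.is_file():
            ordered.append(path)
        elif required:
            raise FileNotFoundError(f"required state file missing: {path}")
    return ordered
```

Calling `session_load_order("state", date.today())` returns only the files that exist, so optional entries drop out without special handling.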

Agent-native maintenance

After large changes to .cursor/skills/ (new folders, major SKILL.md edits), run python scripts/verify_skills_readme.py and optionally an agent-native audit (.cursor/commands/agent-native-audit.md) so discovery and parity docs stay aligned.

Lock-in

This harness is Cursor-centric. Portable: state schema, handoff format, plan structure. Cursor-specific: .cursorrules, role-routing, MCP config.

Public vs private

OpenHarness is intended as a public reference: patterns and synthetic examples, not real operator handoffs or internal state. Keep real handoff_*, daily/, and credential-backed workflows in private repos. See PUBLIC_AND_PRIVATE_HARNESS.md and examples/HANDOFF_EXAMPLE_SYNTHETIC.md.
