Harness Engineer

Turn prompt-heavy workflows into recoverable, validator-first harness projects with loop presets, task-family presets, and upgrade-friendly doctrine.

Why it matters · Before vs After · How it works · What you get · Project Preset Gallery · Examples · Quick Start · Decision Model · Contributing · Roadmap · Releasing

Why it matters

Most agent failures are not model failures. They are harness failures.

What actually breaks in practice:

the execution contract is vague
state lives only in chat memory
a loop tries to do too much in one pass
validation is weak or missing
the scaffold is too generic for the real task

harness-engineer exists to fix that. It helps Codex design the harness before it improvises one.

Before vs After

The shift is the whole point of the project:

Before: one giant prompt, hidden state, fuzzy boundaries, weak or missing validators
After: explicit docs, file-based state, bounded loop passes, validator-first progression

How it works in 3 steps

1. Freeze the contract
Clarify inputs, outputs, success criteria, the smallest verifiable unit, and what counts as failure before scaffolding anything.

2. Choose the shape
Pick a loop preset for runtime behavior and a project preset for task-family structure.

3. Generate and verify
Scaffold files, externalize state, run validators, and leave behind a harness that survives fresh-context restarts.

What you get

Doctrine Layer
Practical harness engineering guidance distilled from OpenAI, Anthropic, Ralph, OpenHarness, and hands-on local practice. Loop Presets
Control how the harness runs: baseline for general scaffolds, ralph-loop for resumable multi-pass execution. Project Presets
Control the work shape: batch processing, repo coding, research collection, or UI validation. Scaffold Engine
A modular Python generator that emits docs, progress state, manifests, validators, and runner placeholders.

Visual identity

The visual language mirrors the skill itself:

deep blue for structure and systems
green for validated forward motion
violet for loop orchestration and preset logic
amber for controlled evolution and caution points

Architecture poster

Project Preset Gallery

Examples / Use Cases

Example 1: Batch OCR and enrichment

Use:

Use $harness-engineer to scaffold a Ralph Loop project for OCR and post-processing on a folder of scanned documents.

Suggested shape:

--preset ralph-loop
--project-preset batch-processing

What you get:

bounded batch progression
tasks.json for mutable unit state
input/output/artifact directories
archive-friendly structure for final outputs

Example 2: Long-running code remediation

Use:

Use $harness-engineer to design a recoverable harness for fixing one codebase issue per pass.

Suggested shape:

--preset ralph-loop
--project-preset repo-coding

What you get:

feature or task state
codebase pattern memory
scoped feature-plan docs
runner + validator flow that supports incremental repair

Example 3: Research collection and synthesis

Use:

Use $harness-engineer to scaffold a research harness that gathers sources, stores evidence, and synthesizes findings over multiple passes.

Suggested shape:

--preset baseline or --preset ralph-loop depending on loop needs
--project-preset research-collection

What you get:

source manifest
evidence and findings separation
explicit research protocol
structure that discourages mixing raw notes with validated output

Example 4: UI work with browser evidence

Use:

Use $harness-engineer to scaffold a harness for browser-visible feature work with screenshot-based validation.

Suggested shape:

--preset ralph-loop
--project-preset ui-validation

What you get:

screenshot, trace, and verdict directories
UI validation reference doc
stronger prompt guardrails around rendered-state evidence

Project status

Current public state: main
Stability: validated across loop presets, runner variants, and all current project presets
Scope: one current skill, one historical snapshot, one modular scaffold engine
Evolution model: doctrine first, scaffold second, trigger text last

Quick start

1. Install the skill

Windows PowerShell

Copy-Item -LiteralPath .\skills\harness-engineer -Destination "$HOME\.codex\skills\harness-engineer" -Recurse -Force

macOS / Linux

mkdir -p ~/.codex/skills
cp -R ./skills/harness-engineer ~/.codex/skills/harness-engineer

2. Invoke the skill explicitly

Use $harness-engineer to clarify requirements and scaffold a robust harness project.

Typical prompts:

Use $harness-engineer to design a harness for a batch document-processing pipeline.
Use $harness-engineer to refactor this prompt-only workflow into a recoverable harness.
Use $harness-engineer to scaffold a Ralph Loop project for a multi-pass remediation task.

Decision model

The skill has two independent control surfaces.

Loop preset

This answers: How should the harness run?

Loop preset	Use it when	Typical result
`baseline`	one scaffolded harness is enough and no explicit repeated loop policy is needed yet	simple runner, validator, docs, progress file
`ralph-loop`	work advances in repeated passes and must survive fresh-context restarts	`PROMPT.md`, `tasks.json`, batch plan, Ralph runner, loop contract

Project preset

This answers: What shape should this work take?

Project preset	Best for	Adds
`generic`	task-agnostic scaffolds	no extra overlays
`batch-processing`	OCR, conversion, enrichment, bulk transforms	`input/`, `output/`, `artifacts/`, batch manifest, batch contract
`repo-coding`	incremental codebase work	`features.json`, codebase patterns, current feature plan
`research-collection`	source gathering and evidence synthesis	`sources/`, `notes/`, `findings/`, `evidence/`, source manifest
`ui-validation`	browser-visible work	`screenshots/`, `traces/`, `verdicts/`, UI verdict template

Scaffold script

The skill ships with a modular scaffold engine:

skills/harness-engineer/scripts/init_harness_project.py

Example: baseline

python .\skills\harness-engineer\scripts\init_harness_project.py .\output --project-name "Example Harness"

Example: Ralph Loop + batch processing

python .\skills\harness-engineer\scripts\init_harness_project.py .\output --project-name "Example Ralph Batch" --preset ralph-loop --project-preset batch-processing --batch-size 5

Useful flags

--preset baseline|ralph-loop
--project-preset generic|batch-processing|repo-coding|research-collection|ui-validation
--topology
--runner
--batch-size
--with-features-file
--with-failure-log
--with-archives

What gets generated

Baseline scaffold

AGENTS.md
config.yaml
progress.txt
docs/
scripts/
validator placeholder
summary placeholder

Ralph Loop scaffold

baseline scaffold
PROMPT.md
tasks.json
docs/exec-plans/current-batch-plan.md
logs/failure-log.jsonl
archives/
Ralph-style runner placeholder

Task-family overlays

batch-processing: batch manifest, pipeline dirs, archive bias
repo-coding: feature state, codebase patterns, current feature plan
research-collection: source manifest, evidence dirs, findings docs
ui-validation: verdict template, screenshot and trace dirs

Repository anatomy

harness-engineer-skill/
├── assets/                 # landing-page visuals and icon system
├── skills/
│   └── harness-engineer/
│       ├── SKILL.md
│       ├── agents/openai.yaml
│       ├── references/     # doctrine and decision rules
│       └── scripts/        # modular scaffold generator
├── snapshots/              # rollback and historical comparison
├── README.md
├── README.zh-CN.md
├── CONTRIBUTING.md
├── ROADMAP.md
├── RELEASING.md
└── versions.json

Included versions

Version	Path	Notes
Current	`skills/harness-engineer/`	Active release with Ralph Loop and project presets
Snapshot	`snapshots/harness-engineer-backup-20260408-161519/`	Backup from before the Ralph preset upgrade

Design lineage

This repository is an original synthesis shaped by:

OpenAI harness engineering ideas
Anthropic articles on long-running harnesses
snarktank/ralph
HKUDS/OpenHarness
distilled practitioner notes from real local use

It is not an official upstream release of any of those projects.

Core thesis

Better prompts help. Better harnesses survive.

The skill assumes:

state should live in files, not chat memory
validators matter more than optimistic self-reporting
topology should stay as small as possible
scaffolding should stay replaceable as models improve

Validation

The current skill has been validated with:

quick_validate.py against the skill itself
Python compile checks for every scaffold module
smoke tests for:
- baseline scaffold generation
- Ralph Loop scaffold generation
- generated validator execution
- generated Python, PowerShell, and Bash runners
- all current project preset overlays

Attribution

Human project owner and curator: repository maintainer
AI implementation and packaging support: OpenAI Codex

This repository uses explicit README attribution for Codex. If you also want Codex-like attribution inside commit metadata, use a dedicated co-author trailer or bot/account identity in future commits.

Project maintenance

Contribution guide: CONTRIBUTING.md
Roadmap: ROADMAP.md
Release process: RELEASING.md

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
assets		assets
skills/harness-engineer		skills/harness-engineer
snapshots/harness-engineer-backup-20260408-161519		snapshots/harness-engineer-backup-20260408-161519
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
README.zh-CN.md		README.zh-CN.md
RELEASING.md		RELEASING.md
ROADMAP.md		ROADMAP.md
versions.json		versions.json

Folders and files

Latest commit

History

Repository files navigation

Harness Engineer

Why it matters

Before vs After

How it works in 3 steps

What you get

Visual identity

Architecture poster

Project Preset Gallery

Examples / Use Cases

Project status

Quick start

1. Install the skill

2. Invoke the skill explicitly

Decision model

Loop preset

Project preset

Scaffold script

Example: baseline

Example: Ralph Loop + batch processing

Useful flags

What gets generated

Baseline scaffold

Ralph Loop scaffold

Task-family overlays

Repository anatomy

Included versions

Design lineage

Core thesis

Validation

Attribution

Project maintenance

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages