Pawl: sandbox instrumentation through search

Pawl is an exploded view of the macOS Seatbelt sandbox built with search. It operates through mechanical exercise of the operating system, pulling the same policy along two directions: forward, from profile records through policy bytes to runtime denial; and reverse, from compiled blobs back to inputs—across multiple lanes and recording where they align or diverge.

Seatbelt policy is enforced through a confounded, censored stack: multiple layers contribute to outcomes. Closing the loop from structure → meaning → runtime behavior is hard even when you have good tools and strong priors. Furthermore, the reward for concerted effort is a fragile, host-bound understanding that can be undone with the next OS update. Rebuilding that understanding involves a new reverse engineering effort.

Pawl is different on purpose: a unitary inspectable account with enough internal wiring that it can notice when it stops being true. Instead of re-fitting reverse engineering tools to macOS when it updates, then exploring the Seatbelt, we instrument policy from the inside out. We find out what is different about a new version of macOS the same way we learn this one: each step in pawl is instrumented and diffed against system outputs. We converge on system features by keeping those outputs in productive doubt while testing new configurations.

Where we are today

The IR is operational. The evidence layer extracts byte-faithful, hash-stable structure from compiled profiles with explicitly marked unknown regions. The derived policy layer interprets evidence into operations, predicates, and control flow while retaining provenance links (origin_offsets, origin_spans) back to underlying bytes. Both layers are stable and exercised across synthetic fixtures and corpus profiles.

The five-point harness validates policy equivalence through roundtrip: source SBPL compiles to a blob, reverse renders back to SBPL, which recompiles to a roundtrip blob, with equality checks at each stage. The regex harness is mathematically precise—DFA equivalence plus a runtime file-read oracle that validates without trusting static comparisons. The profile harness has structural checks working (op-table classification, graph comparison); runtime checks are implemented via PolicyWitness but blocked for entitled or message-filter profiles.

Compiler-implicit operations are fully handled. A baseline artifact tracks ~54 implicit allows (deny-default profiles) and ~5 implicit denies (allow-default profiles); the reverse lane suppresses these, guarded by a dedicated test suite. Message-filter framing (0x4000 variant) is structurally decoded—allow-inner forms are canonicalized away (the compiler discards them), deny-inner forms reconstruct correctly. The remaining roundtrip gap is multi-regex predicate ordering in rules with two regex filters.

Known capability boundaries are characterized and scoped. Unicode casefolding is a runtime rule not encoded in bytecode—the harness expands it and tags the rewrite. POSIX bracket classes compile but do not match at runtime; the harness marks them unsupported. Message-filter profiles require a private entitlement (com.apple.private.security.message-filter) for runtime application; structural comparisons proceed, runtime checks report blocked with reason.

Five-point harness

Pawl's core validation structure is the five-point harness: a roundtrip circuit that closes only when multiple independent paths agree, making failures localizable instead of mysterious. The regex harness (integration/ir/regex/) is the tightest instantiation: DFA equivalence is mathematically precise, and the runtime file-read checker provides a genuinely independent oracle. The profile harness (integration/ir/profile/) extends this pattern to full sandbox profiles, with op-table classification and graph comparison replacing DFA checks.

     1 ◄═══ SBPL eq ═══► 3
  Source              Reversed
     │                 ↗  │
     │          reverse   │ compile
     ▼               ╱    ▼
     2 ─────────────╯     4
  Source ◄═══ DFA eq ═══► Roundtrip
   blob                    blob
     │                      │
     └──────────┬───────────┘
                │
                ▼
                5
             Oracle

Points 1–4 form a square of artifacts: source SBPL compiles to source blob, reverse renders back to SBPL, which recompiles to the roundtrip blob. The horizontal edges are cross-checks—SBPL surface equality above, DFA equivalence below. Point 5 is the runtime oracle, which validates behavior without trusting any of the static comparisons. When the circuit disagrees, the location of the disagreement tells you which transformation is at fault.

Level	What it catches
Surface equality	Normalization drift, parser bugs
Structural hash	Op-table divergence, node graph differences
DFA equivalence	Language acceptance mismatches
Runtime oracle	Host disagrees despite internal consensus

Everything is scoped to a host via world_id.

Docs index

Core loop

pawl/README.md — workbench overview and CLI
pawl/LOOP.md — cross-check and solver closure contract
pawl/casefile/CASEFILE.md — casefile schema and receipts

IR and harnesses

integration/ir/ir_schema.md — IR specification (evidence + policy layers)
integration/ir/PLAN.md — IR development plan and status
integration/ir/regex/FIVE_POINT_HARNESS.md — regex harness spec
integration/ir/profile/FIVE_POINT_HARNESS.md — profile harness spec

Runtime

runtime/PolicyWitness.md — runtime oracle and probe execution
probe/SPEC.md — probe plan schema

Fixtures and evidence

integration/fixtures/pawl/d-ring/ — synthetic fixture families for compiler behavior
integration/carton/README.md — frozen mappings and reviewable contracts

Supporting

api/README.md — Ghidra/Frida surfaces
tools/README.md — host-bound CLIs

Agent instructions live in local AGENTS.md files; closer is more authoritative.

Welcome aboard

The project has many moving parts; static artifacts, decoders, mappings, harnesses, probes, and their surrounding macOS environment all interact. It is a living, interdependent mesh, reaching into the agents who work on it as intimately as it does the sandbox.

Your part in this tangle may be to glue some of these parts together or to unstick them, depending. Trace inputs and outputs across the boundary you are touching, and update the generator or contract that defines the interface rather than patching artifacts directly. Leave a clear, rerunnable path from source to output so the wiring stays explicit for the next change. That is how to proceed generally; in particular, proceed first by reading CONTRIBUTING.md and AGENTS.md.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pawl: sandbox instrumentation through search

Where we are today

Five-point harness

Docs index

Welcome aboard

About

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 755 Commits
api		api
integration		integration
pawl		pawl
probe		probe
projections		projections
runtime		runtime
tools		tools
world		world
.gitignore		.gitignore
AGENTS.md		AGENTS.md
COLD-START.md		COLD-START.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pytest.ini		pytest.ini

License

Protonk/PAWL

Folders and files

Latest commit

History

Repository files navigation

Pawl: sandbox instrumentation through search

Where we are today

Five-point harness

Docs index

Welcome aboard

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors 2

Uh oh!

Languages