A full GOAP planner trace showing goal evaluation, A* search, the Monte Carlo robustness gate, plan execution with cost tracking, and cost self-correction, as emitted by
brain/goap/planner.py. Timestamps, session IDs, and elapsed times come from a real session; the sequence below is one complete plan cycle.
When a plan is needed (previous plan completed or invalidated), the planner
evaluates all five goals against the current PlanWorldState. The most
insistent goal becomes the planning target.
world state: hp=62% mana=35% pet=alive engaged=no targets=3 threats=0 inv=22%
SURVIVE sat=0.62 insistence=0.38 (1.0 - 0.62) * 1.0
MAINTAIN_READINESS sat=1.00 insistence=0.00
MANAGE_RESOURCES sat=0.69 insistence=0.10 mana below 70% threshold
GAIN_XP sat=0.50 insistence=0.35 targets available, resources permit
MANAGE_INVENTORY sat=1.00 insistence=0.00
selected: SURVIVE (insistence 0.38 > GAIN_XP 0.35)
SURVIVE wins because HP at 62% produces slightly higher insistence (0.38) than GAIN_XP (0.35). If HP were above 80%, GAIN_XP would dominate.
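The insistence arithmetic above can be sketched as follows. This is a hypothetical reconstruction, not the planner's actual code: the (1.0 - satisfaction) * weight form is taken from the SURVIVE line, and the per-goal weights are back-solved from the logged values, so they are assumptions.

```python
def insistence(satisfaction: float, weight: float = 1.0) -> float:
    """Unmet need, scaled by a per-goal weight and clamped to [0, 1]."""
    return max(0.0, min(1.0, (1.0 - satisfaction) * weight))

# Assumed weights: SURVIVE 1.0 matches the trace's (1.0 - 0.62) * 1.0;
# GAIN_XP ~0.7 and MANAGE_RESOURCES ~0.32 reproduce the logged values.
WEIGHTS = {"SURVIVE": 1.0, "GAIN_XP": 0.7, "MANAGE_RESOURCES": 0.32}

def select_goal(sats: dict) -> str:
    """Pick the goal whose insistence is highest."""
    return max(sats, key=lambda g: insistence(sats[g], WEIGHTS.get(g, 1.0)))

# Satisfaction values from the trace's goal-evaluation block.
sats = {"SURVIVE": 0.62, "MAINTAIN_READINESS": 1.00,
        "MANAGE_RESOURCES": 0.69, "GAIN_XP": 0.50, "MANAGE_INVENTORY": 1.00}
```

With these weights, `insistence(0.62, 1.0)` gives 0.38 and `insistence(0.50, 0.7)` gives 0.35, so `select_goal(sats)` returns SURVIVE, matching the trace.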
The planner runs forward A* from the current world state. Each node expands by trying all actions whose preconditions are met, applying their effects to produce child states. Duplicate adjacent actions (rest -> rest) are pruned. Search terminates when goal satisfaction reaches 0.70.
[GOAP] A* search for SURVIVE (sat=0.62, threshold=0.70)
node 0 depth=0 state: hp=62% mana=35% g=0.0 h=2.4 f=2.4
expand: rest (not engaged, no threats -> met)
expand: acquire (targets>0, mana>25%, pet alive -> met)
expand: wander (targets=0? no -> skip)
expand: buff (buffs already active -> skip)
...
node 1 depth=1 state: hp=95% mana=80% g=18.0 h=0.0 f=18.0
action: rest cost=18.0s (corrected: base=18.0 + ema=0.0)
goal satisfaction: 0.95 >= 0.70 -- candidate found
nodes visited: 2 / 500 budget
search time: 0.3ms / 50ms budget
The heuristic h = (0.70 - sat) * 30.0 estimates remaining cost in seconds.
With sat=0.62, h = 0.08 * 30 = 2.4s. After rest, sat=0.95 and h drops to 0.
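A minimal sketch of the forward search under these assumptions: the heuristic and the 0.70 threshold match the text, while the action table (costs and satisfaction deltas) is an illustrative placeholder for the effects a real PlanWorldState would produce.

```python
import heapq
from itertools import count

THRESHOLD, H_SCALE = 0.70, 30.0

def heuristic(sat: float) -> float:
    """Estimated seconds of remaining work: h = (0.70 - sat) * 30."""
    return max(0.0, (THRESHOLD - sat) * H_SCALE)

# Placeholder actions: (name, cost in seconds, satisfaction delta).
# rest's +0.33 delta reproduces the 0.62 -> 0.95 jump in the trace.
ACTIONS = [("rest", 18.0, 0.33), ("acquire", 5.0, 0.05)]

def plan(start_sat: float, budget: int = 500):
    """Forward A*: pop the cheapest f = g + h node, expand every action,
    prune duplicate adjacent actions, stop at the satisfaction threshold."""
    tie = count()  # tiebreaker so heapq never compares paths
    frontier = [(heuristic(start_sat), 0.0, next(tie), start_sat, [])]
    visited = 0
    while frontier and visited < budget:
        _, g, _, sat, path = heapq.heappop(frontier)
        visited += 1
        if sat >= THRESHOLD:
            return path, g  # candidate found
        for name, cost, dsat in ACTIONS:
            if path and path[-1] == name:  # prune rest -> rest etc.
                continue
            child_sat = sat + dsat
            heapq.heappush(frontier, (g + cost + heuristic(child_sat),
                                      g + cost, next(tie), child_sat,
                                      [*path, name]))
    return None, float("inf")
```

Starting from sat=0.62, this returns the single-step plan ["rest"] with g=18.0, mirroring the trace's two-node search.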
Before accepting the candidate plan, the planner runs 20 stochastic rollouts. Each rollout applies action effects with Gaussian noise drawn from learned posterior variance (encounter history). Plans that fail under noisy outcomes are rejected and the search continues.
[GOAP] MC evaluation: 20 rollouts, learned sigma hp=0.08 mana=0.11
rollout 1: rest -> hp=93% mana=74% sat=0.93
rollout 2: rest -> hp=97% mana=82% sat=0.97
...
rollout 20: rest -> hp=91% mana=78% sat=0.91
mean satisfaction: 0.94 (threshold: 0.50)
ACCEPTED -- plan robust under posterior uncertainty
If the planner had wider posteriors (fewer observations), sigma would be larger, more rollouts would produce low satisfaction, and marginal plans would be rejected. The gate naturally penalizes plans built on uncertain data.
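The gate can be sketched like this. One simplifying assumption: noise is applied directly to the plan's final satisfaction rather than per action effect, and the learned sigma is passed in as a single number; the 20-rollout count and the 0.50 mean-satisfaction threshold come from the trace.

```python
import random

def mc_gate(nominal_sat: float, sigma: float, rollouts: int = 20,
            accept_mean: float = 0.50, rng=None):
    """Run noisy rollouts of the plan outcome and accept only if the
    mean satisfaction clears the threshold. Wider sigma (fewer
    observations) drags the mean down, rejecting marginal plans."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    sats = [min(1.0, max(0.0, rng.gauss(nominal_sat, sigma)))
            for _ in range(rollouts)]
    mean = sum(sats) / len(sats)
    return mean >= accept_mean, mean
```

A robust plan such as the one in the trace (nominal sat 0.95, sigma 0.08) passes comfortably; a plan whose nominal satisfaction sits near 0.3 fails regardless of seed, because the rollout mean stays far below 0.50.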
The structured event emitted when a plan is accepted:
{"event":"goap_plan","ts":"2026-03-27T23:18:42.108-05:00","elapsed":1259,
"session_id":"session_20260327_225743","level":"INFO",
"logger":"compass.brain.goap.planner",
"goal":"SURVIVE","steps":1,"cost":18.0,"satisfaction":0.94,"plan_ms":0.3,
"msg":"[GOAP] Generated: Plan(SURVIVE: rest, cost=18.0s, sat=0.94) in 0.3ms"}

As each step's routine runs to completion, the planner records estimated vs actual duration. The example below shows a longer plan (GAIN_XP, 4 steps) to demonstrate multi-step cost tracking:
{"event":"goap_step","step":"rest", "estimated_s":18.0,"actual_s":19.3,"error_s":+1.3}
{"event":"goap_step","step":"acquire", "estimated_s": 5.0,"actual_s": 4.6,"error_s":-0.4}
{"event":"goap_step","step":"pull", "estimated_s": 8.0,"actual_s": 3.7,"error_s":-4.3}
{"event":"goap_step","step":"defeat", "estimated_s":15.9,"actual_s":17.2,"error_s":+1.3}

The defeat step uses a learned cost (15.9s from encounter history averages)
rather than the 25.0s default. Early sessions show larger errors; as the
encounter posterior tightens, estimated and actual converge.
Each step's error feeds an exponential moving average per action type (alpha=0.3). After 3+ observations, the corrected cost replaces the base estimate in future A* searches:
cost corrections after 8 plan cycles:
rest: base=18.0s ema_error=+1.1s corrected=19.1s (5 observations)
acquire: base= 5.0s ema_error=-0.3s corrected= 4.7s (5 observations)
pull: base= 8.0s ema_error=-3.8s corrected= 4.2s (5 observations)
defeat: base=15.9s ema_error=+0.9s corrected=16.8s (5 observations)
wander: base=30.0s ema_error=-8.2s corrected=21.8s (3 observations)
Corrected costs are floored at 10% of the base cost to prevent collapse. Pull's large negative correction shows the base estimate consistently overstating pull time. Corrections persist across sessions via JSON.
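The correction mechanism described above can be sketched as a small class. The alpha of 0.3, the 3-observation minimum, and the 10%-of-base floor are from the text; the class and method names are illustrative, not the planner's actual API.

```python
class CostCorrector:
    """Per-action EMA of signed cost errors (actual - estimated).
    The correction is applied only after min_obs observations and the
    corrected cost is floored at 10% of the base cost."""

    def __init__(self, alpha: float = 0.3, min_obs: int = 3):
        self.alpha, self.min_obs = alpha, min_obs
        self.ema: dict[str, float] = {}
        self.count: dict[str, int] = {}

    def observe(self, action: str, estimated_s: float, actual_s: float):
        """Fold one step's timing error into the action's EMA."""
        err = actual_s - estimated_s
        prev = self.ema.get(action)
        self.ema[action] = err if prev is None else (
            self.alpha * err + (1 - self.alpha) * prev)
        self.count[action] = self.count.get(action, 0) + 1

    def corrected(self, action: str, base_s: float) -> float:
        """Base cost plus EMA error, floored at 10% of base."""
        if self.count.get(action, 0) < self.min_obs:
            return base_s
        return max(0.1 * base_s, base_s + self.ema[action])
```

Feeding it pull's observed errors shows the behavior in the table: the first two observations leave the base cost untouched, and by the third the corrected cost has dropped toward the ~4.2s the trace reports.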
Plans are invalidated when an emergency rule fires, the world state changes enough to break a step's preconditions, or the goal is satisfied early:
{"event":"goap_invalidate","ts":"2026-03-27T23:22:05.771-05:00",
"goal":"GAIN_XP","step":2,"steps_total":4,"reason":"emergency_flee",
"msg":"[GOAP] Plan invalidated: GAIN_XP at step 2/4 (reason: emergency_flee)"}

After invalidation, the planner generates a fresh plan on the next routine completion. The cost corrections from the invalidated plan's completed steps are retained -- partial execution still improves cost accuracy.
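The three invalidation conditions can be expressed as one ordered check. Only the emergency_flee reason string appears in the trace; the other reason names, and the function itself, are illustrative assumptions.

```python
from enum import Enum
from typing import Optional

class InvalidateReason(Enum):
    EMERGENCY = "emergency_flee"          # reason string from the trace
    PRECONDITION = "precondition_broken"  # assumed name
    GOAL_SATISFIED = "goal_satisfied_early"  # assumed name

def check_plan(emergency: bool, preconditions_met: bool, sat: float,
               threshold: float = 0.70) -> Optional[InvalidateReason]:
    """Return why the current plan should be dropped, or None to continue.
    Order mirrors the text: emergency rules first, then broken step
    preconditions, then early goal satisfaction."""
    if emergency:
        return InvalidateReason.EMERGENCY
    if not preconditions_met:
        return InvalidateReason.PRECONDITION
    if sat >= threshold:
        return InvalidateReason.GOAL_SATISFIED
    return None
```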
When a plan instead runs all of its steps to completion, the planner emits a goap_complete event:
{"event":"goap_complete","ts":"2026-03-27T23:19:47.295-05:00",
"session_id":"session_20260327_225743","level":"INFO",
"logger":"compass.brain.goap.planner",
"goal":"GAIN_XP","steps_total":4,"steps_executed":4,
"msg":"[GOAP] Plan completed: GAIN_XP (4/4 steps)"}