Design same-session polecat recovery after refinery failure #22

## Problem

Today Gastown's polecat/refinery flow preserves **work identity** across a failed merge, but not **polecat session identity**.

Current behavior:

- a polecat finishes work and hands the bead to refinery
- refinery merges or rejects
- on rejection, the work returns to the pool
- a new polecat session may pick it up and resume from metadata

That works mechanically, but it loses the live context of the original polecat session. For merge failures and test failures, we want the **same polecat session** to pick the work back up when possible, instead of always relying on a fresh pool claim.

## Goal

Design and implement a same-session refinery feedback flow for Gastown:

- a polecat can submit work to refinery
- if refinery needs fixes, the same polecat session can resume the work
- if refinery succeeds, the polecat can clean up normally

This is a feature request, not just an upstream parity port. The design should match Gas City's wait/session/pool model.

## Current Limitation

We already identified one important blocker in the current architecture:

- the new wait subsystem is the right foundation for "polecat waits for refinery"
- however, pooled polecats can still fall out of desired state and be drained when pool sizing shrinks
- as a result, a waiting polecat is not yet a first-class pinned pool member

That means the feature needs real lifecycle design, not just prompt tweaks.
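
To make the blocker concrete, here is a minimal sketch of a reconciler that drains excess sessions when desired size shrinks. It is not Gas City's actual reconciler: `Session`, `waiting_on_refinery`, `sessions_to_drain`, and the "drain from the end" ordering are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    name: str
    # Hypothetical flag; the real wait subsystem may track this differently.
    waiting_on_refinery: bool = False


def sessions_to_drain(sessions, desired, protect_waiting):
    """Return the names a reconciler would drain to shrink toward `desired`."""
    # Waiting sessions are exempt from draining only when protect_waiting is set.
    candidates = [
        s for s in sessions if not (protect_waiting and s.waiting_on_refinery)
    ]
    excess = max(0, len(sessions) - desired)
    # Drain from the end of the list, a stand-in for the pool's real ordering.
    return [s.name for s in candidates[::-1][:excess]]


pool = [Session("polecat-1"), Session("polecat-2", waiting_on_refinery=True)]
print(sessions_to_drain(pool, desired=1, protect_waiting=False))
print(sessions_to_drain(pool, desired=1, protect_waiting=True))
```

Without protection, the waiting session is the one selected for draining. With protection, the idle session is drained instead, which is the behavior this issue asks for.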

## Desired Outcome

Allow a polecat to remain the owner of its in-flight work across refinery verdicts when feasible.

In particular:

- failed merges should be resumable by the same polecat session
- successful merges should let the polecat finish cleanly without leaking sessions or worktrees
- if the original polecat is gone, unhealthy, or no longer recoverable, the system should still fall back safely to the existing pool-based recovery path
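
The three outcomes above can be read as a small routing decision. The sketch below is one possible shape, assuming the controller can observe whether the merge succeeded and whether the original session is alive and healthy. `Route` and `route_verdict` are hypothetical names, not existing Gas City APIs.

```python
from enum import Enum


class Route(Enum):
    CLEANUP = "cleanup"              # merge succeeded, polecat finishes cleanly
    SAME_SESSION = "same_session"    # original session resumes the work
    POOL_FALLBACK = "pool_fallback"  # existing metadata-based recovery path


def route_verdict(merged: bool, session_alive: bool, session_healthy: bool) -> Route:
    """Decide who handles a refinery verdict (illustrative policy only)."""
    if merged:
        return Route.CLEANUP
    if session_alive and session_healthy:
        return Route.SAME_SESSION
    # Original session is gone or unhealthy: fall back to the pool.
    return Route.POOL_FALLBACK


print(route_verdict(merged=False, session_alive=True, session_healthy=True))
print(route_verdict(merged=False, session_alive=False, session_healthy=False))
```

Keeping the fallback as the final default branch means any unrecognized session state still lands on the existing pool-based path.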

## Design Areas

### 1. Verdict / handoff model

- How does refinery communicate `FIX_NEEDED` vs success?
- Is the wake trigger the original work bead, a verdict bead, a wait object, or some other durable receipt?
- What is the durable source of truth for the refinery verdict?
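
As a starting point for discussion, a durable verdict could be a small serializable record. The sketch below assumes JSON as the storage format. Only `FIX_NEEDED` comes from this issue; `MERGED`, the `Verdict` type, and every field name are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass

# "FIX_NEEDED" appears in this issue; "MERGED" is an assumed name for success.
OUTCOMES = {"FIX_NEEDED", "MERGED"}


@dataclass(frozen=True)
class Verdict:
    bead_id: str      # hypothetical: the work bead the verdict applies to
    outcome: str      # one of OUTCOMES
    session_id: str   # hypothetical: the polecat session that submitted the work
    reason: str = ""  # free-form detail, e.g. which check failed

    def __post_init__(self):
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")


def dump_verdict(verdict: Verdict) -> str:
    """Serialize a verdict to a stable JSON string."""
    return json.dumps(asdict(verdict), sort_keys=True)


def load_verdict(text: str) -> Verdict:
    """Rebuild a verdict from its JSON form, re-validating the outcome."""
    return Verdict(**json.loads(text))


v = Verdict("bead-1", "FIX_NEEDED", "polecat-1", reason="tests failed")
print(dump_verdict(v))
```

Recording the submitting session alongside the outcome would let either the original polecat or a fallback polecat read the same artifact.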

### 2. Polecat session lifecycle

- Does the polecat sleep and wait for refinery instead of `drain-ack`?
- How is the same session woken back up?
- What is the timeout / abandonment story if refinery never replies?
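
The timeout question can be explored with a bounded wait. The sketch below uses polling purely as a stand-in for whatever wake mechanism is chosen. `wait_for_verdict` and its parameters are hypothetical, and the injectable `clock` and `sleep` exist only to make the behavior easy to test.

```python
import time


def wait_for_verdict(poll, timeout_s, interval_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until a verdict arrives or the timeout elapses.

    `poll` returns a verdict or None. Returns None on timeout, which the
    caller would treat as abandonment and hand off to the pool fallback.
    """
    deadline = clock() + timeout_s
    while True:
        verdict = poll()
        if verdict is not None:
            return verdict
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Usage with a fake clock so the example runs instantly.
now = [0.0]
result = wait_for_verdict(
    poll=lambda: "FIX_NEEDED" if now[0] >= 3 else None,
    timeout_s=10,
    clock=lambda: now[0],
    sleep=lambda d: now.__setitem__(0, now[0] + d),
)
print(result)
```

Returning `None` instead of raising keeps the abandonment path explicit at the call site.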

### 3. Pool slot semantics

- How do waiting polecats interact with pool desired-state reconciliation?
- Do waiting polecats pin a pool slot?
- How do we avoid slot leakage or dead capacity when many polecats are waiting on refinery?
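
One possible policy for the dead-capacity concern is to let waiting polecats pin slots only up to a cap. This is a sketch of the arithmetic, not a proposal for the final contract. `split_capacity` and `max_pinned` are hypothetical names.

```python
def split_capacity(desired: int, waiting: int, max_pinned: int):
    """Return (pinned, working): slots held by waiters vs. left for active work.

    Waiting polecats pin at most `max_pinned` slots and never more than the
    pool's desired size; waiters beyond the cap would be eligible for draining.
    """
    pinned = min(waiting, max_pinned, desired)
    return pinned, desired - pinned


print(split_capacity(desired=5, waiting=2, max_pinned=3))
print(split_capacity(desired=5, waiting=10, max_pinned=3))
```

With a cap in place, a backlog at refinery cannot consume the whole pool, at the cost of some waiters losing same-session recovery and using the fallback path instead.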

### 4. Recovery / fallback

- If the original polecat session dies, can another polecat still resume the work from metadata?
- What metadata must still be recorded for crash recovery?
- What happens across controller restart, compaction, or workspace cleanup?

### 5. Cleanup rules

- When is the worktree kept vs deleted?
- When does a successful merge retire the polecat?
- How do we avoid leaving stale waiting sessions behind?

## Non-Goals

- Do not require ACP propulsion work for this feature.
- Do not force the full upstream event model if a simpler Gas City-native contract is better.
- Do not break the current pool-based fallback path.

## Minimum Acceptance Criteria

- A failed refinery merge can be resumed by the same polecat session when that session is still healthy.
- Pool reconciliation does not accidentally kill a polecat just because it is waiting on refinery.
- If the original polecat is unavailable, the existing metadata-based fallback path still works.
- Successful merges still result in correct cleanup.
- The lifecycle contract is documented and covered by tests.

## Open Questions

- Should this be implemented directly with `gc session wait`, or does it need a narrower polecat/refinery-specific primitive?
- Should wait-held pooled agents count as desired capacity automatically?
- What is the right durable verdict artifact?
- Should the original polecat remain assigned to the work, or should assignment move while identity is tracked separately?

## Reference

- `docs/archive/analysis/gastown-upstream-audit.md`
  Delta 4: event-driven polecat/refinery lifecycle

