Skip to content

Merge dev into main#8

Merged
vkehfdl1 merged 18 commits intomainfrom
dev
Mar 26, 2026
Merged

Merge dev into main#8
vkehfdl1 merged 18 commits intomainfrom
dev

Conversation

@vkehfdl1
Copy link
Copy Markdown
Contributor

@vkehfdl1 vkehfdl1 commented Mar 22, 2026

Summary

  • preserve issue discussions by resuming the original OMX session instead of starting a detached follow-up flow
  • harden PR review orchestration by preventing duplicate review rounds, closing round-dedupe races, and aligning final verdict verification
  • prevent orphaned tmux sessions after OMX jobs finish and add coverage for the OMX runner/service paths

Issues closed by merging this PR

What changed

Session continuity

  • keep OMX session context attached to the original issue flow
  • clean up the resume-session implementation to satisfy repository formatting rules

Review pipeline hardening

  • block duplicate review rounds from advancing the PR chain
  • close race conditions around round dedupe
  • tighten final verdict verification behavior

Runner and service reliability

  • clean up OMX job/session lifecycle to avoid orphaned tmux sessions after execution
  • add supporting model/storage/github/service updates needed for the new flow

Files touched

  • dani/service.py
  • dani/omx_runner.py
  • dani/storage.py
  • dani/github.py
  • dani/models.py
  • dani/prompts.py
  • tests/test_service.py
  • tests/test_omx_runner.py
  • tests/helpers.py
  • README.md

Diff at a glance

  • 10 files changed
  • 656 insertions / 42 deletions

Commits included

vkehfdl1 added 13 commits March 22, 2026 12:04
Issue follow-up comments now reuse the original issue_request Codex/OMX
session instead of restarting analysis. The runner captures a resumable
OMX session id from Codex session artifacts, stores it on the launched
session, and uses it later to resume the same discussion thread.

Constraint: General issue comments must keep /approve implementation behavior unchanged
Constraint: Resume lookup should reuse stored session state instead of inventing a parallel ledger
Rejected: Start a fresh issue_request session for every follow-up | loses discussion continuity and repeats analysis
Rejected: Store only tmux session identifiers | not sufficient for `omx resume`
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Any new resumable stages should use storage session lookup plus stored omx_session_id, not prompt-text heuristics alone
Tested: uv run pytest -q; uv run ruff check .; uv run ty check; uv run deptry .
Not-tested: Live resume behavior against a real long-running OMX issue session in this branch
The Feature/#1 implementation was functionally correct, but CI still ran the
repository quality gate and failed on automatic Ruff formatting. This commit
applies the required formatting without changing behavior so the branch can
merge cleanly.

Constraint: Behavior and verification outcomes from the Feature/#1 implementation must remain unchanged
Rejected: Merge with failing quality job | violates branch quality expectations
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Run 23 files left unchanged before final push when CI enforces repository formatting
Tested: uv run ruff format .; uv run ruff check .; uv run pytest -q; uv run ty check; uv run deptry .
Not-tested: Additional live GitHub webhook/resume interactions beyond existing local verification
Preserve issue discussions by resuming the original OMX session
Review-round completion was previously accepted if any signed PR comment existed,
which allowed stale signatures to satisfy verification and duplicate comments to
queue follow-up work more than once. This change verifies the exact expected
signature and records processed review events so repeated deliveries stay
idempotent.

Constraint: Review completion is inferred from GitHub PR comments rather than a separate durable job callback
Constraint: Must preserve the existing JSON storage model without adding dependencies
Rejected: Trust only the latest signed PR comment | stale or duplicate comments can still satisfy the check incorrectly
Rejected: Deduplicate purely in memory | process restarts would immediately lose the guard
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: If webhook/event delivery semantics change, keep review-round handling idempotent across retries and replays
Tested: ./.venv/bin/pytest -q
Tested: ./.venv/bin/ruff check dani tests
Tested: ./.venv/bin/ruff format --check dani tests
Not-tested: Migration behavior for pre-existing data directories created before processed-events.json existed
The previous fix still performed duplicate detection as a separate read before
recording the processed review event, which left a small race window for two
concurrent deliveries to enqueue the same next round. This change makes the
storage write itself the dedupe gate and also upgrades final verdict
verification to require an exact matching signature instead of any signed PR
comment.

Constraint: Must keep JSON-file persistence and current service architecture
Rejected: Leave review dedupe as check-then-record | concurrent duplicate deliveries can still double-enqueue
Rejected: Keep final verdict on latest signed comment | stale signed review comments can satisfy the wrong stage
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Treat signature-based completion checks consistently across stages; avoid check-then-act dedupe for replayable events
Tested: ./.venv/bin/pytest -q
Tested: ./.venv/bin/ruff check dani tests
Tested: ./.venv/bin/ruff format --check dani tests
Not-tested: Real concurrent webhook delivery against the live GitHub integration
Fix exact review-round verification and dedupe duplicate completions
Session cleanup was implicit and depended on tmux exiting on its own, which left lifecycle completion ambiguous and could strand tmux sessions after successful or failed runs.

This change adds an explicit session finalization step in DaniService, records when and why a session ended, and teaches OmxRunner how to kill an existing tmux session safely. Tests now cover both successful and failed finalization paths as well as the low-level tmux close behavior.

Constraint: Must preserve the existing omx resume flow that uses omx_session_id across fresh tmux shells
Rejected: Full stage lifecycle state machine | too broad for the issue and unnecessary to stop stray tmux sessions
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep tmux cleanup centralized in service finalization so future lifecycle changes do not duplicate termination logic
Tested: python -m pytest -q; python -m ruff check .
Not-tested: Real tmux + omx integration against a live GitHub event stream
Fix tmux session cleanup for OMX job lifecycle
Issue #2 needs stronger prompt contracts so agents explain what they understood, state the expected outcome, and show real execution evidence before review verdicts.

Constraint: Keep the change limited to prompt/template behavior and regression tests
Rejected: Add service-level semantic validation for comment contents | too invasive before locking prompt contract
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If stricter enforcement is needed later, add structured output or service-side validation rather than piling more prose into prompts
Tested: python -m pytest -q tests/test_prompts.py
Not-tested: End-to-end agent comment quality across real repositories
The new Issue #2 prompt requirements were correct but too verbose in prose form, so this follow-up rewrites the issue-request, review, and final-verdict instructions as compact checklists without changing the contract.

Constraint: Preserve the new required sections and evidence expectations while making prompts easier to scan
Rejected: Revert the new requirements entirely | loses the explicit expected-outcome and real-result guidance
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Prefer checklist-style prompt constraints over long prose when tightening stage contracts
Tested: python -m pytest -q tests/test_prompts.py
Not-tested: Full end-to-end agent behavior with live GitHub comments
The agent-facing prompt surface should stay simple and use gh directly, while PyGithub remains an internal dani runtime surface for event handling and repository orchestration.

Constraint: Preserve existing prompt contract requirements while removing the helper-specific instructions from OMX-facing templates
Rejected: Keep the PyGithub helper in prompts as a thin wrapper | still leaks dani internals into OMX sessions
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep GitHub transport details out of OMX prompts unless the session truly needs a dani-internal API
Tested: python -m pytest -q tests/test_prompts.py tests/test_service.py tests/test_github.py
Not-tested: Live gh-authenticated OMX session against a real repository
Implementation prompts should describe the real PR update flow, and review prompts should require -review plus concrete verification evidence without hard-coding product-surface categories.

Constraint: Keep the prompt contracts compact while aligning them with the actual agent workflow
Rejected: Keep web/cli/backend-specific checklist items | too prescriptive and awkward for mixed or non-standard surfaces
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Prefer general evidence language unless a stage is truly tied to a single product surface
Tested: python -m pytest -q tests/test_prompts.py tests/test_service.py tests/test_github.py
Not-tested: Live review-round execution through OMX with the code-review skill
Issue #2: require expected outcome and real-result evidence in prompts
@vkehfdl1 vkehfdl1 linked an issue Mar 22, 2026 that may be closed by this pull request
Dani previously treated a signed review_round PR comment as the trigger for the
next review round, which let the same unchanged PR get reviewed three times in a
row with no repair pass in between. This commit turns review comments into
implementation follow-ups, requires implementation rounds on existing PRs to
publish an exact signed PR comment, and advances back into review/final verdict
from that implementation signature so the PR loop alternates correctly.

Constraint: PR triage must rely on exact Dani signatures rather than free-form PR comment text
Constraint: Existing issue /approve flow and first-time PR creation behavior must keep working
Rejected: Trigger next stages from PR body edits | pull_request.edited is noisier and easier to mis-triage than signed PR comments
Rejected: Keep review_round -> review_round chaining | repeats identical review findings without any fix session
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep signed PR comments as the transition event for existing PR implementation rounds unless webhook edited-event handling gains strict diff/dedupe guards
Tested: PYTHONPATH=. ./.venv/bin/pytest -q
Tested: ./.venv/bin/ruff check .
Tested: ./.venv/bin/ty check
Not-tested: Live GitHub webhook delivery against a real repository after the workflow change
The webhook layer now recognizes pushes to main and the sync engine
can merge those commits into dev directly, only escalating to OMX
when a merge conflict requires agent judgment.

Constraint: dev accepts direct pushes from dani automation
Constraint: Conflict resolution must stay available through OMX sessions
Rejected: Open a sync PR for every main update | unnecessary overhead when clean merges can push directly
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep push-event normalization aligned with the dev sync workflow so main updates continue to enqueue sync jobs
Tested: make check
Tested: uv run pytest
Not-tested: Live GitHub webhook delivery against a production repository
@vkehfdl1 vkehfdl1 linked an issue Mar 26, 2026 that may be closed by this pull request
GitHub Actions can run the sync flow without a configured global user.name/user.email, so the syncer now supplies its automation identity to both the merge and commit subprocesses.

I kept the fix local to subprocess environment injection and added a regression test that captures the env passed into both commands, rather than mutating repository config during sync.

Constraint: CI runners may not have git user.name/user.email configured
Rejected: Configure git identity in repository config during sync | mutates local repo state unnecessarily
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep automation identity injection on every git command that can materialize merge metadata, not only commit
Tested: make check; make test
Not-tested: Live GitHub Actions run after push
@vkehfdl1 vkehfdl1 merged commit a77eb96 into main Mar 26, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant