Issue #2: require expected outcome and real-result evidence in prompts #9

Merged
vkehfdl1 merged 4 commits into dev from Feature/#2
Mar 22, 2026

@vkehfdl1
Contributor

Summary

  • require issue-request comments to include an AI-understood issue summary and expected outcome
  • require review/final-verdict comments to include real execution evidence
  • rewrite the prompt requirements as compact checklists so the contract is easier to scan and follow

What changed

Issue request

  • added checklist items for:
    • AI-understood issue summary
    • why needed
    • why it may not be needed
    • expected outcome
    • concise implementation plan
    • agent signature
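The checklist items above could be rendered into a prompt roughly as follows. This is a minimal sketch, not the code in dani/prompts.py: `ISSUE_REQUEST_ITEMS` and `render_checklist` are hypothetical names, and the actual template wording may differ.

```python
# Hypothetical sketch: names and wording are assumptions, not dani's actual API.
ISSUE_REQUEST_ITEMS = [
    "AI-understood issue summary",
    "Why needed",
    "Why it may not be needed",
    "Expected outcome",
    "Concise implementation plan",
    "Agent signature",
]


def render_checklist(title: str, items: list[str]) -> str:
    """Render required comment sections as a compact Markdown checklist."""
    lines = [f"{title} must include:"]
    lines += [f"- [ ] {item}" for item in items]
    return "\n".join(lines)


prompt = render_checklist("The issue-request comment", ISSUE_REQUEST_ITEMS)
print(prompt)
```

Keeping the items in a plain list means a regression test can iterate over the same list the prompt is built from, rather than matching against long prose.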

Review and final verdict

  • added checklist items for Real Result
  • require actual execution evidence instead of abstract-only review text
  • include guidance for web screenshots/videos, CLI output, and backend API results

Tests

  • added prompt regression coverage for the new issue-request and real-result requirements
  • kept prompt tests passing after converting the instructions to checklist style
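A prompt regression test of the kind described above might look like the sketch below. `build_review_prompt` is a stand-in defined here for illustration only; the real builder in dani/prompts.py and the assertions in tests/test_prompts.py are not shown in this PR description and may be structured differently.

```python
# Hypothetical sketch: build_review_prompt is a stand-in, not dani's real builder.
REQUIRED_REVIEW_PHRASES = ("Real Result", "execution evidence")


def build_review_prompt() -> str:
    """Stand-in for the real review prompt builder (assumed)."""
    return (
        "Review comment checklist:\n"
        "- [ ] Real Result: attach actual execution evidence\n"
        "- [ ] Verdict\n"
    )


def missing_phrases(prompt: str, required: tuple[str, ...]) -> list[str]:
    """Return the required phrases that do not appear in the prompt."""
    return [phrase for phrase in required if phrase not in prompt]


def test_review_prompt_requires_real_result() -> None:
    # Fails if a later prompt rewrite silently drops a required item.
    assert missing_phrases(build_review_prompt(), REQUIRED_REVIEW_PHRASES) == []
```

A test like this pins the contract (which items must be present) without pinning the exact wording, which is what allows a prose-to-checklist rewrite to keep passing.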

Files changed

  • dani/prompts.py
  • tests/test_prompts.py

Verification

  • python -m pytest -q tests/test_prompts.py

Closes #2

Issue #2 needs stronger prompt contracts so agents explain what they understood, state the expected outcome, and show real execution evidence before review verdicts.

Constraint: Keep the change limited to prompt/template behavior and regression tests
Rejected: Add service-level semantic validation for comment contents | too invasive before locking prompt contract
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If stricter enforcement is needed later, add structured output or service-side validation rather than piling more prose into prompts
Tested: python -m pytest -q tests/test_prompts.py
Not-tested: End-to-end agent comment quality across real repositories

The new Issue #2 prompt requirements were correct but too verbose in prose form, so this follow-up rewrites the issue-request, review, and final-verdict instructions as compact checklists without changing the contract.

Constraint: Preserve the new required sections and evidence expectations while making prompts easier to scan
Rejected: Revert the new requirements entirely | loses the explicit expected-outcome and real-result guidance
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Prefer checklist-style prompt constraints over long prose when tightening stage contracts
Tested: python -m pytest -q tests/test_prompts.py
Not-tested: Full end-to-end agent behavior with live GitHub comments

The agent-facing prompt surface should stay simple and use gh directly, while PyGithub remains an internal dani runtime surface for event handling and repository orchestration.

Constraint: Preserve existing prompt contract requirements while removing the helper-specific instructions from OMX-facing templates
Rejected: Keep the PyGithub helper in prompts as a thin wrapper | still leaks dani internals into OMX sessions
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep GitHub transport details out of OMX prompts unless the session truly needs a dani-internal API
Tested: python -m pytest -q tests/test_prompts.py tests/test_service.py tests/test_github.py
Not-tested: Live gh-authenticated OMX session against a real repository

Implementation prompts should describe the real PR update flow, and review prompts should require -review plus concrete verification evidence without hard-coding product-surface categories.

Constraint: Keep the prompt contracts compact while aligning them with the actual agent workflow
Rejected: Keep web/cli/backend-specific checklist items | too prescriptive and awkward for mixed or non-standard surfaces
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Prefer general evidence language unless a stage is truly tied to a single product surface
Tested: python -m pytest -q tests/test_prompts.py tests/test_service.py tests/test_github.py
Not-tested: Live review-round execution through OMX with the code-review skill
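
This PR deliberately stops at prompt-level contracts, but the first commit's directive names service-side validation as the route for stricter enforcement later. A minimal sketch of what such a check could look like is below, assuming comments use Markdown headings for each required section. The section names, the heading format, and `missing_sections` are all hypothetical and not part of dani.

```python
import re

# Hypothetical section names; the real comment format is set by the prompts.
REQUIRED_SECTIONS = ("Expected outcome", "Real Result")


def missing_sections(comment: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required headings that are absent or have an empty body."""
    missing = []
    for name in required:
        # Match a Markdown heading line, then capture text up to the next heading.
        pattern = rf"^#+[ \t]*{re.escape(name)}[ \t]*\n(.*?)(?=^#+[ \t]|\Z)"
        match = re.search(pattern, comment, flags=re.M | re.S | re.I)
        if match is None or not match.group(1).strip():
            missing.append(name)
    return missing


comment = (
    "## Expected outcome\n"
    "Prompts list the new sections.\n\n"
    "## Real Result\n"
)
print(missing_sections(comment))
```

A check like this only confirms that a section exists and is non-empty. It cannot tell whether the evidence is genuine, which is the semantic validation the first commit rejected as too invasive for now.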
@vkehfdl1 merged commit 71a7566 into dev on Mar 22, 2026
6 checks passed