Agent Persona Exploration - 2026-04-01 #23815
Replies: 3 comments
🤖 The smoke test agent has landed on discussion #23815! 🛸 All systems nominal. The agent persona exploration scores look solid — especially the
💥 WHOOSH! 🦸♂️ The Claude Smoke Test Agent swoops in from the shadows! ⚡ KA-POW! ⚡ Your friendly neighborhood smoke test bot was HERE! Run 23831589037 completed with all systems NOMINAL! "With great agentic power comes great safe-output responsibility!" — Claude 🤖 🎉 BOOOOOM! The Claude engine has been tested and validated. All circuits firing! 🎉
This discussion has been marked as outdated by Agent Persona Explorer. A newer discussion is available at Discussion #24005.
Systematic test of the `developer.instructions` (agentic-workflows) custom agent across 7 representative scenarios from 5 software worker personas. The agent was asked to design workflow configurations for each persona's automation need.

Persona Overview

`developer.instructions` (agentic-workflows custom agent)

Key Findings
- Stronger scenarios use `min-integrity`, OIDC, and locked-down egress; weaker scenarios omit safe-outputs or use incorrect network formats
- Use of `gh` CLI instead of GitHub MCP toolsets; one scenario omitted safe-outputs entirely
- `paths:` filtering, `workflow_dispatch` fallback, and correct event types consistently applied
- `close-older-issues: true`, `hide-older-comments: true`, and `close-older-discussions: true` appeared in 5/7 scenarios

Top Patterns
- `claude` as default engine: recommended in 6/7 scenarios; rationale consistently tied to reasoning quality for analysis tasks vs. code tasks
- `close-older-issues`, `hide-older-comments`, `close-older-discussions` consistently applied for recurring workflows
- `on: pull_request` with `paths:` globs used in all 4 PR automation scenarios to avoid unnecessary runs
- `noop` handling for the "nothing to do" case

View High Quality Responses (Score ≥ 4.8)
backend-schema: DB Migration Safety (5.0/5)

- `network.egress.allowed: []` (empty, since no external calls are needed) and `min-integrity: low` for untrusted fork PRs
- `create-pull-request-review-comment: max: 15` caps runaway output on large PRs

devops-incident: Workflow Failure Monitor (5.0/5)

- `agentic-workflows` MCP server as the primary log access tool (not raw `gh` CLI)
- `workflow_run` for real-time + `schedule` for batch grouping
- `group: true` + `close-older-issues: true` combination correctly prevents issue flooding
- `audit-workflows.md` workflow cited as justification for engine choice, which shows awareness of repo context

devops-config: Infrastructure Drift Detection (4.8/5)

- `cache-memory: true` to track drift state across weekly runs
- `runtimes.terraform.version`
- `close-older-issues: true` + `expires: 7d` prevents stale drift issues from accumulating

View Areas for Improvement
backend-api: API Breaking Change Detector (4.0/5), lowest score

- `gh` CLI used for PR operations instead of GitHub MCP toolsets (`github: toolsets: [pull_requests]`)
- `allowed: [github.com]` is incorrect; should use `egress.allowed-domains` structure
- `safe-outputs` block omitted entirely in the frontmatter sketch, so the agent would have no write capability
- No `min-integrity` configuration despite being a PR automation (untrusted input)

frontend-visual: Visual Regression (4.2/5)

- `sandbox.agent: awf`, which is a deprecated field (should be removed per codemods)
- `gh aw add <owner>/<repo>/visual-regression-reporter`, which is not a valid command format

pm-digest: Weekly Feature Digest (4.2/5)

- `engine: {id: claude, max-turns: 20}`: `max-turns` as a nested key may not be valid frontmatter syntax
- `discussions: write` as a direct `permissions:` field rather than routing through `safe-outputs`, which is the correct security boundary
- `discussions` may not exist; `create_discussion` is a safe-output, not a GitHub MCP toolset

qa-coverage: Test Coverage Analysis (4.4/5)

- `../../scratchpad/end-to-end-feature-testing.md`, which does not exist in the repository (stale internal link)

Recommendations
1. Document correct `network.egress` format in `.github/aw/*.md`. Multiple scenarios used slightly different formats (`allowed:`, `allowed-domains:`, `egress.allowed:`). A canonical example in the workflow creation guide would prevent confusion.
2. Deprecate `sandbox.agent: awf` more visibly in docs. The `frontend-visual` response still suggested this deprecated field. Adding a clear "❌ Deprecated" callout with the correct replacement in `.github/aw/create-agentic-workflow.md` would eliminate this.
3. Add a `safe-outputs` checklist to the workflow creation guide in `.github/aw/*.md`. The `backend-api` scenario omitted safe-outputs entirely. A "minimum safe-outputs block" template for common workflow types (PR automation, scheduled, on-demand) would prevent misconfigured write-less workflows.

Score Breakdown by Scenario
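The overall scores quoted in the scenario headings of this report can be tabulated and averaged with a short script. This is a minimal sketch rather than part of the workflow's own output: the `summarize` helper is hypothetical, and grouping scenarios into personas by the text before the first hyphen is an assumption (it does yield 5 personas across 7 scenarios, matching the summary).

```python
from collections import defaultdict
from statistics import mean

# Overall scores as quoted in the scenario headings of this report.
SCORES = {
    "backend-schema": 5.0,
    "devops-incident": 5.0,
    "devops-config": 4.8,
    "backend-api": 4.0,
    "frontend-visual": 4.2,
    "pm-digest": 4.2,
    "qa-coverage": 4.4,
}


def summarize(scores):
    """Return the overall mean, the lowest-scoring scenario, and per-persona means."""
    by_persona = defaultdict(list)
    for scenario, score in scores.items():
        # Assumption: the persona is the prefix before the first hyphen.
        persona = scenario.split("-", 1)[0]
        by_persona[persona].append(score)
    return {
        "overall_mean": round(mean(list(scores.values())), 2),
        "lowest": min(scores, key=scores.get),
        "persona_means": {p: round(mean(v), 2) for p, v in by_persona.items()},
    }


if __name__ == "__main__":
    summary = summarize(SCORES)
    print(summary["overall_mean"])  # 4.51
    print(summary["lowest"])        # backend-api
    for persona, value in sorted(summary["persona_means"].items()):
        print(f"{persona}: {value}")
```

The per-dimension breakdown is not recoverable from the headings alone, so the sketch only covers the overall score per scenario.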
References: Workflow run §23830954406
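Several of the issues listed under Areas for Improvement are mechanical enough to check automatically. The following is a minimal sketch under stated assumptions: `lint_frontmatter` and `has_key` are hypothetical helpers, not part of any existing tool; the frontmatter is assumed to be already parsed into a plain dictionary; and because the report does not say where `min-integrity` belongs, the check only looks for that key anywhere in the structure. The `api/**` glob in the example is illustrative.

```python
from typing import Any


def has_key(node: Any, key: str) -> bool:
    """Return True if `key` appears anywhere in a nested dict/list structure."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def lint_frontmatter(fm: dict) -> list:
    """Flag four issues called out in this report (hypothetical helper)."""
    problems = []

    # backend-api: no safe-outputs block means no write capability.
    if not fm.get("safe-outputs"):
        problems.append("missing safe-outputs block")

    # frontend-visual: sandbox.agent: awf is reported as deprecated.
    sandbox = fm.get("sandbox")
    if isinstance(sandbox, dict) and sandbox.get("agent") == "awf":
        problems.append("deprecated sandbox.agent: awf")

    # pm-digest: discussions: write granted directly under permissions.
    permissions = fm.get("permissions")
    if isinstance(permissions, dict) and permissions.get("discussions") == "write":
        problems.append("discussions: write set in permissions")

    # backend-api: PR automation handles untrusted input.
    triggers = fm.get("on") or {}
    names = [triggers] if isinstance(triggers, str) else list(triggers)
    if "pull_request" in names and not has_key(fm, "min-integrity"):
        problems.append("pull_request trigger without min-integrity")

    return problems


if __name__ == "__main__":
    backend_api_like = {
        "on": {"pull_request": {"paths": ["api/**"]}},  # illustrative glob
        "engine": "claude",
    }
    for problem in lint_frontmatter(backend_api_like):
        print(problem)
```

A check like this would complement, not replace, the documentation changes recommended above, since it only catches the specific patterns observed in this run.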