Use this checklist to verify the orchestrated agent system works correctly.
Command:
@Nighthawk-Classifier How does AKS implement KMS encryption with customer-managed keys?
Expected Output:
SERVICE: AKS
CONFIDENCE: high
KEYWORDS: KMS encryption, customer-managed keys, Key Vault, encryption at rest, etcd
REASONING: Explicit AKS mention with specific security feature question.
Pass Criteria:
- Correctly identifies service as AKS
- Confidence is high
- Relevant keywords extracted
- Reasoning makes sense
Command:
@Nighthawk-Classifier How are storage accounts used in managed resource groups?
Expected Output:
SERVICE: ARO (or UNKNOWN with medium confidence)
CONFIDENCE: medium
KEYWORDS: storage accounts, managed resource group, cluster storage
REASONING: Managed resource group with storage accounts is ARO-specific pattern.
Pass Criteria:
- Recognizes potential ambiguity
- Provides reasoning
- Keywords are relevant
Command:
@Nighthawk-AKS-Researcher Research node bootstrapping process in AKS
Expected Behavior:
- Clones/updates AKS repos to
docs/directory - Searches AgentBaker for bootstrapping logic
- Returns structured findings with sources
- Citations include GitHub file paths
- No synthesis or report writing (just research)
Verify:
ls docs/
# Should see: AKS, AgentBaker, cloud-provider-azure, azure-cli, azure-cli-extensionsCommand:
@Nighthawk-ARO-Researcher Research storage accounts in ARO managed resource groups
Expected Behavior:
- Clones/updates ARO repos to
docs/directory - Searches ARO-RP for storage account logic
- Returns structured findings with sources
- Citations include GitHub file paths
- References known patterns (cluster storage, image registry)
Verify:
ls docs/
# Should see: ARO-RP, azure-cliCommand:
@Nighthawk-Synthesizer Create a report from these findings: [paste sample research output]
Expected Behavior:
- Creates markdown file in
notes/directory - Includes mermaid diagrams if appropriate
- Cites all sources from research
- Does NOT add new research or web lookups
- Returns synthesis summary
Pass Criteria:
- File created:
notes/Nighthawk-YYYY-MM-DD-<topic>.md - All sections present (overview, deep dive, findings, references)
- Sources properly cited
Command:
@Nighthawk-FactChecker Validate the report: notes/Nighthawk-YYYY-MM-DD-<topic>.md
Expected Behavior:
- Reads the report file
- Extracts technical claims
- Verifies each claim against cited sources
- Checks GitHub links (file paths, line numbers)
- Returns validation summary with categories
Pass Criteria:
- Validation summary with counts
- Per-claim analysis (✓/⚠/✗/❌)
- Recommendation (APPROVE/WARNINGS/NEEDS REVISION)
- Specific corrections if claims unverified
Command:
/Nighthawk How does AKS use AgentBaker for node provisioning?
Expected Workflow:
- Orchestrator calls Classifier → Returns "AKS"
- Orchestrator calls AKS-Researcher → Returns findings
- Orchestrator calls Synthesizer → Creates report file
- Orchestrator calls FactChecker → Validates report
- Orchestrator presents result to user
Pass Criteria:
- Complete workflow executes without errors
- Report file created in
notes/ - Fact-check summary presented
- All verified claims or warnings noted
- Total time < 5 minutes (first run may be slower for cloning)
Command:
/Nighthawk What operators run in an ARO cluster?
Expected Workflow:
- Classifier → Returns "ARO"
- ARO-Researcher → Searches ARO-RP for operator info
- Synthesizer → Creates report
- FactChecker → Validates
- Final report delivered
Pass Criteria:
- Correctly routed to ARO researcher
- Report includes operator information from ARO-RP
- Sources cite pkg/operator/ directory
- Fact-check validates claims
Command:
/Nighthawk Tell me about Kubernetes networking
Expected Behavior:
- Classifier returns UNKNOWN or low confidence
- Orchestrator asks user to clarify (AKS or ARO?)
- Workflow does not proceed until clarification
Pass Criteria:
- User asked for clarification
- No attempt to research without clear classification
Command:
/Nighthawk How does AKS implement customer-managed keys for KMS encryption and what components are involved?
Expected Workflow:
- Classifier → AKS, keywords: KMS, CMK, Key Vault, encryption
- AKS-Researcher → Searches multiple repos (AgentBaker, cloud-provider-azure)
- Synthesizer → Creates comprehensive report with architecture diagram
- FactChecker → Validates all technical claims
Pass Criteria:
- Report covers multiple components (AgentBaker, kube-apiserver, KMS plugin)
- Includes Mermaid diagram showing flow
- All claims verified with GitHub sources
- References both code and Microsoft Learn docs
Setup: Manually edit a report to add an incorrect claim
Command:
@Nighthawk-FactChecker Validate notes/Nighthawk-YYYY-MM-DD-<topic>.md
Expected:
- FactChecker flags the incorrect claim as ❌ Incorrect
- Provides evidence of contradiction
- Recommendation: NEEDS REVISION
Setup: Report with GitHub links
Expected:
- FactChecker verifies file paths exist
- Checks line numbers point to relevant code
- Flags broken or incorrect links
Command:
/Nighthawk <Question about obscure feature not well documented>
Expected:
- Report states "information not found" rather than making things up
- FactChecker marks unsupported claims as ✗ Unverified
- User warned about unverified content
First run:
- Repos cloned (slower, 1-2 minutes)
Second run (same question):
- Repos already exist (much faster, < 1 minute)
Command: Two engineers ask questions simultaneously
Expected:
- Both workflows complete successfully
- No file conflicts
- Separate report files created
Scenario: GitHub is down or MCP unavailable
Expected:
- Graceful error message
- Clear indication of what failed
- Suggestion to retry or use cached data
Persona: SE with no knowledge of the system
Task: Ask a question about AKS
Pass Criteria:
- Clear instructions available (USAGE.md)
- Simple command:
/Nighthawk <question> - No installation steps required
- Report is comprehensive and understandable
- Fact-check summary increases confidence
- Sources allow verification
Task: Generate report and share with customer
Pass Criteria:
- Report is markdown (easy to convert/share)
- Diagrams render correctly
- Sources are linkable
- Professional formatting
- Fact-check summary gives confidence
Document any issues encountered:
Issue 1:
- Describe issue
- Steps to reproduce
- Expected vs actual behavior
- Proposed fix
Issue 2:
- ...
- Classifier tested and working
- AKS-Researcher tested and working
- ARO-Researcher tested and working
- Synthesizer tested and working
- FactChecker tested and working
- End-to-end workflow tested
- AKS questions working
- ARO questions working
- Error handling working
- Fact-checking catches errors
- No hallucination observed
- Source links validated
- Documentation clear
- Easy to use
- Reports are valuable
Date Tested: _____________
Tester: _____________
Status: ⬜ PASS | ⬜ NEEDS WORK | ⬜ FAIL
Notes: