cedarscarlett/prompt-analysis-public

PromptReboot

Multi-agent prompt diagnostics for identifying reliability issues in LLM prompts before they are used in production workflows.

This tool analyzes prompts for structural failure modes such as ambiguity, missing constraints, conflicting instructions, and evaluation gaps. It returns diagnostic findings instead of rewriting prompts.

The goal is to surface problems early so prompt design decisions remain explicit and under developer control.


What this tool does

PromptReboot runs multiple diagnostic agents in parallel, each responsible for detecting a specific class of prompt failure.

The system:

  • analyzes prompts using targeted diagnostic passes
  • detects Medium and High severity prompt design issues
  • validates agent outputs against a strict schema
  • combines findings deterministically in code
  • avoids rewriting or optimizing prompts
  • skips alignment analysis when goal and example are absent

This is a diagnostic system, not a prompt optimizer.


Failure modes detected

The current agents detect issues in categories including:

  • Goal–Prompt–Example misalignment
  • Hard instruction contradictions
  • Soft instruction contradictions
  • Role overload
  • Audience–voice mismatch
  • Vague success criteria
  • Missing priority ordering
  • Implicit domain assumptions
  • Over-constraint (brittleness)
  • Under-constraint (hallucination risk)
  • Ambiguous constraint scope
  • No self-check or validation step
  • Example output misuse
  • Single example overfitting
  • Unscaffolded complex reasoning
  • Hidden intermediate requirements

Agents only report issues when they are likely to materially affect correctness, consistency, or reliability.

If no issues are detected, the system returns an empty findings list.


Architecture overview

Execution is parallel, deterministic, and schema-validated.

Flow:

Prompt → Input serialization → Parallel agent execution → Validation → Combined findings

Key properties:

  • agents run concurrently using asyncio
  • each agent uses a strict diagnostic system prompt
  • no aggregator LLM is used
  • findings are validated at runtime
  • only "High" and "Medium" severities are allowed
  • per-agent ordering is enforced (High before Medium)
  • alignment agent is skipped deterministically when not applicable
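The properties above can be sketched in a few lines of asyncio. The agent names and the stubbed LLM call here are hypothetical; the real agents live in backend/llm/agents/:

```python
import asyncio

# Deterministic ordering key: High findings precede Medium within each agent.
SEVERITY_ORDER = {"High": 0, "Medium": 1}

async def run_agent(name, prompt):
    # Placeholder for the real LLM call; each agent returns a list of finding dicts.
    await asyncio.sleep(0)
    return [{"agent": name, "severity": "Medium", "issue": f"demo issue from {name}"}]

def applicable_agents(goal=None, example=None):
    agents = ["contradiction", "constraint", "validation"]  # hypothetical names
    # The alignment agent is skipped deterministically when goal and example are absent.
    if goal is not None and example is not None:
        agents.append("alignment")
    return agents

async def diagnose(prompt, goal=None, example=None):
    agents = applicable_agents(goal, example)
    # All applicable agents run concurrently.
    results = await asyncio.gather(*(run_agent(a, prompt) for a in agents))
    combined = []
    for agent_findings in results:
        # Combine findings in plain code (no aggregator LLM), High before Medium.
        combined.extend(sorted(agent_findings, key=lambda f: SEVERITY_ORDER[f["severity"]]))
    return combined

findings = asyncio.run(diagnose("Summarize this email thread."))
```

Because the combination step is ordinary list manipulation, the final report is reproducible for a given set of agent outputs.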

The execution model lives in backend/graphy.py.



Example

Prompt:

"Summarize this email thread and decide whether the customer should get a refund."

Typical findings:

  • Vague success criteria
  • Under-constraint (hallucination risk)
  • No self-check or validation step

The tool explains why each issue affects output reliability and cites the relevant prompt text.
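Illustratively, the combined findings for this prompt might look like the following (the field names and wording are hypothetical, not the tool's exact output):

```python
# Hypothetical combined findings for the refund prompt above.
findings = [
    {
        "category": "Vague success criteria",
        "severity": "High",
        "evidence": "decide whether the customer should get a refund",
        "rationale": "No refund policy or decision criteria are provided.",
    },
    {
        "category": "Under-constraint (hallucination risk)",
        "severity": "Medium",
        "evidence": "Summarize this email thread",
        "rationale": "No length, format, or scope constraints bound the summary.",
    },
]
```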


Design philosophy

Diagnostics over rewriting

Rewriting a prompt requires making decisions the original prompt did not specify. This tool surfaces problems instead of silently resolving them.

Separation of analysis and design

Prompt analysis identifies failure modes. Prompt design remains the user's responsibility.

Precision-first reporting

Agents report only Medium- and High-severity issues that are supported by concrete evidence from the prompt.

Silence is success

If no issues are detected, the correct output is an empty findings list.


Running locally

pip install -r requirements.txt
cd backend
uvicorn api.asgi:app


Repository structure

backend/graphy.py Parallel diagnostic execution engine.

backend/llm/agents/ Agent configurations and category definitions.

backend/llm/prompts/ Diagnostic system prompts used by each agent.

backend/infra/ Logging and shared utilities.

backend/api/ ASGI application for running diagnostics.


Notes

This project is intentionally small and focused. It is designed to be used as a prompt analysis step before prompts are deployed into LLM workflows.

The system prioritizes deterministic execution, schema validation, and strict agent scope boundaries over flexibility or prompt optimization.
