feat: Evaluate Prompt Orchestration Conflict Resolution #309

@thinhlpg

Description

Evaluate and validate that USER instructions correctly override SERVER instructions when the two conflict.

Context

The prompt orchestration system composes prompts from multiple sources:

  • SERVER-side (developer-controlled): timing, system prompt, deep research, tools, code assistant, chain-of-thought
  • USER-side (user-controlled): project instruction, custom instruction, memory, tone/style

When instructions from the two sides conflict (e.g., a SERVER prompt asks for "JSON" but a USER instruction asks for "Markdown"), the USER setting should take priority.
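To make the expected behavior concrete for test cases, here is a minimal sketch of how a conflict scenario and its expected winner could be encoded. It is not the orchestration implementation. `Instruction`, `expected_winner`, and the `topic` field are hypothetical names for illustration, and the tie-break among several instructions from the same side is an assumption this issue does not define.

```python
from dataclasses import dataclass

SERVER = "server"
USER = "user"


@dataclass(frozen=True)
class Instruction:
    source: str  # SERVER or USER
    module: str  # e.g. "system prompt", "custom instruction"
    topic: str   # hypothetical label for what the instruction governs
    value: str   # e.g. "JSON" or "Markdown"


def expected_winner(instructions, topic):
    """Return the instruction that should win for `topic`, or None.

    Rule from this issue: USER beats SERVER on conflict.
    Assumption (not from the issue): within one side, the last
    instruction listed wins.
    """
    candidates = [i for i in instructions if i.topic == topic]
    if not candidates:
        return None
    user_side = [i for i in candidates if i.source == USER]
    return (user_side or candidates)[-1]


scenario = [
    Instruction(SERVER, "system prompt", "output_format", "JSON"),
    Instruction(USER, "custom instruction", "output_format", "Markdown"),
]
winner = expected_winner(scenario, "output_format")
```

With this scenario, `winner.value` is `"Markdown"`. Dropping the USER entry leaves the SERVER instruction as the only candidate, so `"JSON"` is expected instead.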

Models to test:

  • Jan 30B Instruct
  • Jan 30B Thinking

Scope

  • Test conflict scenarios: SERVER vs USER instruction priority
  • Test toggle behavior: Enable/disable optional server prompts one-by-one
  • Validate with LLM-as-judge evaluation
  • Document findings and edge cases
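The toggle and judge steps above could be sketched roughly as follows. All names here (`one_at_a_time_toggles`, `build_judge_prompt`, `parse_verdict`, the judge template wording, and the three-way verdict) are hypothetical and only illustrate one possible harness. Which server prompts count as optional is not specified in this issue, so the list passed in the usage lines is an assumption.

```python
def one_at_a_time_toggles(optional_modules, baseline_enabled=True):
    """Yield (label, config) pairs: a baseline, then one config per flipped module."""
    baseline = {m: baseline_enabled for m in optional_modules}
    yield "baseline", dict(baseline)
    for module in optional_modules:
        config = dict(baseline)
        config[module] = not baseline_enabled
        state = "on" if config[module] else "off"
        yield f"{module}={state}", config


# Hypothetical judge prompt; the wording is an assumption, not a validated rubric.
JUDGE_TEMPLATE = (
    "A server instruction and a user instruction conflict.\n"
    "Server instruction: {server}\n"
    "User instruction: {user}\n"
    "Model response:\n{response}\n\n"
    "Which instruction does the response follow? "
    "Answer with exactly one word: USER, SERVER, or NEITHER."
)

VERDICTS = {"USER", "SERVER", "NEITHER"}


def build_judge_prompt(server, user, response):
    """Fill the judge template for one conflict scenario."""
    return JUDGE_TEMPLATE.format(server=server, user=user, response=response)


def parse_verdict(judge_output):
    """Normalize the judge's reply; anything unexpected is flagged, not guessed."""
    word = judge_output.strip().upper().rstrip(".")
    return word if word in VERDICTS else "UNPARSEABLE"


# Assumed set of optional server prompts, for illustration only.
configs = list(one_at_a_time_toggles(["deep research", "tools"]))
prompt = build_judge_prompt("Respond in JSON", "Respond in Markdown", "# Title")
```

Flipping one module per run keeps any change in the judge's verdict attributable to a single server prompt. Replies outside the three verdict words are treated as unparseable so they can be reviewed by hand as edge cases.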

Out of Scope

  • Changing the actual prompt orchestration implementation
  • Production deployment
