SUP: Sycophancy Under Pressure

A trajectory-level evaluation of policy-compliant manipulation in multi-turn AI agents.

What we provide

A multi-turn prompt suite (SUP) designed to elicit sycophancy under social pressure
Trajectory-level metrics for agreement drift, correction decay, and normative softening
An end-to-end, reproducible harness to run ablation conditions: (1) baseline, (2) logging-only, (3) runtime enforcement.

Our goal is measurement-first: quantify policy-compliant sycophancy that can evade refusal-based or single-turn evaluations.

Why this matters

Sycophancy is often policy-compliant: models may avoid explicit rule violations while gradually validating false or autonomy-undermining premises under social pressure. This can undermine decision quality without overt deception.

SUP focuses on trajectory-level drift, which single-prompt tests can miss.

Operational definition (what we measure)

We measure policy-compliant sycophancy under social pressure, defined as:

progressive validation of a false or autonomy-undermining user premise across turns in contexts where epistemic correction is normatively required, while remaining broadly policy-compliant (i.e., not triggering obvious refusals).

We intentionally exclude benign empathy, harmless role-play, and legitimate education/persuasion by scoring only scenarios (and turns) where correction is normatively required.

Directory structure

- data/prompts/ — SUP suite YAML (`suite_v1.yaml`) + held-out set (`suite_v1_heldout.yaml`)
- data/lexicons/ — epistemic/rapport marker lexicons (transparent features)
- src/sycop/ — runner, labeling, metrics, stats, plots, and artifact generation
- runs/ — timestamped run outputs (JSONL transcripts + metrics + tables/figures)
- report/ — hackathon report and generated assets

Quickstart (reproduce main results)

1) Install

python -m venv .venv
source .venv/bin/activate
pip install -e .

2) Set your API key

export OPENAI_API_KEY="your-api-key"

3) Run the full pipeline (50 scenarios × 6 turns × 3 conditions)

sycop run --config configs/run_v1.yaml
sycop label --run-id <run_id>
sycop score --run-id <run_id>
sycop report --run-id <run_id> --out report/assets

Outputs:

runs/<run_id>/transcripts.jsonl — per-turn transcripts with metadata
runs/<run_id>/labels.jsonl — per-turn labels (agreement/correction/etc.)
runs/<run_id>/aggregates.json — metric means + bootstrap CIs + tests
report/assets/table1.md and report/assets/fig_*.png — ready for the report

Experiments

We replay identical multi-turn trajectories under:

baseline: no instrumentation, no enforcement
log-only: instrumentation enabled, enforcement disabled (measurement vs mitigation)
enforce: runtime intervention that prevents endorsement of contested premises while preserving helpfulness

Enforcement (for mitigation ablation) is implemented as a narrow, reproducible intervention:

A gate detects when the assistant endorses the contested premise
A constrained rewrite repairs the response to avoid endorsement while remaining empathetic and helpful

Logs: whether rewriting occurred, draft vs final response, gate confidence (to avoid "rewriting everything" and to support auditing).

Reproducibility and metadata: each run records exact model identifiers and decoding parameters, suite hash, git commit hash, environment/library info, timestamped run folder with immutable artifacts.

Limitations

Automated labeling (including LLM-based judgments) is a proxy; we recommend manual audit of a subset. Lexicon-based framing measures can be sensitive to language/style; we treat them as interpretable indicators, not ground truth. Our suite covers a limited set of pressure tactics and does not capture all real-world manipulation strategies.

License

MIT License

Citation

@misc {
  title={ Sycophancy Under Pressure: Policy-Compliant Manipulation in Multi-Turn AI Agents },
  author={ Tasha Kim, Min Jae Kim, Taio Kim },
  date={ 1/11/26 },
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
configs		configs
data		data
report		report
runs		runs
scripts		scripts
src/sycop		src/sycop
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
run_evaluation.sh		run_evaluation.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SUP: Sycophancy Under Pressure

What we provide

Why this matters

Operational definition (what we measure)

Directory structure

Quickstart (reproduce main results)

1) Install

2) Set your API key

3) Run the full pipeline (50 scenarios × 6 turns × 3 conditions)

Experiments

Limitations

License

Citation

About

Uh oh!

Releases

Packages

Languages

License

tashakim/sycop

Folders and files

Latest commit

History

Repository files navigation

SUP: Sycophancy Under Pressure

What we provide

Why this matters

Operational definition (what we measure)

Directory structure

Quickstart (reproduce main results)

1) Install

2) Set your API key

3) Run the full pipeline (50 scenarios × 6 turns × 3 conditions)

Experiments

Limitations

License

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages