This is the official repository for the paper "TRAP: Targeted Redirecting of Agentic Preferences".
Authors: Hangoo Kang*, Jehyeok Yeon*, Gagandeep Singh
- TRAP (Targeted Redirecting of Agentic Preferences) introduces a semantic‑level adversarial attack on agentic AI systems built on vision–language models (VLMs). By carefully injecting semantic cues into one image, TRAP consistently causes the agent to select that image over benign alternatives.
- Our work shows how vulnerable VLM-based agents are to semantic injection attacks.
- TRAP is evaluated on multi‑candidate decision tasks constructed from the Microsoft COCO dataset, where it achieves near‑perfect attack success rates across several leading agents.
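
To make the evaluation setting concrete, here is a minimal sketch of how an attack success rate could be computed for multi-candidate decision tasks: the fraction of trials in which the agent picks the adversarially modified image. The function names `attack_success_rate` and `chance_rate` and the index-based data layout are assumptions for illustration only; they are not taken from this codebase, and the paper may define the metric differently.

```python
from typing import Sequence


def attack_success_rate(selected: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of trials in which the agent chose the adversarial image.

    selected[i] is the candidate index the agent picked in trial i.
    target[i] is the index of the adversarially modified image in trial i.
    Both layouts are hypothetical, chosen for this sketch.
    """
    if len(selected) != len(target):
        raise ValueError("selected and target must have the same length")
    if not selected:
        raise ValueError("at least one trial is required")
    hits = sum(1 for s, t in zip(selected, target) if s == t)
    return hits / len(selected)


def chance_rate(num_candidates: int) -> float:
    """Rate at which a uniformly random agent would pick the target image."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be positive")
    return 1.0 / num_candidates


if __name__ == "__main__":
    # Four illustrative trials with four candidates each (made-up values).
    picks = [2, 0, 1, 3]
    targets = [2, 0, 1, 1]
    print(attack_success_rate(picks, targets))
    print(chance_rate(4))
```

Comparing the measured rate against the uniform-chance baseline for the same number of candidates is one simple way to judge how strongly an attack shifts the agent's selection.
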
To use the trainers and run the code in this codebase, install the required packages listed in requirements.txt:

    pip install -r requirements.txt

To run the TRAP framework, use the command below:

    python trap_framework.py

If you find our project helpful, please consider citing our paper:

@misc{kang2025traptargetedredirectingagentic,
title={TRAP: Targeted Redirecting of Agentic Preferences},
author={Hangoo Kang and Jehyeok Yeon and Gagandeep Singh},
year={2025},
eprint={2505.23518},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2505.23518},
}
