Detecting Piecewise Cyber Espionage in Model APIs

Abstract

On November 13, 2025, Anthropic published a report on an AI-orchestrated cyber espionage campaign. The threat actors used several techniques to circumvent model safeguards and used Claude Code with agentic scaffolding to automate large parts of their campaigns. In particular, they split campaigns into subtasks that each appeared benign in isolation. Methods that protect against such attacks are therefore needed to minimize misuse of AI in the cyber domain. To address this, we propose a novel method for detecting malicious activity in model APIs that is spread across piecewise benign requests. We simulated malicious campaigns using an agentic red-team scaffold similar to the one described in the report. We show that simple guardrails using Llama Guard 3 block the attacks when each is submitted whole, and that splitting an attack into subtasks bypasses the guardrail. For the attacks that get through, we introduce a classifier model that detects them and provide a statistical validation of its detections. This result paves the way for detecting future automated attacks of this kind.
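To illustrate why a per-request guardrail misses piecewise attacks while a session-level view can catch them, here is a minimal sketch. It is not the classifier used in this project. It assumes each request already carries a hypothetical risk score in [0, 1] (in practice such a score might come from a guard model like Llama Guard 3) and swaps in a simple noisy-OR rule to aggregate scores per session. `Request`, `REQUEST_THRESHOLD`, `SESSION_THRESHOLD`, and all scores are illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical thresholds; not values taken from this project.
REQUEST_THRESHOLD = 0.5  # per-request guardrail cut-off (no session context)
SESSION_THRESHOLD = 0.5  # cut-off for the aggregated session score


@dataclass
class Request:
    session_id: str
    text: str
    risk: float  # assumed per-request probability of malicious intent, in [0, 1]


def request_blocked(req: Request) -> bool:
    """Per-request guardrail: judges one request in isolation."""
    return req.risk >= REQUEST_THRESHOLD


def session_score(requests: list[Request]) -> float:
    """Noisy-OR aggregation: probability that at least one request in the
    session is malicious, assuming requests are independent."""
    prob_all_benign = math.prod(1.0 - r.risk for r in requests)
    return 1.0 - prob_all_benign


def flag_sessions(requests: list[Request]) -> dict[str, bool]:
    """Group requests by session and flag sessions whose aggregate score
    reaches the session threshold."""
    by_session: dict[str, list[Request]] = {}
    for r in requests:
        by_session.setdefault(r.session_id, []).append(r)
    return {sid: session_score(rs) >= SESSION_THRESHOLD for sid, rs in by_session.items()}


if __name__ == "__main__":
    # Hypothetical subtasks that each look innocuous on their own.
    subtasks = [
        "enumerate open ports on a host",
        "summarize this service banner",
        "write a script that checks default credentials",
        "list the tables in this database",
        "compress these files and upload them",
    ]
    campaign = [Request("s1", t, 0.3) for t in subtasks]
    benign = [Request("s2", "explain the TCP handshake", 0.05)]

    print([request_blocked(r) for r in campaign])  # no subtask is blocked on its own
    print(round(session_score(campaign), 3))       # 0.832
    print(flag_sessions(campaign + benign))        # {'s1': True, 's2': False}
```

Noisy-OR is used here only because it is easy to reason about: five requests at 0.3 each stay under the per-request threshold, yet together give 1 - 0.7^5 ≈ 0.83. Its independence assumption also means long benign sessions drift upward, which is one reason a trained session-level classifier, as described in the abstract, is preferable to a fixed aggregation rule.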

Part of the def/acc Apart hackathon

See also: Detecting Piecewise Cyber Espionage in Model APIs.

Repository contents (28 commits):

.github/workflows
RED-Apt @ 7593bbd
llamaguard
scripts
src
tests
.envrc
.gitattributes
.gitignore
.gitmodules
.pre-commit-config.yaml
.python-version
.sops.yaml
CHANGELOG.md
CLAUDE.md
Dockerfile
README.md
backup.tar.gz
benign_baseline_650.csv
data_validation.ipynb
dataset1.jsonl
dataset2-2025-11-23-2200-cet.jsonl
devenv.lock
devenv.nix
docker-compose.yaml
lambda_launch.sh
look.sh
mcp_client_task_set_500.csv
opencode.json
pyproject.toml
ruff.toml
secrets.env
sync_from_local.sh
test1.sh
test2.sh
uv.lock


Source repository: reinthal/hackerFinder9000
