LLM Watermark Robustness Under Adversarial Paraphrasing

Real Kirchenbauer watermarking produces a z = 8.44 detection signal, but cross-model paraphrasing degrades it: detection is 100% at pass 0 (no paraphrase) and fails progressively once the text is paraphrased. A single Claude Haiku paraphrase pass strips the green-list token patterns that the watermark depends on.
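For readers unfamiliar with the detection statistic, here is a minimal sketch of the one-proportion z-test used in Kirchenbauer-style detection: count how many tokens land in the green list and compare that count with what unwatermarked text would produce by chance. The function name, the green-list fraction `gamma = 0.25`, and the token counts are illustrative assumptions, not values taken from this repository.

```python
import math


def watermark_z_score(green_count: int, total_tokens: int, gamma: float = 0.25) -> float:
    """z-score of the observed green-token count against the null hypothesis.

    Under the null (no watermark), each token is green with probability
    `gamma`, so the count has mean gamma*T and variance T*gamma*(1-gamma).
    `gamma = 0.25` is an assumed setting, not this project's configuration.
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


# Hypothetical counts for a 200-token passage (illustrative only).
z_watermarked = watermark_z_score(green_count=110, total_tokens=200)
z_paraphrased = watermark_z_score(green_count=55, total_tokens=200)

print(f"watermarked: z = {z_watermarked:.2f}")
print(f"paraphrased: z = {z_paraphrased:.2f}")
```

Paraphrasing replaces tokens without knowledge of the green list, so the green fraction drifts back toward `gamma` and the z-score falls toward zero, which is the decay this project measures.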

Blog post: LLM Watermarks Break After One Paraphrase Pass

Key Results

Condition	Detection Rate	z-score (mean ± std)
Pass 0 (no paraphrase)	100%	9.64 ± 1.03
Pass 1 (single paraphrase)	Degraded	Progressive z-score decay
Clean (unwatermarked)	0%	-0.08
Watermark signal gap	—	~8.5σ separation
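The columns above can be read as summaries of per-sample z-scores. A minimal sketch of how such a row might be computed follows; the `summarize` helper, the detection threshold of 4.0, and the sample values are assumptions for illustration and are not taken from the project's scripts or outputs.

```python
from statistics import mean, stdev


def summarize(z_scores: list[float], threshold: float = 4.0) -> dict[str, float]:
    """Detection rate plus mean and sample std of a batch of z-scores.

    A sample counts as detected when its z-score exceeds `threshold`
    (4.0 is an assumed cutoff, not this project's setting).
    """
    detected = sum(1 for z in z_scores if z > threshold)
    return {
        "detection_rate": detected / len(z_scores),
        "mean": mean(z_scores),
        "std": stdev(z_scores),
    }


# Hypothetical z-scores, not experimental results.
summary = summarize([9.0, 10.0, 11.0])
print(summary)
```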

Quick Start

git clone https://github.com/rexcoleman/llm-watermark-robustness
cd llm-watermark-robustness
pip install -e .
bash reproduce.sh

Project Structure

FINDINGS.md # Research findings with pre-registered hypotheses and full results
EXPERIMENTAL_DESIGN.md # Pre-registered experimental design and methodology
HYPOTHESIS_REGISTRY.md # Hypothesis predictions, results, and verdicts
reproduce.sh # One-command reproduction of all experiments
governance.yaml # govML governance configuration
CITATION.cff # Citation metadata
LICENSE # MIT License
pyproject.toml # Python project configuration
scripts/ # Experiment and analysis scripts
src/ # Source code
tests/ # Test suite
outputs/ # Experiment outputs and results
config/ # Configuration files
docs/ # Documentation and decision records

Methodology

See FINDINGS.md and EXPERIMENTAL_DESIGN.md for detailed methodology, pre-registered hypotheses, and full experimental results with multi-seed validation.
