A research toolkit providing a bag of techniques to jailbreak OpenAI’s open-source models.
```shell
# Note: we tested on an A100 GPU with CUDA 12.1 (cu121).
uv venv
source .venv/bin/activate
uv pip install -e .
```
- reproduce the findings

  ```shell
  python reproduce_finding_issues.py --json_path demo/breakoss-findings-1-cotbypass.json
  python reproduce_finding_issues.py --json_path demo/breakoss-findings-2-fake-over-refusal-cotbypass.json
  python reproduce_finding_issues.py --json_path demo/breakoss-findings-3-coercive-optimization.json
  python reproduce_finding_issues.py --json_path demo/breakoss-findings-4-intent-hijack.json
  python reproduce_finding_issues.py --json_path demo/breakoss-findings-5-plan-injection.json
  ```
- run the first method: Structural CoT Bypass

  ```shell
  python examples/1_cot_bypass.py one_example
  ```

- run the second method: Fake Over-Refusal

  ```shell
  python examples/2_fake_overrefusal.py one_example
  ```

- run the third method: Coercive Optimization

  ```shell
  python examples/3_gcg.py
  python examples/3-1_gcg_transfer.py
  ```

- run the fourth method: Intent Hijack

  ```shell
  python examples/4_intent_hijack.py one_example
  ```

- run the fifth method: Plan Injection

  ```shell
  python examples/5_plan_injection.py one_example
  ```

- evaluate Structural CoT Bypass on StrongReject or HarmfulBehaviors

  ```shell
  python examples/1_cot_bypass.py main --dataset_name=StrongReject
  python examples/1_cot_bypass.py main --dataset_name=HarmfulBehaviors
  ```

- evaluate Intent Hijack on StrongReject

  ```shell
  python examples/4_intent_hijack.py eval_on_strongreject
  ```

- evaluate Plan Injection on StrongReject

  ```shell
  python examples/5_plan_injection.py eval_on_strongreject
  ```

The code in this package is licensed under the MIT License.
If you use this framework in your research, please cite:
```bibtex
@article{chen2025bag,
  title={Bag of Tricks for Subverting Reasoning-based Safety Guardrails},
  author={Chen, Shuo and Han, Zhen and Chen, Haokun and He, Bailan and Si, Shengyun and Wu, Jingpei and Torr, Philip and Tresp, Volker and Gu, Jindong},
  journal={arXiv preprint arXiv:2510.11570},
  year={2025}
}
```