
BreakOSS: Jailbreak GPT-OSS Toolkit


A research toolkit providing a bag of techniques to jailbreak OpenAI’s open-source models. 🏆 OpenAI Red-Teaming Challenge Honorable Mention Award

💪 Getting Started

🚀 Installation

# Note: experiments were run on an A100 GPU with CUDA 12.1 (cu121)

uv venv
source .venv/bin/activate
uv pip install -e .
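
If your GPU setup differs from the A100/cu121 configuration noted above, a quick sanity check can confirm that a CUDA-enabled build is visible. This assumes PyTorch is pulled in as a dependency of the package (not confirmed by the install step itself):

# Optional sanity check; assumes the editable install above brings in PyTorch
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"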

🔁 Reproduce the Submitted Findings

python reproduce_finding_issues.py --json_path demo/breakoss-findings-1-cotbypass.json

python reproduce_finding_issues.py --json_path demo/breakoss-findings-2-fake-over-refusal-cotbypass.json

python reproduce_finding_issues.py --json_path demo/breakoss-findings-3-coercive-optimization.json

python reproduce_finding_issues.py --json_path demo/breakoss-findings-4-intent-hijack.json

python reproduce_finding_issues.py --json_path demo/breakoss-findings-5-plan-injection.json
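
To rerun all five findings in one pass, a small shell loop over the demo files works as well; the glob below assumes the five JSON files listed above are the only matching files in demo/:

# Convenience loop: reproduce every submitted finding in sequence
for f in demo/breakoss-findings-*.json; do
    python reproduce_finding_issues.py --json_path "$f"
done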

📦 Starting Point

  • Run the first method, Structural CoT Bypass:
python examples/1_cot_bypass.py one_example
  • Run the second method, Fake Over-Refusal:
python examples/2_fake_overrefusal.py one_example
  • Run the third method, Coercive Optimization:
python examples/3_gcg.py
python examples/3-1_gcg_transfer.py
  • Run the fourth method, Intent Hijack:
python examples/4_intent_hijack.py one_example
  • Run the fifth method, Plan Injection:
python examples/5_plan_injection.py one_example
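
As a quick smoke test across methods, the four scripts that expose a one_example entry point can be chained in a loop (the Coercive Optimization scripts in step 3 use their own entry points, so they are omitted here):

# Sketch: run the single-example entry point of each applicable method
for script in 1_cot_bypass 2_fake_overrefusal 4_intent_hijack 5_plan_injection; do
    python "examples/${script}.py" one_example
done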

🧑‍💻 More Scripts

  • Evaluate Structural CoT Bypass on StrongReject or HarmfulBehaviors:
python examples/1_cot_bypass.py main --dataset_name=StrongReject
python examples/1_cot_bypass.py main --dataset_name=HarmfulBehaviors
  • Evaluate Intent Hijack on StrongReject:
python examples/4_intent_hijack.py eval_on_strongreject
  • Evaluate Plan Injection on StrongReject:
python examples/5_plan_injection.py eval_on_strongreject

⚖️ License

The code in this package is licensed under the MIT License.

Citation

If you use this framework in your research, please cite:

@article{chen2025bag,
  title={Bag of Tricks for Subverting Reasoning-based Safety Guardrails},
  author={Chen, Shuo and Han, Zhen and Chen, Haokun and He, Bailan and Si, Shengyun and Wu, Jingpei and Torr, Philip and Tresp, Volker and Gu, Jindong},
  journal={arXiv preprint arXiv:2510.11570},
  year={2025}
}
