GitHub - x-teaming/x-teaming.github.io

𝕏-Teaming

This website showcases our work on 𝕏-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents. The website is adapted from the Nerfies website.

Project Overview

𝕏-Teaming is a scalable framework that systematically explores how seemingly harmless interactions with language models can escalate into harmful outcomes. Our framework achieves state-of-the-art multi-turn jailbreak effectiveness with success rates up to 98.1% across leading models.

Key Features

Multi-turn jailbreak framework with adaptive multi-agents
State-of-the-art attack success rates on various models
XGuard-Train: A comprehensive safety training dataset
Systematic approach to understanding and mitigating conversational attacks

Resources

Website License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

