
x-teaming/x-teaming.github.io

𝕏-Teaming

This website showcases our work on 𝕏-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents. The website is adapted from the Nerfies website.

Project Overview

𝕏-Teaming is a scalable framework that systematically explores how seemingly harmless interactions with language models can escalate into harmful outcomes. The framework achieves state-of-the-art multi-turn jailbreak effectiveness, with attack success rates of up to 98.1% across leading models.

Key Features

  • Multi-turn jailbreak framework with adaptive multi-agents
  • State-of-the-art attack success rates on various models
  • XGuard-Train: A comprehensive safety training dataset
  • Systematic approach to understanding and mitigating conversational attacks

Resources

Website License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
