LLMafia Dataset

This dataset accompanies the research paper "Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines using Social Deduction Games". It contains transcripts and metadata from games of Mafia (also known as Werewolf), a social deduction game where players must use deception and logical deduction to achieve their goals.

Dataset Generation Methodology

This dataset was generated by having LLM player agents play against each other in games of Mafia.

The game starts in a Daytime phase that lasts two and a half minutes, during which all players can communicate. When this phase ends, all players vote to eliminate one player. The game then moves to a Nighttime phase that lasts one minute, during which only the mafia players can communicate with each other. The mafia players then vote to eliminate a bystander.
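As a rough illustration of the phase order described above, the following sketch lists one day-night cycle. The `Phase` class and the phase names are hypothetical labels chosen for this example. The durations of the two voting steps are not stated here, so they are left as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Phase:
    name: str                    # hypothetical label, not taken from the dataset
    duration_s: Optional[int]    # None where the README gives no duration
    participants: str


def one_cycle() -> List[Phase]:
    """Return the phases of a single day-night cycle, in order."""
    return [
        Phase("daytime_chat", 150, "all players"),    # two and a half minutes
        Phase("daytime_vote", None, "all players"),   # one player is eliminated
        Phase("nighttime_chat", 60, "mafia only"),    # one minute
        Phase("nighttime_vote", None, "mafia only"),  # one bystander is eliminated
    ]


for phase in one_cycle():
    print(phase.name, phase.duration_s, phase.participants)
```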

Each LLM player agent extends the scheduler-generator LLM agent proposed in "Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games" by Eckhaus et al. A scheduler prompt repeatedly polls the LLM agent on whether to send a message. If the agent decides to send one, it is then given a generator prompt to produce the message. During voting phases, the agent is given a separate voting prompt that records its vote. We used OpenAI's GPT-4o as the LLM for every player agent, with the default temperature.
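The scheduler-generator loop described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset: the prompt wording, the `run_agent_turns` function, and the `ask_llm` callable are all assumptions, and a fixed number of polls stands in for the timed phase. A stub replaces the real GPT-4o call so the example runs offline.

```python
from typing import Callable, List


def run_agent_turns(
    ask_llm: Callable[[str], str],
    chat_history: List[str],
    player_name: str,
    max_polls: int,
) -> List[str]:
    """Poll a scheduler prompt; when it answers yes, call a generator prompt."""
    sent: List[str] = []
    for _ in range(max_polls):
        context = "\n".join(chat_history)
        # Scheduler step: ask whether the agent wants to speak right now.
        decision = ask_llm(
            f"[scheduler] You are {player_name}. Chat so far:\n{context}\n"
            "Do you want to send a message now? Answer yes or no."
        )
        if decision.strip().lower().startswith("yes"):
            # Generator step: only reached when the scheduler said yes.
            message = ask_llm(
                f"[generator] You are {player_name}. Chat so far:\n{context}\n"
                "Write your next message."
            )
            chat_history.append(f"{player_name}: {message}")
            sent.append(message)
    return sent


def make_fake_llm() -> Callable[[str], str]:
    """Stub LLM (assumption): says yes on every other scheduler poll."""
    state = {"polls": 0}

    def fake_llm(prompt: str) -> str:
        if prompt.startswith("[scheduler]"):
            state["polls"] += 1
            return "yes" if state["polls"] % 2 == 1 else "no"
        return "I think we should hear from everyone."

    return fake_llm


history = ["Alice: Who seems suspicious?"]
sent = run_agent_turns(make_fake_llm(), history, "Bob", max_polls=4)
print(len(sent), "messages sent")
```

Separating the decision to speak from the generation of the message lets the agent stay silent on most polls, which is what makes the group chat asynchronous.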

Dataset Structure

The dataset contains 35 games. Each game is stored in its own directory, which contains the following files:

game_start_time.txt: Timestamp when the game began
player_names.txt: Complete list of all players in the game, including both mafia and bystanders
mafia_names.txt: List of mafia players in the game
public_daytime_chat.txt: Transcript of public discussion during day phases
public_manager_chat.txt: System messages, game state updates, and records of who was voted out
public_nighttime_chat.txt: Messages between the mafia players during the Nighttime phase (visible only to mafia players)
who_wins.txt: Final game outcome indicating winning team
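A game directory with the files listed above could be read into a dictionary along these lines. This is a sketch under assumptions: the `load_game` helper is hypothetical, the files are assumed to be UTF-8 encoded, and each file is kept as one raw string because the exact line format of each file is not specified here.

```python
from pathlib import Path
from typing import Dict, Union

# File names as listed in the dataset structure.
GAME_FILES = [
    "game_start_time.txt",
    "player_names.txt",
    "mafia_names.txt",
    "public_daytime_chat.txt",
    "public_manager_chat.txt",
    "public_nighttime_chat.txt",
    "who_wins.txt",
]


def load_game(game_dir: Union[str, Path]) -> Dict[str, str]:
    """Read every known file in one game directory into a dict of raw text.

    Keys are the file names without the .txt extension. Files that are
    missing from the directory are skipped rather than raising an error.
    """
    game_dir = Path(game_dir)
    game: Dict[str, str] = {}
    for filename in GAME_FILES:
        path = game_dir / filename
        if path.exists():
            # UTF-8 is an assumption about the dataset's encoding.
            game[path.stem] = path.read_text(encoding="utf-8").strip()
    return game
```

Calling `load_game` on the path of one extracted game directory returns keys such as `who_wins` and `public_daytime_chat`, which can then be parsed further once the line format has been inspected.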

Data Format

All files are in plain text (.txt) format. Chat transcripts are in chronological order and include timestamps, speaker names, and message content.

Dataset Statistics

Number of games: 35
Total players per game: 10
Mafia players per game: 2
Minimum number of days: 2
Maximum number of days: 4
Average number of days: 3.17
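A few derived figures follow from the statistics above by simple arithmetic. The sketch below works them out; the variable names are illustrative, and the total number of days is inferred from the rounded average rather than stated in the dataset.

```python
NUM_GAMES = 35
PLAYERS_PER_GAME = 10
MAFIA_PER_GAME = 2
AVG_DAYS = 3.17  # rounded to two decimals in the statistics above

# Every non-mafia player is a bystander.
bystanders_per_game = PLAYERS_PER_GAME - MAFIA_PER_GAME   # 8

# Player slots across the dataset (names may repeat between games).
total_player_slots = NUM_GAMES * PLAYERS_PER_GAME         # 350
total_mafia_slots = NUM_GAMES * MAFIA_PER_GAME            # 70

# Inferred: the whole-number day total whose average rounds to 3.17.
approx_total_days = round(NUM_GAMES * AVG_DAYS)           # 111

print(bystanders_per_game, total_player_slots, total_mafia_slots, approx_total_days)
```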

Research Usage

This dataset was created to study:

Deception quality and detection in LLM vs human interactions
Natural language patterns in social deduction gameplay
Strategic communication in partially-observable environments

Contact

For questions about the dataset, please contact ckao@77sparx.com or cocochief4@gmail.com.
