NeurIPS 2025
⭐ If our project helps you, please give us a star on GitHub to support us!
SoMi is easily extendable and supports LVLM agents controlling characters in the open-world game Minecraft, allowing them to collaborate with other agents to achieve crafting goals. The interaction logs, game screenshots, and videos generated by the interactive environment will be used for the SoMi-ToM evaluation.
@article{fan2025somi,
title={SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions},
author={Fan, Xianzhe and Zhou, Xuhui and Jin, Chuanyang and Nottingham, Kolby and Zhu, Hao and Sap, Maarten},
journal={arXiv preprint arXiv:2506.23046},
year={2025}
}
- Minecraft Java Edition (up to v1.21.1; v1.20.1 recommended)
- Node.js (at least v14)
- OpenAI API key
- Make sure you have the requirements above.
- Clone or download this repository (big green button).
- Rename keys.example.json to keys.json and fill in your API keys (you only need one).
- In a terminal/command prompt, run npm install from the directory where you placed this repository.
- Clone or download the feature/minecraft-update branch of the Sotopia repository, then run the following commands from it:
cd examples/experimental/minecraft_agents
uvicorn group_discussion_agents:app --reload --port 8080
# Open a new terminal
cd examples/experimental/minecraft_agents
export OPENAI_API_KEY=sk-  # Enter your OpenAI API key here
uv run aact run-dataflow group_discussion_agents.toml
- In Minecraft Java Edition, select Singleplayer, version 1.20.1, and Survival Mode, then click Open to LAN and use port 55916.
- Open a new terminal, then run node src/agent/index.js from this repository.
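Before launching the agents, it can help to confirm that the local prerequisites from the steps above are in place. The following is a minimal sketch, not part of this repository: the function names `node_major`, `node_ok`, and `preflight` are hypothetical, and the script only checks what can be verified from the shell (the Node.js version, the presence of keys.json, and an installed node_modules directory). It assumes it is run from the root of this repository.

```shell
#!/bin/sh
# Hypothetical preflight check for the setup steps; not shipped with SoMi.

# Print the major version of a string such as "v18.17.0" (prints "18").
node_major() {
    version="${1#v}"        # drop the leading "v"
    echo "${version%%.*}"   # keep everything before the first dot
}

# Succeed if the given Node.js version string is at least v14.
node_ok() {
    [ "$(node_major "$1")" -ge 14 ]
}

# Report every missing prerequisite; return non-zero if any check fails.
preflight() {
    status=0
    if ! command -v node >/dev/null 2>&1; then
        echo "Node.js not found (at least v14 is required)"
        status=1
    elif ! node_ok "$(node --version)"; then
        echo "Node.js $(node --version) is older than v14"
        status=1
    fi
    if [ ! -f keys.json ]; then
        echo "keys.json missing: rename keys.example.json to keys.json"
        status=1
    fi
    if [ ! -d node_modules ]; then
        echo "node_modules missing: run npm install"
        status=1
    fi
    return "$status"
}

preflight || echo "Setup incomplete: see the messages above."
```

The Minecraft version and the LAN port cannot be checked this way and still need to be verified in the game client.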
Bot profiles are TOML files that define:
- Crafting Goal
You and your friends need to craft 2 “boat”.
- Knowledge - Specific Crafting Rule
The complete process for crafting a “boat” in Minecraft is as follows:
......
Some of the Node modules that we depend on have bugs. To add a patch, edit the module's files in your local node_modules directory and run npx patch-package [package-name].
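The patching step can be wrapped in a small helper, sketched below. The function name `save_patch` is hypothetical and not part of this repository; it only guards the `npx patch-package` call with two sanity checks. patch-package writes the resulting diff into a `patches/` directory, and projects commonly reapply those diffs after each install through a `postinstall` script; whether this repository is configured that way is not stated here, so check its package.json.

```shell
#!/bin/sh
# Hypothetical helper: record local edits to a dependency as a patch file.
# Usage: save_patch <package-name>
save_patch() {
    if [ -z "${1:-}" ]; then
        echo "usage: save_patch <package-name>" >&2
        return 2
    fi
    # The package must already be installed (and edited) locally.
    if [ ! -d "node_modules/$1" ]; then
        echo "node_modules/$1 not found: run npm install first" >&2
        return 1
    fi
    # Diff the edited module against its published version and save the patch.
    npx patch-package "$1"
}
```

After sourcing the snippet from the repository root, a call such as `save_patch some-package` (a placeholder name) would record the edits made under node_modules/some-package.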
We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in complex, embodied multi-agent social interactions. The benchmark is built on rich multimodal interaction data generated by the SoMi interaction environment, covering diverse crafting goals and social relationships. See the dataset at SoMi-ToM.
Performance of humans and leading closed-source or open-source LVLMs in the first-person evaluation (state inference). There are 350 questions for self-ToM reasoning and 700 questions for others’ ToM reasoning.
Performance of humans and leading closed-source and open-source LVLMs in the Third-Person Perspective ToM test (175 questions in total). Highest accuracy without CoT is shown in red bold, and with CoT in blue bold.
The SoMi-ToM benchmark references the following code repositories:
https://github.com/PrismarineJS/prismarine-viewer
https://github.com/kolbytn/mindcraft
https://github.com/ProKil/aact
https://sotopia.world/projects/sotopia
Thanks for their awesome work!
For more fascinating videos on AI playing Minecraft, check out the Emergent Garden YouTube channel. The codebase for the AI in these videos comes from kolbytn/mindcraft.


