This repository implements the Conversational Robustness Evaluation Score (CORE), a multi-faceted metric that quantifies the effectiveness of language use in multi-agent systems across game-theoretic interaction types (cooperative, competitive, and neutral). It also evaluates the vocabulary structure of agent conversations using Zipf's law and Heaps' law.
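As a rough illustration of the Zipf's-law side of the analysis (a sketch, not the repository's actual implementation): Zipf's law says the frequency f(r) of the r-th most common token behaves as f(r) ∝ r^(-s), so the exponent s can be recovered by a least-squares fit in log-log space. The function name `zipf_exponent` is hypothetical.

```python
# Hypothetical sketch of a Zipf's-law fit: model f(r) ~ r^(-s),
# i.e. log f is linear in log r, and recover s by least squares.
import numpy as np
from collections import Counter

def zipf_exponent(tokens):
    # Rank-ordered token frequencies, most frequent first.
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1, dtype=float)
    # Slope of the log-log rank-frequency curve; Zipf exponent is its negation.
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope
```

A corpus whose frequencies fall off roughly as 1/rank should yield an exponent near 1.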
- `config/` - model paths and experiment parameters
- `models/` - model loading and inference
- `experiment/` - simulation and orchestration logic
- `analysis/` - Zipf and Heaps fitting
- `utils/` - tokenization, plotting, and I/O
- `experiment_results/` - saved outputs and plots
- `core/` - runner code for CORE computation
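To make the Heaps'-law fitting concrete, here is a minimal sketch (not the code in `analysis/`): Heaps' law models the vocabulary size V(n) after n tokens as V(n) = K * n^beta, and both parameters can be recovered by linear regression in log-log space. The function name `heaps_fit` is an assumption for illustration.

```python
# Hypothetical sketch of a Heaps'-law fit: model V(n) = K * n^beta,
# so log V(n) = log K + beta * log n, and fit a line in log-log space.
import numpy as np

def heaps_fit(tokens):
    # Running vocabulary size after each token.
    vocab, sizes = set(), []
    for tok in tokens:
        vocab.add(tok)
        sizes.append(len(vocab))
    n = np.arange(1, len(tokens) + 1, dtype=float)
    beta, log_k = np.polyfit(np.log(n), np.log(sizes), 1)
    return np.exp(log_k), beta
```

For a stream of all-distinct tokens, V(n) = n exactly, so the fit should return K ≈ 1 and beta ≈ 1.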
Run experiments by submitting a SLURM script that executes:
python main.py
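A minimal SLURM wrapper might look like the following; the job name, resource requests, and output path are placeholders to adapt to your cluster, not values prescribed by this repository.

```shell
#!/bin/bash
#SBATCH --job-name=core-exp              # hypothetical job name
#SBATCH --gres=gpu:1                     # adjust to your cluster's GPU resources
#SBATCH --time=04:00:00                  # placeholder wall-clock limit
#SBATCH --output=experiment_results/%j.log

# Activate your environment here if needed, then launch the run.
python main.py
```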
Once the results have been aggregated into 8x8 heatmaps per category, together with the Heaps' and Zipf's law fits, execute:
python ./core/main.py