This repository implements the Conversational Robustness Evaluation Score (CORE), a multi-faceted metric that quantifies the effectiveness of language use in multi-agent systems across game-theoretic interaction types (cooperative, competitive, and neutral). It also evaluates the vocabulary structure of agent conversations using Zipf's law and Heaps' law.
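As a rough illustration of the Zipf's-law side of the analysis (a sketch, not the repository's actual implementation): Zipf's law says the frequency f(r) of the r-th most common token behaves as f(r) ∝ r^(-s), so the exponent s can be recovered by a least-squares fit in log-log space. The function name `zipf_exponent` is hypothetical.

```python
# Hypothetical sketch of a Zipf's-law fit: model f(r) ~ r^(-s),
# i.e. log f is linear in log r, and recover s by least squares.
import numpy as np
from collections import Counter

def zipf_exponent(tokens):
    # Rank-ordered token frequencies, most frequent first.
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1, dtype=float)
    # Slope of the log-log rank-frequency curve; Zipf exponent is its negation.
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope
```

A corpus whose frequencies fall off roughly as 1/rank should yield an exponent near 1.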
- `config/` - model paths and experiment parameters
- `models/` - model loading and inference
- `experiment/` - simulation and orchestration logic
- `analysis/` - Zipf and Heaps fitting
- `utils/` - tokenization, plotting, and I/O
- `experiment_results/` - saved outputs and plots
- `core/` - runner code for CORE computation
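To make the Heaps'-law fitting concrete, here is a minimal sketch (not the code in `analysis/`): Heaps' law models the vocabulary size V(n) after n tokens as V(n) = K * n^beta, and both parameters can be recovered by linear regression in log-log space. The function name `heaps_fit` is an assumption for illustration.

```python
# Hypothetical sketch of a Heaps'-law fit: model V(n) = K * n^beta,
# so log V(n) = log K + beta * log n, and fit a line in log-log space.
import numpy as np

def heaps_fit(tokens):
    # Running vocabulary size after each token.
    vocab, sizes = set(), []
    for tok in tokens:
        vocab.add(tok)
        sizes.append(len(vocab))
    n = np.arange(1, len(tokens) + 1, dtype=float)
    beta, log_k = np.polyfit(np.log(n), np.log(sizes), 1)
    return np.exp(log_k), beta
```

For a stream of all-distinct tokens, V(n) = n exactly, so the fit should return K ≈ 1 and beta ≈ 1.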
Run experiments by submitting a SLURM script that executes:
python main.py
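A minimal SLURM wrapper might look like the following; the job name, resource requests, and output path are placeholders to adapt to your cluster, not values prescribed by this repository.

```shell
#!/bin/bash
#SBATCH --job-name=core-exp              # hypothetical job name
#SBATCH --gres=gpu:1                     # adjust to your cluster's GPU resources
#SBATCH --time=04:00:00                  # placeholder wall-clock limit
#SBATCH --output=experiment_results/%j.log

# Activate your environment here if needed, then launch the run.
python main.py
```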
Once the results have been aggregated into 8x8 heatmaps per category, together with the Heaps' and Zipf's law fits, execute:
python ./core/main.py