Traditional social science research often faces limitations in experimental control and contextual generalizability, with lab studies lacking ecological validity and field studies offering limited manipulation of variables.
To address this, we introduce CiteAgent, an LLM-agent-based platform for simulating citation network dynamics. CiteAgent enables realistic, scalable, and controlled experimentation in academic environments, supporting rigorous hypothesis testing through:
- Realistic modeling of citation behaviors;
- Precise environmental control for causal analysis;
- Scalable, reproducible simulations across diverse research contexts.
CiteAgent is built upon the AgentScope framework. We thank the AgentScope team for providing an excellent, flexible foundation for multi-agent research!
Figure 1: CiteAgent Framework Workflow
Before getting started, please configure your OpenAI API keys in the file located at `LLMGraph\llms\default_model_configs.json`. The format should be as follows:
```json
{
    "model_type": "openai_chat",
    "config_name": "gpt-3.5-turbo-0125",
    "model_name": "gpt-3.5-turbo-0125",
    "api_key": "sk-.*",
    "generate_args": {
        "max_tokens": 2000,
        "temperature": 0.8
    },
    "client_args": {
        "base_url": ""
    }
}
```

Next, set up the experiment environment and install the necessary packages by running:

```bash
pip install -r requirements.txt
```
We offer three seed networks enriched with text features for authors and papers: Cora, Citeseer, and LLM_Agent.
To begin constructing a citation graph, please specify the `task_name` and `config_name`:
- `config_name`: controls the academic environment setup in CiteAgent;
- `task_name`: one of "cora", "citeseer", or "llm_agent_*" (matching the seed network you want to use).
Then, execute the following commands:
```bash
# Build the citation graph using the Cora dataset
python main.py --task cora --config <template_config_name> --build

# Build the citation graph using the Citeseer dataset
python main.py --task citeseer --config <template_config_name> --build

# Build the citation graph using the LLM_Agent dataset
python main.py --task llm_agent_1 --config <template_config_name> --build
```

Make sure to adjust the task name according to the seed network you wish to use.
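The three build commands differ only in the task name. As a small illustration (a sketch; `build_command` is a hypothetical helper, not part of the repo), the argument list can be assembled programmatically, which is convenient when scripting many runs:

```python
# Build the argument list for `python main.py` for a given seed network.
# `task` must be "cora", "citeseer", or an "llm_agent_*" name; `config` is
# the template config name under the task's configs/ directory.
def build_command(task: str, config: str) -> list:
    valid = task in ("cora", "citeseer") or task.startswith("llm_agent_")
    if not valid:
        raise ValueError(f"unknown seed network: {task}")
    return ["python", "main.py", "--task", task, "--config", config, "--build"]
```

The returned list can be passed directly to `subprocess.run` to launch a build.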
To customize the simulation, adjust the configuration file found at `LLMGraph\tasks\llm_agent_1\configs\template_*`.

We support multiple scholarly search engines, including Generated Papers, Arxiv, and Google Scholar. Change the `online_retriever_kwargs` field to specify the search engine you wish to use.
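As a purely hypothetical illustration (the real key names are defined by your template config, and `search_engine` is an assumed name here), switching engines amounts to editing that field before a run:

```python
# Hypothetical illustration only: the actual nesting and key names live
# in the template config file under the task's configs/ directory.
config = {
    "online_retriever_kwargs": {
        "search_engine": "arxiv",
    }
}

# Point the retriever at a different scholarly search engine.
config["online_retriever_kwargs"]["search_engine"] = "google_scholar"
```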
We provide scripts to reproduce the experiments outlined in the paper.

- **Download the datasets**, and arrange them as follows:

  ```
  tasks/
  ├── citeseer/
  │   ├── data/
  │   └── configs/
  ├── citeseer_1/
  ├── cora/
  ├── cora_1/
  ├── llm_agent/
  └── llm_agent_*/
  ```
- **Run simulation experiments:**

  ```bash
  # Start the launchers in one terminal
  python start.py --start_server

  # Then run the simulation experiments in another terminal
  python start.py
  ```
- **Run evaluation metrics for the simulation experiments:**

  ```bash
  python evaluate.py
  ```
- **Visualize experimental results:** please refer to `evaluate/Graph/readme.md` for detailed instructions.
The CiteAgent paper simulates key phenomena in citation networks, including the power-law degree distribution and citational distortion. To analyze the mechanisms underlying these observed phenomena, we propose two LLM-based social science research (SSR) paradigms for examining human referencing behavior: LLM-SA (Synthetic Analysis) and LLM-CA (Counterfactual Analysis). Additional simulations and analyses of other phenomena are provided in the paper.

The degree distribution of citation networks often follows a power-law distribution [1], reflecting a scale-free characteristic. Citation networks generated by the CiteAgent framework replicate this property, exhibiting realistic scale-free behavior that closely mirrors real-world citation dynamics.
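A rough way to check for scale-free behavior in any generated graph is to fit a line to the degree distribution in log-log space. The sketch below assumes `networkx` and `numpy` are installed and, for a self-contained demonstration, applies the fit to a synthetic Barabási–Albert graph rather than a CiteAgent output; a clearly negative slope indicates a heavy-tailed, power-law-like distribution:

```python
import collections

import networkx as nx
import numpy as np

def powerlaw_slope(graph) -> float:
    """Estimate the log-log slope of a graph's degree distribution."""
    counts = collections.Counter(d for _, d in graph.degree() if d > 0)
    degrees = np.array(sorted(counts))
    freqs = np.array([counts[d] for d in degrees], dtype=float)
    # Linear fit in log-log space; for a scale-free network the slope is
    # clearly negative (the power-law exponent is roughly its negation).
    slope, _ = np.polyfit(np.log(degrees), np.log(freqs), 1)
    return slope

# Preferential-attachment graphs are the textbook scale-free example [1].
G = nx.barabasi_albert_graph(n=3000, m=2, seed=0)
```

The same function can be pointed at a graph built by CiteAgent to compare its degree distribution against real citation networks.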
Figure 2: Power Law Distribution
Citational distortion, which captures biases in citation practices [2], is also simulated within the CiteAgent framework: through interactions among LLM-based agents, CiteAgent reproduces this distortion phenomenon.
Figure 3: Citational Distortion
- Barabási A L, Albert R. Emergence of scaling in random networks[J]. Science, 1999, 286(5439): 509-512.
- Gomez C J, Herman A C, Parigi P. Leading countries in global science increasingly receive more citations than other countries doing similar research[J]. Nature Human Behaviour, 2022, 6(7): 919-929.
@inproceedings{ji-etal-2025-llm,
title = "{LLM}-Based Multi-Agent Systems are Scalable Graph Generative Models",
author = "Ji, Jiarui and
Lei, Runlin and
Bi, Jialing and
Wei, Zhewei and
Chen, Xu and
Lin, Yankai and
Pan, Xuchen and
Li, Yaliang and
Ding, Bolin",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.78/",
doi = "10.18653/v1/2025.findings-acl.78",
pages = "1492--1523",
ISBN = "979-8-89176-256-5",
}

