
Ji-Cather/CiteAgent


CiteAgent

Traditional social science research often faces limitations in experimental control and contextual generalizability: lab studies lack ecological validity, while field studies offer limited manipulation of variables.

To address this, we introduce CiteAgent, an LLM-agent-based platform for simulating citation network dynamics. CiteAgent enables realistic, scalable, and controlled experimentation in academic environments, supporting rigorous hypothesis testing through:

  1. Realistic modeling of citation behaviors;
  2. Precise environmental control for causal analysis;
  3. Scalable, reproducible simulations across diverse research contexts.

CiteAgent is built upon the AgentScope framework. We thank the AgentScope team for providing an excellent, flexible foundation for multi-agent research!


Figure 1: CiteAgent Framework Workflow

🛠️ Setup

Before getting started, configure your OpenAI API key in LLMGraph\llms\default_model_configs.json using the following format:

{
    "model_type": "openai_chat",
    "config_name": "gpt-3.5-turbo-0125",
    "model_name": "gpt-3.5-turbo-0125",
    "api_key": "sk-.*",
    "generate_args": {
        "max_tokens": 2000,
        "temperature": 0.8
    },
    "client_args": {
        "base_url": ""
    }
}
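Before launching a run, it can help to sanity-check this file. The following is a minimal sketch, not part of CiteAgent: check_model_configs is a hypothetical helper that accepts either a single config object (like the example above) or a list of them, reports entries missing the model_type, config_name, model_name, or api_key fields, and flags an api_key that is empty or still set to the "sk-.*" placeholder. Which keys CiteAgent strictly requires is an assumption here.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

# Keys taken from the example config in this README; treating them as
# required is an assumption, not something CiteAgent documents.
REQUIRED_KEYS = ("model_type", "config_name", "model_name", "api_key")
PLACEHOLDER_KEY = "sk-.*"


def check_model_configs(path: Union[str, Path]) -> List[str]:
    """Return a list of problems found in the model config file (empty means OK).

    Hypothetical helper: accepts a single config object or a list of them.
    """
    with open(path, encoding="utf-8") as f:
        data: Any = json.load(f)
    configs: List[Dict[str, Any]] = data if isinstance(data, list) else [data]

    problems: List[str] = []
    for i, cfg in enumerate(configs):
        name = cfg.get("config_name", f"entry {i}")
        missing = [key for key in REQUIRED_KEYS if key not in cfg]
        if missing:
            problems.append(f"{name}: missing {', '.join(missing)}")
        # Flag keys that were left empty or still hold the README placeholder.
        if cfg.get("api_key") in ("", PLACEHOLDER_KEY):
            problems.append(f"{name}: api_key is empty or still the placeholder")
    return problems
```

For example, calling check_model_configs("LLMGraph/llms/default_model_configs.json") from the repository root should return an empty list once a real key has been filled in. Forward slashes are used because Python accepts them on both Windows and POSIX systems.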

Next, set up the environment and install the required packages by running: pip install -r requirements.txt

📦 Usage

We provide three seed networks with text features for authors and papers: Cora, Citeseer, and LLM_Agent.

To begin constructing a citation graph, please specify the task_name and config_name:

  • config_name (passed via --config): the configuration that controls the academic environment setup in CiteAgent.
  • task_name (passed via --task): the seed network to use; choose from "cora", "citeseer", or "llm_agent_*".

Then, execute the following commands:

# Build the citation graph using the Cora dataset
python main.py --task cora --config <template_config_name> --build 

# Build the citation graph using the Citeseer dataset
python main.py --task citeseer --config <template_config_name> --build 

# Build the citation graph using the LLM_Agent dataset
python main.py --task llm_agent_1 --config <template_config_name> --build 

Make sure to adjust the task_name according to the seed network you wish to use.
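To build several seed networks in one go, the documented command can be scripted. The sketch below only assembles the main.py --task ... --config ... --build call from the commands above and optionally runs it with subprocess. build_command and build_all are hypothetical helpers, the script is assumed to run from the directory containing main.py, and "template_example" stands in for a real <template_config_name>.

```python
import subprocess
import sys
from typing import Iterable, List

# Seed networks named in this README; llm_agent_1 is one llm_agent_* variant.
DEFAULT_TASKS = ("cora", "citeseer", "llm_agent_1")


def build_command(task: str, config: str) -> List[str]:
    """Assemble `python main.py --task <task> --config <config> --build`."""
    if not task or not config:
        raise ValueError("task and config must be non-empty")
    return [sys.executable, "main.py", "--task", task, "--config", config, "--build"]


def build_all(
    config: str, tasks: Iterable[str] = DEFAULT_TASKS, dry_run: bool = True
) -> List[List[str]]:
    """Build the citation graph for each task; with dry_run, only print the commands."""
    commands = [build_command(task, config) for task in tasks]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first build that exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling build_all("template_example") prints the three commands without running anything; passing dry_run=False runs them one after another. Running the builds sequentially keeps the sketch simple; whether builds can safely run in parallel is not covered here.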

Template Configuration

To customize the simulation, edit the template configuration files at LLMGraph\tasks\llm_agent_1\configs\template_*.

CiteAgent supports multiple scholarly search engines, including Generated Papers, arXiv, and Google Scholar. To choose one, set the online_retriever_kwargs field in the configuration.

🧪 Experiments

We provide scripts for running the experiments described in the paper.

  • Download the Datasets:

    citation

    Organize the files as follows:

    tasks/
    ├── citeseer/
    │   ├── data/
    │   ├── configs/
    ├── citeseer_1/
    ├── cora/
    ├── cora_1/
    ├── llm_agent/
    ├── llm_agent_*/
    
  • Run Simulation Experiments:

    Start the launchers in one terminal:

    python start.py --start_server

    Then run the simulation experiments in another terminal:

    python start.py 
  • Compute Evaluation Metrics for the Simulation Experiments:

    python evaluate.py
  • Visualize Experimental Results: Please refer to evaluate/Graph/readme.md for detailed instructions.
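Before launching start.py, a quick check that the downloaded data landed in the expected places can save a failed run. This is a minimal sketch, not a CiteAgent utility: missing_task_dirs is a hypothetical helper, and it assumes every task directory contains data/ and configs/ subdirectories, as the citeseer/ entry in the layout above does.

```python
from pathlib import Path
from typing import Iterable, List, Union


def missing_task_dirs(
    tasks_root: Union[str, Path],
    tasks: Iterable[str],
    subdirs: Iterable[str] = ("data", "configs"),
) -> List[Path]:
    """Return the expected <tasks_root>/<task>/<subdir> directories that do not exist."""
    root = Path(tasks_root)
    subdirs = tuple(subdirs)  # allow a one-shot iterable to be reused for every task
    missing: List[Path] = []
    for task in tasks:
        for sub in subdirs:
            path = root / task / sub
            if not path.is_dir():
                missing.append(path)
    return missing
```

For example, missing_task_dirs("LLMGraph/tasks", ["cora", "citeseer", "llm_agent_1"]) should return an empty list when the layout is complete. The LLMGraph/tasks location is inferred from the configuration path given under Template Configuration and may need adjusting.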

✅ Results

CiteAgent simulates key phenomena in citation networks, including power-law degree distributions and citational distortion. To analyze the mechanisms underlying these phenomena, we propose two LLM-based social science research (SSR) paradigms for examining human referencing behavior: LLM-SA (Synthetic Analysis) and LLM-CA (Counterfactual Analysis). The paper provides simulations and analyses of further phenomena.

Power Law Distribution

The degree distribution of citation networks often follows a power law [1], i.e., the networks are scale-free. Citation networks generated by CiteAgent reproduce this property, exhibiting scale-free degree distributions similar to those of real-world citation networks.
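One way to check this property on a simulated network is to count the citations each paper receives (its in-degree) and estimate the exponent α in P(k) ∝ k^(-α). The sketch below is illustrative only: it assumes the graph is available as (citing, cited) paper-ID pairs, a hypothetical format rather than CiteAgent's output API, and it uses the discrete maximum-likelihood approximation α ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)) from Clauset, Shalizi, and Newman (2009), not necessarily the estimator used in the paper.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def in_degrees(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Count the citations each paper receives from (citing, cited) pairs.

    Papers that are never cited do not appear (their in-degree is 0).
    """
    return dict(Counter(cited for _citing, cited in edges))


def powerlaw_alpha(degrees: Iterable[int], k_min: int = 1) -> float:
    """Approximate discrete MLE of alpha for P(k) ~ k^-alpha over k >= k_min."""
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]  # zero-degree papers drop out here
    if not tail:
        raise ValueError("no degrees at or above k_min")
    # Each term is positive because k >= k_min > k_min - 0.5.
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum
```

A typical call is powerlaw_alpha(in_degrees(edges).values(), k_min=5). The approximation is more reliable when k_min is not too small; choosing k_min, and testing whether a power law fits the data at all, is beyond this sketch.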


Figure 2: Power Law Distribution

Citational Distortion

Citational distortion refers to systematic biases in citation practices [2]; for example, papers from leading countries in global science increasingly receive more citations than papers from other countries doing similar research. Through interactions among LLM-based agents, CiteAgent reproduces this phenomenon.
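As a rough illustration of how such a bias could be quantified on a simulated network, the sketch below compares each group's share of citations received with its share of papers; a ratio above 1 means the group is cited more than its paper count alone would suggest. This is a hypothetical, simplified measure, not the metric used in the paper or in [2]: citation_share_ratio is an illustrative helper, it assumes each paper carries a group label such as the authors' country, and unlike [2] it does not restrict comparisons to papers doing similar research.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def citation_share_ratio(
    paper_group: Dict[Hashable, str],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[str, float]:
    """For each group: (share of citations received) / (share of papers).

    paper_group maps paper ID -> group label (hypothetical attribute, e.g. country).
    edges are (citing, cited) pairs; citations to unlabeled papers are ignored.
    """
    paper_counts = Counter(paper_group.values())
    cited_counts = Counter(
        paper_group[cited] for _citing, cited in edges if cited in paper_group
    )
    total_papers = sum(paper_counts.values())
    total_cites = sum(cited_counts.values())
    if total_papers == 0 or total_cites == 0:
        raise ValueError("need at least one labeled paper and one citation to it")
    # A group with no citations gets ratio 0 (Counter returns 0 for missing keys).
    return {
        group: (cited_counts[group] / total_cites) / (n / total_papers)
        for group, n in paper_counts.items()
    }
```

For instance, with toy papers a and b labeled "US", c and d labeled "CN", and citations c→a, d→a, a→b, b→c, the "US" papers hold half of the papers but receive 3 of the 4 citations, giving ratios of 1.5 for "US" and 0.5 for "CN".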


Figure 3: Citational Distortion

References

  1. Barabási A L, Albert R. Emergence of scaling in random networks[J]. Science, 1999, 286(5439): 509-512.
  2. Gomez C J, Herman A C, Parigi P. Leading countries in global science increasingly receive more citations than other countries doing similar research[J]. Nature Human Behaviour, 2022, 6(7): 919-929.

Citation

@inproceedings{ji-etal-2025-llm,
    title = "{LLM}-Based Multi-Agent Systems are Scalable Graph Generative Models",
    author = "Ji, Jiarui  and
      Lei, Runlin  and
      Bi, Jialing  and
      Wei, Zhewei  and
      Chen, Xu  and
      Lin, Yankai  and
      Pan, Xuchen  and
      Li, Yaliang  and
      Ding, Bolin",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.78/",
    doi = "10.18653/v1/2025.findings-acl.78",
    pages = "1492--1523",
    ISBN = "979-8-89176-256-5",
}

About

Official Implementation of CiteAgent Framework
