Graphia is a reinforcement learning-based social network graph generation framework.
We have published processed versions of the weibo-tech and weibo-daily datasets. The propagate-en (8days_dytag_small_text_en) dataset will be made public after the paper is accepted.
Dataset link: https://www.modelscope.cn/datasets/cather111/Graphia_data
Please download and place the dataset in the following directory:
Graphia/data
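One way to do this is with the ModelScope CLI (`pip install modelscope`); the `--dataset` flag is available in recent ModelScope versions, and the command below is a sketch rather than a verified part of this repo:

```bash
# Fetch the processed datasets into the expected location.
modelscope download --dataset cather111/Graphia_data --local_dir Graphia/data
```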
Graphia models trained on weibo-tech are available at the following links:
- https://www.modelscope.cn/models/cather111/Graphia-Q
- https://www.modelscope.cn/models/cather111/Graphia-E
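The checkpoints can be fetched the same way; the local directories below are our choice, not a requirement of the code:

```bash
# Download the released Graphia checkpoints (ModelScope CLI assumed).
modelscope download --model cather111/Graphia-Q --local_dir models/Graphia-Q
modelscope download --model cather111/Graphia-E --local_dir models/Graphia-E
```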
To facilitate reproduction of the experimental results, we have uploaded all baseline model code to: Graphia_baselines
```
Graphia/
├── scripts/
│   ├── prepare_dataset.sh                    # Graph dataset formatting script
│   ├── train_dp.sh                           # Activity predictor training script
│   └── prepare_prompt.sh                     # LLM training data formatting script
├── prompt_data/
│   └── weibo_daily/
│       └── train/
│           ├── cold_start/
│           │   └── combined_examples.jsonl   # SFT training data
│           ├── seq/
│           │   ├── seq_edge.jsonl            # Graphia-seq edge RL data
│           │   └── seq_dst.jsonl             # Graphia-seq dst RL data
│           └── teacher_forcing/
│               ├── edge_text_examples.jsonl  # Graphia edge RL data
│               └── query_examples.jsonl      # Graphia dst RL data
└── README.md
```
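Once the preparation scripts below have run (or if the processed dataset already ships these files), the layout can be verified with:

```bash
# Should list the five .jsonl files shown in the tree above.
find Graphia/prompt_data/weibo_daily/train -name '*.jsonl'
```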
Graphia relies on ROLL for LLM reinforcement learning training. We have made some modifications to the original code to meet specific requirements.
Please place the rlvr component in the following path:
ROLL/roll/pipeline/rlvr
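A placement sketch, assuming the modified rlvr code sits at the root of this repository and that ROLL is cloned from upstream (check both the source path and the URL against your setup):

```bash
# Clone upstream ROLL, then replace its rlvr pipeline with the modified one.
git clone https://github.com/alibaba/ROLL.git
rm -rf ROLL/roll/pipeline/rlvr
cp -r rlvr ROLL/roll/pipeline/rlvr
```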
- Python 3.7+
- PyTorch 1.10+
- Other dependencies: see requirements.txt
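A minimal environment setup under these constraints (the env name `graphia` and Python 3.10 are our choices):

```bash
conda create -n graphia python=3.10 -y
conda activate graphia
pip install torch                 # pick the build matching your CUDA version
pip install -r requirements.txt   # remaining project dependencies
```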
Complete the data preprocessing by running the following scripts:

```bash
# Format graph dataset
bash scripts/prepare_dataset.sh

# Train activity predictor
bash scripts/train_dp.sh

# Train reward model GNN
bash scripts/train_gnn_tgn.sh

# Format LLM training data
bash scripts/prepare_prompt.sh
```

Script function descriptions:
- prepare_dataset.sh: Prepare and format the social network graph datasets
- train_dp.sh: Train the activity predictor for graph node representation learning
- train_gnn_tgn.sh: Train the reward model GNN
- prepare_prompt.sh: Generate prompts for large language model training

A minimal runner chaining these steps is sketched below.
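The runner assumes it is executed from the Graphia/ root:

```bash
#!/usr/bin/env bash
# Run the full data-preparation pipeline in order; stop on the first failure.
set -euo pipefail
bash scripts/prepare_dataset.sh   # format graph dataset
bash scripts/train_dp.sh          # train activity predictor
bash scripts/train_gnn_tgn.sh     # train reward model GNN
bash scripts/prepare_prompt.sh    # format LLM training data
```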
The following uses the weibo-tech dataset as an example, assuming the data preparation steps above have been completed. (The SFT data path below is shown for weibo_daily; substitute your dataset's directory accordingly.)
SFT training data location:
Graphia/prompt_data/weibo_daily/train/cold_start/combined_examples.jsonl
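A quick sanity check on the SFT file (assuming standard JSONL with one example per line):

```bash
wc -l Graphia/prompt_data/weibo_daily/train/cold_start/combined_examples.jsonl
head -n 1 Graphia/prompt_data/weibo_daily/train/cold_start/combined_examples.jsonl \
  | python -m json.tool   # pretty-print the first example to inspect its fields
```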
| Training Type | Configuration File Path |
|---|---|
| DST RL | ROLL/examples/rlvr_megatron_dst/rlvr_config_remote_all_dst_weibo_tech.yaml |
| Edge RL | ROLL/examples/rlvr_megatron_dst/rlvr_config_remote_all_easy_seq_weibo_tech.yaml |
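To train on a different dataset, one option is to copy a weibo-tech config and substitute the dataset name. The target file names and the assumption that the dataset appears as a plain `weibo_tech` string inside the YAML are ours; verify the edited keys by hand:

```bash
cd ROLL/examples/rlvr_megatron_dst
for cfg in rlvr_config_remote_all_dst_weibo_tech.yaml \
           rlvr_config_remote_all_easy_seq_weibo_tech.yaml; do
  new="${cfg/weibo_tech/weibo_daily}"         # hypothetical target config name
  cp "$cfg" "$new"
  sed -i 's/weibo_tech/weibo_daily/g' "$new"  # review the result before training
done
```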
Refer to the following for training commands:
ROLL/examples/rlvr_megatron_dst/local_run.sh

Post-processing of the generated graphs:
- TDGG processing: Graphia/scripts/postprocess_tdgg.sh
- IDGG processing: Graphia/scripts/postprocess_idgg.sh
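Both post-processing scripts can be run back to back; their outputs feed concat_reports.sh in the next step (check each script for its input/output locations):

```bash
bash Graphia/scripts/postprocess_tdgg.sh   # TDGG post-processing
bash Graphia/scripts/postprocess_idgg.sh   # IDGG post-processing
```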
After processing, first concatenate the reports from the different models, then run the evaluation:

```bash
# Concatenate reports
bash Graphia/scripts/concat_reports.sh
```

Evaluation scripts:
- TDGG evaluation: Graphia/eval_utils/eval_tdgg.py
- IDGG evaluation: Graphia/eval_utils/eval_idgg.py

Issues and pull requests to help improve the project are welcome.
Thanks to the following open-source projects and research teams for their support: