- Overview
- Installation
- Model Setup
- Quick Start
- Attack Generation
- Defense Evaluation
- LLM Training and Evaluation
- Project Structure
This repository contains the code for "Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses".
Supported Attacks:
- Structural Attacks: PGD, GRBCD, PRBCD, Metattack, STRG (Heuristic Attacks)
- Text Attacks: TextFooler, LLM-based attacks (GPT-4o-mini)
- Hybrid Attacks: WTGIA
Supported Models:
- GNN Models: GCN, GAT, GNNGuard, ElasticGNN, RobustGCN, GRAND, etc.
- LLM Models: InstructionTuning, GraphGPT, LLaGA with various LLMs (Mistral-7B, Qwen, Llama3)
Datasets: Cora, CiteSeer, PubMed, WikiCS, Instagram, Reddit, History, Photo, Computer, ArXiv
# Install PyTorch (adjust CUDA version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Install core dependencies
pip install torch-geometric
pip install transformers accelerate
pip install sentence-transformers
pip install textattack
pip install openai
pip install scikit-learn numpy pandas tqdm pyyaml
# Use the provided GreatX library (modified version)
# The GreatX/ directory contains our custom modifications
Update the model paths in common/model_path.py:
MODEL_PATHs = {
    # Language Models (for text embeddings)
    "MiniLM": "/path/to/models/sentence-transformers--all-MiniLM-L6-v2/",
    "SentenceBert": "/path/to/models/sentence-transformers--multi-qa-distilbert-cos-v1/",
    "e5-large": "/path/to/models/intfloat--e5-large-v2/",
    "roberta": "/path/to/models/sentence-transformers--all-roberta-large-v1/",
    # Large Language Models (for GraphGPT, LLaGA, InstructionTuning)
    "Mistral-7B": "/path/to/models/mistral-7B-Instruct",
    "Qwen-7B": "/path/to/models/Qwen--Qwen2.5-7B-Instruct",
    "Llama3-8B": "/path/to/models/llama-3.1-8B-Instruct/",
}
Please download the required models from Hugging Face or other sources and update the paths accordingly.
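Before launching long jobs, it can help to confirm that every configured model path actually exists. The following is a minimal sketch, not part of the repository: the helper name `find_missing_model_paths` is hypothetical, and the example dictionary simply repeats two of the placeholder entries from the configuration above.

```python
from pathlib import Path


def find_missing_model_paths(model_paths: dict) -> dict:
    """Return the {name: path} entries whose path is not an existing directory."""
    return {
        name: path
        for name, path in model_paths.items()
        if not Path(path).is_dir()
    }


if __name__ == "__main__":
    # Placeholder entries copied from the example configuration.
    # Assumption: inside the repository you would pass the real MODEL_PATHs
    # dictionary defined in common/model_path.py instead.
    example_paths = {
        "MiniLM": "/path/to/models/sentence-transformers--all-MiniLM-L6-v2/",
        "Mistral-7B": "/path/to/models/mistral-7B-Instruct",
    }
    for name, path in find_missing_model_paths(example_paths).items():
        print(f"[missing] {name}: {path}")
```

Any entry reported as missing still points at a placeholder or at a model that has not been downloaded yet.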
Download datasets from either Google Drive or HuggingFace and unzip into the datasets folder:
Option 1: Google Drive
- Download: https://drive.google.com/file/d/14GmRVwhP1pUD_OIhoJU3oATZWTnklhPG/view
- Unzip to /path/to/GraphAD_data/datasets/
Option 2: HuggingFace
- Download: https://huggingface.co/datasets/xxwu/LLMNodeBed/tree/main
- Unzip to /path/to/GraphAD_data/datasets/
Update the data path in your scripts. The framework expects data at:
/path/to/GraphAD_data/
├── datasets/
│ ├── bow/ # BoW embeddings
│ ├── roberta/ # RoBERTa embeddings
│ ├── MiniLM/ # MiniLM embeddings
│ └── vocab/ # Vocabulary files
└── saved_models/ # Trained model checkpoints
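A quick way to verify the layout above after unzipping is to check for each expected sub-directory. This is a sketch under stated assumptions: the helper `missing_subdirs` is hypothetical and not provided by the repository, and the sub-directory list is copied from the tree shown above (your download may contain additional folders).

```python
from pathlib import Path

# Sub-directories taken from the layout shown above.
EXPECTED_SUBDIRS = [
    "datasets/bow",
    "datasets/roberta",
    "datasets/MiniLM",
    "datasets/vocab",
    "saved_models",
]


def missing_subdirs(data_root: str) -> list:
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    # Replace the placeholder with your own data root.
    for sub in missing_subdirs("/path/to/GraphAD_data"):
        print(f"[missing] {sub}")
```

An empty result means every directory from the tree is present; it does not validate the files inside them.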
Generate text embeddings for all datasets and encoders:
cd Embedding/
bash gen_all.sh
This will generate embeddings for:
- Encoders: BoW, RoBERTa, MiniLM, Mistral-7B
- Datasets: All supported datasets (cora, citeseer, pubmed, etc.)
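For readers unfamiliar with the simplest of these encoders, the sketch below illustrates what a bag-of-words (BoW) embedding is: each text becomes a vector of token counts over a fixed vocabulary. It is a generic, standard-library-only illustration and is not the code in Embedding/embedding.py; the tokenizer, the `max_features` parameter, and the use of raw counts are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts, max_features=None):
    """Map the most frequent tokens to column indices (all tokens if max_features is None)."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_features))}


def bow_vector(text, vocab):
    """Count how often each vocabulary token occurs in the text."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        idx = vocab.get(tok)
        if idx is not None:  # tokens outside the vocabulary are ignored
            vec[idx] += 1
    return vec


if __name__ == "__main__":
    corpus = ["graph neural networks", "graph attacks"]
    vocab = build_vocab(corpus)
    print(vocab)
    print(bow_vector("graph attacks on graph models", vocab))
```

The dense encoders (RoBERTa, MiniLM, Mistral-7B) replace this count vector with a learned representation, which is why they require the model paths configured earlier.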
Navigate to the attacks directory:
cd attacks/
PGD Attack (Inductive):
python gen_attacks_inductive.py \
--dataset cora \
--ptb_rate 0.20 \
--attack pgd \
--emb_type bow \
--device 0 \
--re_split 2
GRBCD Attack (Inductive):
python gen_attacks_inductive.py \
--dataset computer \
--ptb_rate 0.20 \
--attack grbcd \
--emb_type bow \
--device 0 \
--re_split 2
STRG Attack (Transductive):
python gen_attacks_transductive.py \
--dataset cora \
--ptb_rate 0.30 \
--attack strg \
--emb_type bow \
--threshold 0.5 \
--device 0 \
--re_split 1
Batch Structural Attacks:
# Edit datasets in run_structure_attacks.sh, then run:
bash run_structure_attacks.sh
TextFooler Attack (Inductive):
python gen_text_attacks_inductive.py \
--dataset cora \
--ptb_rate 0.40 \
--attack textfooler \
--emb_type MiniLM \
--device 0 \
--re_split 2 \
--seeds 3 # Generates seeds [0,1,2]
TextFooler Attack (Transductive):
python gen_text_attacks_transductive.py \
--dataset cora \
--ptb_rate 0.80 \
--attack textfooler \
--emb_type MiniLM \
--device 0 \
--re_split 1 \
--seeds 3 # Generates seeds [0,1,2]
LLM Attack with GPT-4o-mini (Inductive):
python gen_text_attacks_inductive_llm.py \
--dataset cora \
--ptb_rate 0.40 \
--attack gpt \
--emb_type bow \
--device 0 \
--re_split 2 \
--seeds 3 \
--model_name "gpt-4o-mini" # Generates seeds [0,1,2]
LLM Attack with GPT-4o-mini (Transductive):
python gen_text_attacks_transductive_llm.py \
--dataset cora \
--ptb_rate 0.80 \
--attack gpt \
--emb_type bow \
--device 0 \
--re_split 1 \
--seeds 3 \
--model_name "gpt-4o-mini" # Generates seeds [0,1,2]
WTGIA Attack (Advanced Hybrid):
# Cora dataset
python gen_wtgia_inductive.py \
--dataset cora \
--emb_type bow \
--injection atdgia \
--n_inject 60 \
--n_edges 20 \
--sp_level 0.15 \
--eval_robo \
--verbose
# CiteSeer dataset
python gen_wtgia_inductive.py \
--dataset citeseer \
--emb_type bow \
--injection atdgia \
--n_inject 90 \
--n_edges 10 \
--sp_level 0.15 \
--eval_robo \
--verbose
# PubMed dataset
python gen_wtgia_inductive.py \
--dataset pubmed \
--emb_type bow \
--injection atdgia \
--n_inject 400 \
--n_edges 25 \
--sp_level 0.15 \
--eval_robo \
--verbose \
--batch_size 50
Batch Text Attacks:
# TextFooler batch processing
bash run_text_attacks_textfooler.sh
# LLM attacks batch processing
bash run_text_attacks_llm_ind.sh # Inductive
bash run_text_attacks_llm_trans.sh # Transductive
Use these existing batch scripts to generate all attacks systematically:
Batch Structural Attacks:
cd attacks/
# Edit run_structure_attacks.sh to configure datasets and attacks
bash run_structure_attacks.sh
Batch Text Attacks:
cd attacks/
# TextFooler attacks (inductive and transductive)
bash run_text_attacks_textfooler.sh
# LLM attacks
bash run_text_attacks_llm_ind.sh # Inductive
bash run_text_attacks_llm_trans.sh # Transductive
Guard Attacks (PGD with Cosine Similarity Thresholds):
cd attacks/
# Generate PGD attacks with various cosine similarity thresholds for GNNGuard evaluation
bash run_guard_attacks.sh
WTGIA Attacks (Individual Generation Required):
#!/bin/bash
cd attacks/
# Cora
python gen_wtgia_inductive.py \
--dataset cora \
--emb_type bow \
--injection atdgia \
--n_inject 60 \
--n_edges 20 \
--sp_level 0.15 \
--eval_robo \
--verbose
# CiteSeer
python gen_wtgia_inductive.py \
--dataset citeseer \
--emb_type bow \
--injection atdgia \
--n_inject 90 \
--n_edges 10 \
--sp_level 0.15 \
--eval_robo \
--verbose
# PubMed
python gen_wtgia_inductive.py \
--dataset pubmed \
--emb_type bow \
--injection atdgia \
--n_inject 400 \
--n_edges 25 \
--sp_level 0.15 \
--eval_robo \
--verbose \
--batch_size 50
Navigate to the defenses directory and run evaluations:
cd defenses/
Inductive Setting Evaluation:
bash run_evaluation_inductive.sh
Transductive Setting Evaluation:
bash run_evaluation_transductive.sh
Text Attack Defense Evaluation:
bash run_evaluation_inductive_text.sh # Inductive text attacks
bash run_evaluation_transductive_text.sh # Transductive text attacks
WTGIA Defense Evaluation:
bash run_evaluation_wtgia.sh
AutoGCN Defense Evaluation:
# AutoGCN against structural attacks
bash run_auto_gcn.sh
# AutoGCN against text attacks
bash run_auto_gcn_text.sh
Individual Model Evaluation:
python eval_inductive.py \
--dataset cora \
--model gcn \
--attack pgd \
--atk_emb_type roberta \
--def_emb_type roberta \
--ptb_rate 0.20 \
--device 0
Navigate to the LLM_scripts directory:
cd LLM_scripts/
InstructionTuning:
Clean Training (Transductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_sft_trans.sh cora $seed Mistral-7B neighbor_label
done
Clean Training (Inductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_sft_ind.sh cora $seed Mistral-7B neighbor
done
Attack Training (Transductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_sft_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow neighbor_label
done
Attack Training (Inductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_sft_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow neighbor
done
Auto Prompt Training (Inductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_sft_ind.sh cora $seed Mistral-7B auto
done
GraphGPT:
Clean Training (Transductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_graphgpt_trans.sh cora $seed Mistral-7B
done
Clean Training (Inductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_graphgpt_ind.sh cora $seed Mistral-7B
done
Attack Training (Transductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_graphgpt_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow
done
Attack Training (Inductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_graphgpt_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow
done
LLaGA:
Clean Training (Transductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_llaga_trans.sh cora $seed noise 0 Mistral-7B
done
Clean Training (Inductive):
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_llaga_ind.sh cora $seed noise 0 Mistral-7B
done
Attack Training:
# Run for seeds 0,1,2
for seed in 0 1 2; do
bash run_llaga_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow noise
bash run_llaga_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow noise
done
code/
├── README.md # This file
├── Embedding/ # Text embedding generation
│ ├── embedding.py # Embedding generation script
│ └── gen_all.sh # Batch embedding generation
├── attacks/ # Attack generation
│ ├── gen_attacks_*.py # Structural attack scripts
│ ├── gen_text_attacks_*.py # Text attack scripts
│ ├── gen_wtgia_*.py # WTGIA hybrid attacks
│ ├── run_*.sh # Batch attack scripts
│ └── text_attack.py # Text attack utilities
├── defenses/ # Defense evaluation
│ ├── eval_*.py # Evaluation scripts
│ ├── run_evaluation_*.sh # Batch evaluation scripts
│ └── config.yaml # Defense configuration
├── LLM_scripts/ # LLM training scripts
│ ├── run_sft_*.sh # InstructionTuning scripts
│ ├── run_graphgpt_*.sh # GraphGPT scripts
│ └── run_llaga_*.sh # LLaGA scripts
├── LLMPredictor/ # LLM model implementations
│ ├── InstructionTuning/ # SFT implementation
│ ├── GraphGPT/ # GraphGPT implementation
│ └── LLaGA/ # LLaGA implementation
├── common/ # Shared utilities
│ ├── dataloader.py # Data loading
│ ├── model_path.py # Model path configuration
│ └── *.py # Other utilities
└── GreatX/ # Graph attack/defense library
We acknowledge the following datasets and repositories:
- LLMNodeBed Dataset: We thank the authors for providing the comprehensive graph datasets and embeddings available at HuggingFace and Google Drive.
- LLMNodeBed Repository: We express our gratitude for the LLMNodeBed framework and codebase available at https://github.com/WxxShirley/LLMNodeBed.
- GreatX Library: This project uses the GreatX library for graph adversarial attacks and defenses. We acknowledge the original authors and maintainers of this excellent toolkit.