Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses

Overview

This repository contains the code for the paper "Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses".

Supported Attacks:

  • Structural Attacks: PGD, GRBCD, PRBCD, Metattack, STRG (Heuristic Attacks)
  • Text Attacks: TextFooler, LLM-based attacks (GPT-4o-mini)
  • Hybrid Attacks: WTGIA

Supported Models:

  • GNN Models: GCN, GAT, GNNGuard, ElasticGNN, RobustGCN, GRAND, etc.
  • LLM-based Models: InstructionTuning (SFT), GraphGPT, and LLaGA with various backbones (Mistral-7B, Qwen2.5-7B, Llama-3.1-8B)

Datasets: Cora, CiteSeer, PubMed, WikiCS, Instagram, Reddit, History, Photo, Computer, ArXiv

Installation

Step 0: Install Requirements

# Install PyTorch (adjust CUDA version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install torch-geometric
pip install transformers accelerate
pip install sentence-transformers
pip install textattack
pip install openai
pip install scikit-learn numpy pandas tqdm pyyaml

# Use the provided GreatX library (modified version)
# The GreatX/ directory contains our custom modifications
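
If the bundled GreatX/ directory follows a standard Python package layout (an assumption; check for a setup.py or pyproject.toml), it can be installed in editable mode so the local modifications are picked up automatically:

# A minimal sketch, assuming GreatX/ is pip-installable in editable mode
pip install -e ./GreatX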

Model Setup

Step 1: Download Language Models and LLMs

Update the model paths in common/model_path.py:

MODEL_PATHs = {
    # Language Models (for text embeddings)
    "MiniLM": "/path/to/models/sentence-transformers--all-MiniLM-L6-v2/",
    "SentenceBert": "/path/to/models/sentence-transformers--multi-qa-distilbert-cos-v1/", 
    "e5-large": "/path/to/models/intfloat--e5-large-v2/",
    "roberta": "/path/to/models/sentence-transformers--all-roberta-large-v1/",
    
    # Large Language Models (for GraphGPT, LLaGA, InstructionTuning)
    "Mistral-7B": "/path/to/models/mistral-7B-Instruct",
    "Qwen-7B": "/path/to/models/Qwen--Qwen2.5-7B-Instruct",
    "Llama3-8B": "/path/to/models/llama-3.1-8B-Instruct/",
}

Please download the required models from Hugging Face or other sources and update the paths accordingly.
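
For example, the sentence encoders can be fetched with the huggingface-cli tool that ships with huggingface_hub (a sketch; the target directories are placeholders and must match the entries in common/model_path.py):

# Download example encoders from Hugging Face into local directories
# (paths are placeholders; point them at the entries in common/model_path.py)
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
    --local-dir /path/to/models/sentence-transformers--all-MiniLM-L6-v2/
huggingface-cli download intfloat/e5-large-v2 \
    --local-dir /path/to/models/intfloat--e5-large-v2/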

Step 2: Download Datasets

Download datasets from either Google Drive or HuggingFace and unzip into the datasets folder:

Option 1: Google Drive

Option 2: HuggingFace

Step 3: Set Data Path

Update the data path in your scripts. The framework expects data at:

/path/to/GraphAD_data/
├── datasets/
│   ├── bow/           # BoW embeddings
│   ├── roberta/       # RoBERTa embeddings  
│   ├── MiniLM/        # MiniLM embeddings
│   └── vocab/         # Vocabulary files
└── saved_models/      # Trained model checkpoints
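
A minimal setup sketch, assuming the downloaded archive is a zip of the datasets/ folder (the actual archive name may differ):

# Create the expected layout and unpack the downloaded datasets
# (datasets.zip is a placeholder name; use the file you actually downloaded)
mkdir -p /path/to/GraphAD_data/saved_models
unzip datasets.zip -d /path/to/GraphAD_data/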

Quick Start

Step 4: Generate Embeddings

Generate text embeddings for all datasets and encoders:

cd Embedding/
bash gen_all.sh

This will generate embeddings for:

  • Encoders: BoW, RoBERTa, MiniLM, Mistral-7B
  • Datasets: All supported datasets (cora, citeseer, pubmed, etc.)
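
gen_all.sh generates embeddings for every dataset/encoder pair; to regenerate a single pair it can serve as a template. The call below is only illustrative (the --dataset and --encoder flag names are hypothetical, not the script's confirmed interface; check gen_all.sh for the exact arguments):

# Hypothetical single-pair call; the real flags are exercised by gen_all.sh
python embedding.py --dataset cora --encoder MiniLM --device 0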

Step 5: Generate Attacks

Navigate to the attacks directory:

cd attacks/

Structural Attacks

PGD Attack (Inductive):

python gen_attacks_inductive.py \
    --dataset cora \
    --ptb_rate 0.20 \
    --attack pgd \
    --emb_type bow \
    --device 0 \
    --re_split 2

GRBCD Attack (Inductive):

python gen_attacks_inductive.py \
    --dataset computer \
    --ptb_rate 0.20 \
    --attack grbcd \
    --emb_type bow \
    --device 0 \
    --re_split 2

STRG Attack (Transductive):

python gen_attacks_transductive.py \
    --dataset cora \
    --ptb_rate 0.30 \
    --attack strg \
    --emb_type bow \
    --threshold 0.5 \
    --device 0 \
    --re_split 1

Batch Structural Attacks:

# Edit datasets in run_structure_attacks.sh, then run:
bash run_structure_attacks.sh
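
Alternatively, the flags shown above can be swept directly with a small shell loop, a minimal stand-in for the batch script (add further attacks such as prbcd or metattack analogously):

# Sweep datasets and structural attacks using the flags documented above
for dataset in cora citeseer pubmed; do
    for attack in pgd grbcd; do
        python gen_attacks_inductive.py \
            --dataset $dataset \
            --ptb_rate 0.20 \
            --attack $attack \
            --emb_type bow \
            --device 0 \
            --re_split 2
    done
done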

Text Attacks

TextFooler Attack (Inductive):

python gen_text_attacks_inductive.py \
    --dataset cora \
    --ptb_rate 0.40 \
    --attack textfooler \
    --emb_type MiniLM \
    --device 0 \
    --re_split 2 \
    --seeds 3  # Generates seeds [0,1,2]

TextFooler Attack (Transductive):

python gen_text_attacks_transductive.py \
    --dataset cora \
    --ptb_rate 0.80 \
    --attack textfooler \
    --emb_type MiniLM \
    --device 0 \
    --re_split 1 \
    --seeds 3  # Generates seeds [0,1,2]

LLM Attack with GPT-4o-mini (Inductive):

python gen_text_attacks_inductive_llm.py \
    --dataset cora \
    --ptb_rate 0.40 \
    --attack gpt \
    --emb_type bow \
    --device 0 \
    --re_split 2 \
    --model_name "gpt-4o-mini" \
    --seeds 3  # Generates seeds [0,1,2]

LLM Attack with GPT-4o-mini (Transductive):

python gen_text_attacks_transductive_llm.py \
    --dataset cora \
    --ptb_rate 0.80 \
    --attack gpt \
    --emb_type bow \
    --device 0 \
    --re_split 1 \
    --model_name "gpt-4o-mini" \
    --seeds 3  # Generates seeds [0,1,2]
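
The GPT-4o-mini attacks call the OpenAI API through the openai package, so a key must be available before running either script; a common convention (an assumption about how these scripts read credentials; check the scripts if it differs) is the standard environment variable:

# Assumed: the LLM attack scripts pick up the default OpenAI credential
export OPENAI_API_KEY="your-api-key"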

WTGIA Attack (Advanced Hybrid):

# Cora dataset
python gen_wtgia_inductive.py \
    --dataset cora \
    --emb_type bow \
    --injection atdgia \
    --n_inject 60 \
    --n_edges 20 \
    --sp_level 0.15 \
    --eval_robo \
    --verbose

# CiteSeer dataset  
python gen_wtgia_inductive.py \
    --dataset citeseer \
    --emb_type bow \
    --injection atdgia \
    --n_inject 90 \
    --n_edges 10 \
    --sp_level 0.15 \
    --eval_robo \
    --verbose

# PubMed dataset
python gen_wtgia_inductive.py \
    --dataset pubmed \
    --emb_type bow \
    --injection atdgia \
    --n_inject 400 \
    --n_edges 25 \
    --sp_level 0.15 \
    --eval_robo \
    --verbose \
    --batch_size 50

Batch Text Attacks:

# TextFooler batch processing
bash run_text_attacks_textfooler.sh

# LLM attacks batch processing  
bash run_text_attacks_llm_ind.sh    # Inductive
bash run_text_attacks_llm_trans.sh  # Transductive

Complete Attack Generation Workflow

Use these existing batch scripts to generate all attacks systematically:

Batch Structural Attacks:

cd attacks/
# Edit run_structure_attacks.sh to configure datasets and attacks
bash run_structure_attacks.sh

Batch Text Attacks:

cd attacks/
# TextFooler attacks (inductive and transductive)
bash run_text_attacks_textfooler.sh

# LLM attacks  
bash run_text_attacks_llm_ind.sh    # Inductive
bash run_text_attacks_llm_trans.sh  # Transductive

Guard Attacks (PGD with Cosine Similarity Thresholds):

cd attacks/
# Generate PGD attacks with various cosine similarity thresholds for GNNGuard evaluation
bash run_guard_attacks.sh
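
A hand-rolled version of that sweep might look like the following (a sketch only: it assumes gen_attacks_inductive.py accepts the same --threshold flag shown in the STRG example, and the threshold values are illustrative; consult run_guard_attacks.sh for the exact interface):

# Hypothetical PGD sweep over cosine similarity thresholds for GNNGuard evaluation
# (the --threshold flag and values are assumptions; see run_guard_attacks.sh)
for threshold in 0.1 0.3 0.5; do
    python gen_attacks_inductive.py \
        --dataset cora \
        --ptb_rate 0.20 \
        --attack pgd \
        --emb_type bow \
        --threshold $threshold \
        --device 0 \
        --re_split 2
done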

WTGIA Attacks (Individual Generation Required):

There is no batch script for WTGIA; run the per-dataset gen_wtgia_inductive.py commands listed under Step 5 above (Cora, CiteSeer, and PubMed).

Step 6: Run GNN Defense Evaluation

Navigate to defenses directory and run evaluations:

cd defenses/

Inductive Setting Evaluation:

bash run_evaluation_inductive.sh

Transductive Setting Evaluation:

bash run_evaluation_transductive.sh

Text Attack Defense Evaluation:

bash run_evaluation_inductive_text.sh      # Inductive text attacks
bash run_evaluation_transductive_text.sh   # Transductive text attacks

WTGIA Defense Evaluation:

bash run_evaluation_wtgia.sh

AutoGCN Defense Evaluation:

# AutoGCN against structural attacks
bash run_auto_gcn.sh

# AutoGCN against text attacks  
bash run_auto_gcn_text.sh

Individual Model Evaluation:

python eval_inductive.py \
    --dataset cora \
    --model gcn \
    --attack pgd \
    --atk_emb_type roberta \
    --def_emb_type roberta \
    --ptb_rate 0.20 \
    --device 0
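
Several defenses can be compared on the same attacked graph by looping over the --model argument (a sketch; the lowercase model names beyond gcn are assumed to match the identifiers expected by eval_inductive.py):

# Compare multiple GNN defenses under the same PGD perturbation
# (model identifiers other than gcn are assumptions; check defenses/config.yaml)
for model in gcn gat gnnguard robustgcn grand; do
    python eval_inductive.py \
        --dataset cora \
        --model $model \
        --attack pgd \
        --atk_emb_type roberta \
        --def_emb_type roberta \
        --ptb_rate 0.20 \
        --device 0
done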

Step 7: LLM Training and Evaluation

Navigate to LLM_scripts directory:

cd LLM_scripts/

InstructionTuning (SFT)

Clean Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_trans.sh cora $seed Mistral-7B neighbor_label
done

Clean Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_ind.sh cora $seed Mistral-7B neighbor
done

Attack Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow neighbor_label
done

Attack Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow neighbor
done

Auto Prompt Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_ind.sh cora $seed Mistral-7B auto
done
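
The same pattern applies to every run_sft_*.sh script above: the third positional argument selects the LLM backbone. To repeat clean transductive training with the other backbones configured in common/model_path.py (a sketch; it assumes the scripts accept the Qwen-7B and Llama3-8B keys the same way they accept Mistral-7B):

# Clean transductive SFT across seeds and LLM backbones
# (backbone keys assumed to match common/model_path.py entries)
for llm in Mistral-7B Qwen-7B Llama3-8B; do
    for seed in 0 1 2; do
        bash run_sft_trans.sh cora $seed $llm neighbor_label
    done
done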

GraphGPT

Clean Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_graphgpt_trans.sh cora $seed Mistral-7B
done

Clean Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_graphgpt_ind.sh cora $seed Mistral-7B
done

Attack Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_graphgpt_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow
done

Attack Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_graphgpt_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow
done

LLaGA

Clean Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_llaga_trans.sh cora $seed noise 0 Mistral-7B
done

Clean Training (Inductive):

# Run for seeds 0,1,2  
for seed in 0 1 2; do
    bash run_llaga_ind.sh cora $seed noise 0 Mistral-7B
done

Attack Training:

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_llaga_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow noise
    bash run_llaga_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow noise
done

Project Structure

code/
├── README.md                    # This file
├── Embedding/                   # Text embedding generation
│   ├── embedding.py            # Embedding generation script
│   └── gen_all.sh              # Batch embedding generation
├── attacks/                     # Attack generation
│   ├── gen_attacks_*.py        # Structural attack scripts
│   ├── gen_text_attacks_*.py   # Text attack scripts  
│   ├── gen_wtgia_*.py          # WTGIA hybrid attacks
│   ├── run_*.sh                # Batch attack scripts
│   └── text_attack.py          # Text attack utilities
├── defenses/                    # Defense evaluation
│   ├── eval_*.py               # Evaluation scripts
│   ├── run_evaluation_*.sh     # Batch evaluation scripts
│   └── config.yaml             # Defense configuration
├── LLM_scripts/                # LLM training scripts
│   ├── run_sft_*.sh            # InstructionTuning scripts
│   ├── run_graphgpt_*.sh       # GraphGPT scripts
│   └── run_llaga_*.sh          # LLaGA scripts
├── LLMPredictor/               # LLM model implementations
│   ├── InstructionTuning/      # SFT implementation
│   ├── GraphGPT/               # GraphGPT implementation
│   └── LLaGA/                  # LLaGA implementation
├── common/                     # Shared utilities
│   ├── dataloader.py           # Data loading
│   ├── model_path.py           # Model path configuration
│   └── *.py                    # Other utilities
└── GreatX/                     # Graph attack/defense library

Acknowledgments

We acknowledge the following datasets and repositories:

  • LLMNodeBed Dataset: We thank the authors for providing the comprehensive graph datasets and embeddings available at HuggingFace and Google Drive.

  • LLMNodeBed Repository: We express our gratitude for the LLMNodeBed framework and codebase available at https://github.com/WxxShirley/LLMNodeBed.

  • GreatX Library: This project uses the GreatX library for graph adversarial attacks and defenses. We acknowledge the original authors and maintainers of this excellent toolkit.
