Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses

Overview

This repository contains the code for the paper "Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses".

Supported Attacks:

  • Structural Attacks: PGD, GRBCD, PRBCD, Metattack, STRG (Heuristic Attacks)
  • Text Attacks: TextFooler, LLM-based attacks (GPT-4o-mini)
  • Hybrid Attacks: WTGIA

Supported Models:

  • GNN Models: GCN, GAT, GNNGuard, ElasticGNN, RobustGCN, GRAND, etc.
  • LLM-based Models: InstructionTuning (SFT), GraphGPT, and LLaGA with various backbones (Mistral-7B, Qwen2.5-7B, Llama-3.1-8B)

Datasets: Cora, CiteSeer, PubMed, WikiCS, Instagram, Reddit, History, Photo, Computer, ArXiv

Installation

Step 0: Install Requirements

# Install PyTorch (adjust CUDA version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install torch-geometric
pip install transformers accelerate
pip install sentence-transformers
pip install textattack
pip install openai
pip install scikit-learn numpy pandas tqdm pyyaml

# Use the provided GreatX library (modified version)
# The GreatX/ directory contains our custom modifications
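
If the bundled GreatX/ directory follows a standard Python package layout (an assumption; check for a setup.py or pyproject.toml), it can be installed in editable mode so the local modifications are picked up automatically:

# A minimal sketch, assuming GreatX/ is pip-installable in editable mode
pip install -e ./GreatX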

Model Setup

Step 1: Download Language Models and LLMs

Update the model paths in common/model_path.py:

MODEL_PATHs = {
    # Language Models (for text embeddings)
    "MiniLM": "/path/to/models/sentence-transformers--all-MiniLM-L6-v2/",
    "SentenceBert": "/path/to/models/sentence-transformers--multi-qa-distilbert-cos-v1/", 
    "e5-large": "/path/to/models/intfloat--e5-large-v2/",
    "roberta": "/path/to/models/sentence-transformers--all-roberta-large-v1/",
    
    # Large Language Models (for GraphGPT, LLaGA, InstructionTuning)
    "Mistral-7B": "/path/to/models/mistral-7B-Instruct",
    "Qwen-7B": "/path/to/models/Qwen--Qwen2.5-7B-Instruct",
    "Llama3-8B": "/path/to/models/llama-3.1-8B-Instruct/",
}

Please download the required models from Hugging Face or other sources and update the paths accordingly.
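
For example, the sentence encoders can be fetched with the huggingface-cli tool that ships with huggingface_hub (a sketch; the target directories are placeholders and must match the entries in common/model_path.py):

# Download example encoders from Hugging Face into local directories
# (paths are placeholders; point them at the entries in common/model_path.py)
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
    --local-dir /path/to/models/sentence-transformers--all-MiniLM-L6-v2/
huggingface-cli download intfloat/e5-large-v2 \
    --local-dir /path/to/models/intfloat--e5-large-v2/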

Step 2: Download Datasets

Download datasets from either Google Drive or HuggingFace and unzip into the datasets folder:

Option 1: Google Drive

Option 2: HuggingFace

Step 3: Set Data Path

Update the data path in your scripts. The framework expects data at:

/path/to/GraphAD_data/
├── datasets/
│   ├── bow/           # BoW embeddings
│   ├── roberta/       # RoBERTa embeddings  
│   ├── MiniLM/        # MiniLM embeddings
│   └── vocab/         # Vocabulary files
└── saved_models/      # Trained model checkpoints
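
A minimal setup sketch, assuming the downloaded archive is a zip of the datasets/ folder (the actual archive name may differ):

# Create the expected layout and unpack the downloaded datasets
# (datasets.zip is a placeholder name; use the file you actually downloaded)
mkdir -p /path/to/GraphAD_data/saved_models
unzip datasets.zip -d /path/to/GraphAD_data/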

Quick Start

Step 4: Generate Embeddings

Generate text embeddings for all datasets and encoders:

cd Embedding/
bash gen_all.sh

This will generate embeddings for:

  • Encoders: BoW, RoBERTa, MiniLM, Mistral-7B
  • Datasets: All supported datasets (cora, citeseer, pubmed, etc.)
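
gen_all.sh generates embeddings for every dataset/encoder pair; to regenerate a single pair it can serve as a template. The call below is only illustrative (the --dataset and --encoder flag names are hypothetical, not the script's confirmed interface; check gen_all.sh for the exact arguments):

# Hypothetical single-pair call; the real flags are exercised by gen_all.sh
python embedding.py --dataset cora --encoder MiniLM --device 0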

Step 5: Generate Attacks

Navigate to the attacks directory:

cd attacks/

Structural Attacks

PGD Attack (Inductive):

python gen_attacks_inductive.py \
    --dataset cora \
    --ptb_rate 0.20 \
    --attack pgd \
    --emb_type bow \
    --device 0 \
    --re_split 2

GRBCD Attack (Inductive):

python gen_attacks_inductive.py \
    --dataset computer \
    --ptb_rate 0.20 \
    --attack grbcd \
    --emb_type bow \
    --device 0 \
    --re_split 2

STRG Attack (Transductive):

python gen_attacks_transductive.py \
    --dataset cora \
    --ptb_rate 0.30 \
    --attack strg \
    --emb_type bow \
    --threshold 0.5 \
    --device 0 \
    --re_split 1

Batch Structural Attacks:

# Edit datasets in run_structure_attacks.sh, then run:
bash run_structure_attacks.sh
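
Alternatively, the flags shown above can be swept directly with a small shell loop, a minimal stand-in for the batch script (add further attacks such as prbcd or metattack analogously):

# Sweep datasets and structural attacks using the flags documented above
for dataset in cora citeseer pubmed; do
    for attack in pgd grbcd; do
        python gen_attacks_inductive.py \
            --dataset $dataset \
            --ptb_rate 0.20 \
            --attack $attack \
            --emb_type bow \
            --device 0 \
            --re_split 2
    done
done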

Text Attacks

TextFooler Attack (Inductive):

python gen_text_attacks_inductive.py \
    --dataset cora \
    --ptb_rate 0.40 \
    --attack textfooler \
    --emb_type MiniLM \
    --device 0 \
    --re_split 2 \
    --seeds 3  # Generates seeds [0,1,2]

TextFooler Attack (Transductive):

python gen_text_attacks_transductive.py \
    --dataset cora \
    --ptb_rate 0.80 \
    --attack textfooler \
    --emb_type MiniLM \
    --device 0 \
    --re_split 1 \
    --seeds 3  # Generates seeds [0,1,2]

LLM Attack with GPT-4o-mini (Inductive):

python gen_text_attacks_inductive_llm.py \
    --dataset cora \
    --ptb_rate 0.40 \
    --attack gpt \
    --emb_type bow \
    --device 0 \
    --re_split 2 \
    --model_name "gpt-4o-mini" \
    --seeds 3  # Generates seeds [0,1,2]

LLM Attack with GPT-4o-mini (Transductive):

python gen_text_attacks_transductive_llm.py \
    --dataset cora \
    --ptb_rate 0.80 \
    --attack gpt \
    --emb_type bow \
    --device 0 \
    --re_split 1 \
    --model_name "gpt-4o-mini" \
    --seeds 3  # Generates seeds [0,1,2]
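
The GPT-4o-mini attacks call the OpenAI API through the openai package, so a key must be available before running either script; a common convention (an assumption about how these scripts read credentials; check the scripts if it differs) is the standard environment variable:

# Assumed: the LLM attack scripts pick up the default OpenAI credential
export OPENAI_API_KEY="your-api-key"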

WTGIA Attack (Advanced Hybrid):

# Cora dataset
python gen_wtgia_inductive.py \
    --dataset cora \
    --emb_type bow \
    --injection atdgia \
    --n_inject 60 \
    --n_edges 20 \
    --sp_level 0.15 \
    --eval_robo \
    --verbose

# CiteSeer dataset  
python gen_wtgia_inductive.py \
    --dataset citeseer \
    --emb_type bow \
    --injection atdgia \
    --n_inject 90 \
    --n_edges 10 \
    --sp_level 0.15 \
    --eval_robo \
    --verbose

# PubMed dataset
python gen_wtgia_inductive.py \
    --dataset pubmed \
    --emb_type bow \
    --injection atdgia \
    --n_inject 400 \
    --n_edges 25 \
    --sp_level 0.15 \
    --eval_robo \
    --verbose \
    --batch_size 50

Batch Text Attacks:

# TextFooler batch processing
bash run_text_attacks_textfooler.sh

# LLM attacks batch processing  
bash run_text_attacks_llm_ind.sh    # Inductive
bash run_text_attacks_llm_trans.sh  # Transductive

Complete Attack Generation Workflow

Use these existing batch scripts to generate all attacks systematically:

Batch Structural Attacks:

cd attacks/
# Edit run_structure_attacks.sh to configure datasets and attacks
bash run_structure_attacks.sh

Batch Text Attacks:

cd attacks/
# TextFooler attacks (inductive and transductive)
bash run_text_attacks_textfooler.sh

# LLM attacks  
bash run_text_attacks_llm_ind.sh    # Inductive
bash run_text_attacks_llm_trans.sh  # Transductive

Guard Attacks (PGD with Cosine Similarity Thresholds):

cd attacks/
# Generate PGD attacks with various cosine similarity thresholds for GNNGuard evaluation
bash run_guard_attacks.sh
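
A hand-rolled version of that sweep might look like the following (a sketch only: it assumes gen_attacks_inductive.py accepts the same --threshold flag shown in the STRG example, and the threshold values are illustrative; consult run_guard_attacks.sh for the exact interface):

# Hypothetical PGD sweep over cosine similarity thresholds for GNNGuard evaluation
# (the --threshold flag and values are assumptions; see run_guard_attacks.sh)
for threshold in 0.1 0.3 0.5; do
    python gen_attacks_inductive.py \
        --dataset cora \
        --ptb_rate 0.20 \
        --attack pgd \
        --emb_type bow \
        --threshold $threshold \
        --device 0 \
        --re_split 2
done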

WTGIA Attacks (Individual Generation Required):

There is no batch script for WTGIA; run the per-dataset gen_wtgia_inductive.py commands listed under Step 5 above (Cora, CiteSeer, and PubMed).

Step 6: Run GNN Defense Evaluation

Navigate to defenses directory and run evaluations:

cd defenses/

Inductive Setting Evaluation:

bash run_evaluation_inductive.sh

Transductive Setting Evaluation:

bash run_evaluation_transductive.sh

Text Attack Defense Evaluation:

bash run_evaluation_inductive_text.sh      # Inductive text attacks
bash run_evaluation_transductive_text.sh   # Transductive text attacks

WTGIA Defense Evaluation:

bash run_evaluation_wtgia.sh

AutoGCN Defense Evaluation:

# AutoGCN against structural attacks
bash run_auto_gcn.sh

# AutoGCN against text attacks  
bash run_auto_gcn_text.sh

Individual Model Evaluation:

python eval_inductive.py \
    --dataset cora \
    --model gcn \
    --attack pgd \
    --atk_emb_type roberta \
    --def_emb_type roberta \
    --ptb_rate 0.20 \
    --device 0
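
Several defenses can be compared on the same attacked graph by looping over the --model argument (a sketch; the lowercase model names beyond gcn are assumed to match the identifiers expected by eval_inductive.py):

# Compare multiple GNN defenses under the same PGD perturbation
# (model identifiers other than gcn are assumptions; check defenses/config.yaml)
for model in gcn gat gnnguard robustgcn grand; do
    python eval_inductive.py \
        --dataset cora \
        --model $model \
        --attack pgd \
        --atk_emb_type roberta \
        --def_emb_type roberta \
        --ptb_rate 0.20 \
        --device 0
done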

Step 7: LLM Training and Evaluation

Navigate to LLM_scripts directory:

cd LLM_scripts/

InstructionTuning (SFT)

Clean Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_trans.sh cora $seed Mistral-7B neighbor_label
done

Clean Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_ind.sh cora $seed Mistral-7B neighbor
done

Attack Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow neighbor_label
done

Attack Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow neighbor
done

Auto Prompt Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_sft_ind.sh cora $seed Mistral-7B auto
done
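
The same pattern applies to every run_sft_*.sh script above: the third positional argument selects the LLM backbone. To repeat clean transductive training with the other backbones configured in common/model_path.py (a sketch; it assumes the scripts accept the Qwen-7B and Llama3-8B keys the same way they accept Mistral-7B):

# Clean transductive SFT across seeds and LLM backbones
# (backbone keys assumed to match common/model_path.py entries)
for llm in Mistral-7B Qwen-7B Llama3-8B; do
    for seed in 0 1 2; do
        bash run_sft_trans.sh cora $seed $llm neighbor_label
    done
done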

GraphGPT

Clean Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_graphgpt_trans.sh cora $seed Mistral-7B
done

Clean Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_graphgpt_ind.sh cora $seed Mistral-7B
done

Attack Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_graphgpt_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow
done

Attack Training (Inductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_graphgpt_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow
done

LLaGA

Clean Training (Transductive):

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_llaga_trans.sh cora $seed noise 0 Mistral-7B
done

Clean Training (Inductive):

# Run for seeds 0,1,2  
for seed in 0 1 2; do
    bash run_llaga_ind.sh cora $seed noise 0 Mistral-7B
done

Attack Training:

# Run for seeds 0,1,2
for seed in 0 1 2; do
    bash run_llaga_atk_trans.sh cora gpt 0.8 $seed Mistral-7B bow noise
    bash run_llaga_atk_ind.sh cora gpt 0.4 $seed Mistral-7B bow noise
done

Project Structure

code/
├── README.md                    # This file
├── Embedding/                   # Text embedding generation
│   ├── embedding.py            # Embedding generation script
│   └── gen_all.sh              # Batch embedding generation
├── attacks/                     # Attack generation
│   ├── gen_attacks_*.py        # Structural attack scripts
│   ├── gen_text_attacks_*.py   # Text attack scripts  
│   ├── gen_wtgia_*.py          # WTGIA hybrid attacks
│   ├── run_*.sh                # Batch attack scripts
│   └── text_attack.py          # Text attack utilities
├── defenses/                    # Defense evaluation
│   ├── eval_*.py               # Evaluation scripts
│   ├── run_evaluation_*.sh     # Batch evaluation scripts
│   └── config.yaml             # Defense configuration
├── LLM_scripts/                # LLM training scripts
│   ├── run_sft_*.sh            # InstructionTuning scripts
│   ├── run_graphgpt_*.sh       # GraphGPT scripts
│   └── run_llaga_*.sh          # LLaGA scripts
├── LLMPredictor/               # LLM model implementations
│   ├── InstructionTuning/      # SFT implementation
│   ├── GraphGPT/               # GraphGPT implementation
│   └── LLaGA/                  # LLaGA implementation
├── common/                     # Shared utilities
│   ├── dataloader.py           # Data loading
│   ├── model_path.py           # Model path configuration
│   └── *.py                    # Other utilities
└── GreatX/                     # Graph attack/defense library

Acknowledgments

We acknowledge the following datasets and repositories:

  • LLMNodeBed Dataset: We thank the authors for providing the comprehensive graph datasets and embeddings available at HuggingFace and Google Drive.

  • LLMNodeBed Repository: We express our gratitude for the LLMNodeBed framework and codebase available at https://github.com/WxxShirley/LLMNodeBed.

  • GreatX Library: This project uses the GreatX library for graph adversarial attacks and defenses. We acknowledge the original authors and maintainers of this excellent toolkit.
