yushundong · T-Breezy444 · Oct 3, 2024 · Oct 3, 2024 · Oct 3, 2024 · Oct 7, 2024
diff --git a/README.md b/README.md
@@ -1,27 +1,105 @@
+# PyGIP Installation Guide
+
+PyGIP supports multiple CUDA versions and provides two installation methods. Choose the method that best suits your needs.
+
+## Method 1: Direct Installation
+
+Create and activate a new conda environment:
 ```bash
-# pip install
 conda create -n pygip python=3.10.14
 conda activate pygip
-# if you use cuda 11.x
+```
+
+### Choose your CUDA version:
+
+#### For CUDA 11.x users:
+```bash
 pip install pygip -f https://data.dgl.ai/wheels/torch-2.3/cu118/repo.html --extra-index-url https://download.pytorch.org/whl/cu118
-# if you use cuda 12.x
-# pip install pygip -f https://data.dgl.ai/wheels/torch-2.3/cu121/repo.html --extra-index-url https://data.dgl.ai/wheels/torch-2.3/cu121/repo.html
 ```
 
+#### For CUDA 12.x users:
+```bash
+pip install pygip -f https://data.dgl.ai/wheels/torch-2.3/cu121/repo.html --extra-index-url https://download.pytorch.org/whl/cu121
+```
+
+## Method 2: Environment Setup
+
+This method uses the predefined `environment.yml` file and is recommended for development:
 
+1. Create and activate the environment:
 ```bash
-# Simple setup.
 conda env create -f environment.yml -n pygip
 conda activate pygip
-pip install dgl -f https://data.dgl.ai/wheels/repo.html #due to dgl issues, unfortunately we have to install this dgl 2.2.1 manually.
+```
 
-# Under the GNNIP directory
+2. Install DGL manually (required due to DGL 2.2.1 dependency issues):
+```bash
+pip install dgl -f https://data.dgl.ai/wheels/repo.html
+```
+
+3. Set up the Python path (run this from the PyGIP root directory):
+```bash
+# Linux/Mac:
 export PYTHONPATH=`pwd`
 
-# Quick testing
-python3 examples/examples.py
+# Windows (Command Prompt):
+set PYTHONPATH=%cd%
 ```
 
+4. Test the installation:
+```bash
+python examples/examples.py
+```
+
+## Verifying CUDA Setup
+
+To verify your CUDA installation is working correctly:
+```python
+import torch
+print("CUDA Available:", torch.cuda.is_available())
+print("CUDA Version:", torch.version.cuda)
+print("GPU Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU found")
+```
+
+## Troubleshooting
+
+If you encounter CUDA-related issues:
+
+1. Check the installed NVIDIA driver version and the highest CUDA version it supports:
+```bash
+nvidia-smi
+```
+
+2. If you need to reinstall PyTorch with a specific CUDA version:
+```bash
+# Remove existing torch installation
+pip uninstall torch torch-geometric -y
+
+# For CUDA 11.x:
+pip install torch --index-url https://download.pytorch.org/whl/cu118
+pip install torch-geometric==2.5.0
+
+# For CUDA 12.x:
+pip install torch --index-url https://download.pytorch.org/whl/cu121
+pip install torch-geometric==2.5.0
+```
+
+3. Verify DGL installation:
+```bash
+python -c "import dgl; print(dgl.__version__)"
+```
+
+## Requirements
+
+PyGIP has been tested with the following core dependencies:
+- Python 3.10.14
+- PyTorch 2.3.0
+- torch-geometric 2.5.0
+- DGL 2.2.1
+
+For a complete list of dependencies, see the `requirements.txt` file in the repository.
+
+
 # Attack
 
 ## Model Extraction Attacks against Graph Neural Network
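The Requirements section added to this README lists the versions PyGIP was tested with. The sketch below compares an environment against that list. The helper name `check_versions` and its injectable `lookup` parameter are illustrative assumptions and are not part of PyGIP; the distribution names are the ones used in the install commands above.

```python
from importlib import metadata

# Versions the README reports as tested (Python itself is not checked here).
TESTED = {"torch": "2.3.0", "torch-geometric": "2.5.0", "dgl": "2.2.1"}


def _installed(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(tested=TESTED, lookup=_installed):
    """Map each package to (installed, tested, matches).

    Local build tags such as "+cu118" are ignored when comparing.
    """
    report = {}
    for name, wanted in tested.items():
        found = lookup(name)
        base = found.split("+")[0] if found else None
        report[name] = (found, wanted, base == wanted)
    return report


if __name__ == "__main__":
    for name, (found, wanted, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: installed={found} tested={wanted} [{status}]")
```

A mismatch here is only a hint: other versions may well work, they are just not the combination the README says was tested.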

diff --git a/environment.yml b/environment.yml
@@ -27,7 +27,7 @@ dependencies:
       - markupsafe==2.1.5
       - mpmath==1.3.0
       - networkx==3.3
-      - numpy==2.0.1
+      - numpy>=1.23.5,<2.0.0
       - pandas==2.2.2
       - psutil==6.0.0
       - pydantic==2.8.2
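The hunk above replaces the exact pin `numpy==2.0.1` with the range `>=1.23.5,<2.0.0`, which excludes every NumPy 2.x release. As a rough illustration of which versions that range admits, here is a small sketch; `in_numpy_range` is a hypothetical helper, and the parsing only handles plain `X.Y.Z` strings (no pre-release or local tags).

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_numpy_range(version, lower="1.23.5", upper="2.0.0"):
    """True if lower <= version < upper, mirroring '>=1.23.5,<2.0.0'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ("1.23.4", "1.23.5", "1.26.4", "2.0.0", "2.0.1"):
        print(candidate, in_numpy_range(candidate))
```

The previously pinned `2.0.1` falls outside the new range, so an environment created from the old file needs NumPy downgraded rather than left in place.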

diff --git a/pygip/data_free_attack/README.md b/pygip/data_free_attack/README.md
@@ -0,0 +1,86 @@
+# Data-free Model Extraction Attacks
+
+This directory contains an implementation of data-free model extraction attacks on Graph Neural Networks (GNNs).
+
+## Files
+
+1. `example.py`: Interactive script demonstrating how to run data-free attacks
+2. `models/`:
+   - `generator.py`: Graph generator implementation
+   - `victim.py`: Victim model implementations
+3. `attacks/`:
+   - `attack1.py`: Type I Attack implementation
+   - `attack2.py`: Type II Attack implementation
+   - `attack3.py`: Type III Attack implementation
+
+## Running Data-free Attacks
+
+The `example.py` script provides an interactive way to run data-free attacks on GNN models. Here's how to use it:
+
+```bash
+python example.py
+```
+
+When you run the script, it will:
+1. Load the Cora dataset
+2. Create and train a victim model
+3. Prompt you to choose an attack type:
+   ```
+   Choose attack type (1, 2, or 3): 
+   ```
+4. Run the selected attack with the following default parameters:
+   ```python
+   noise_dim = 32
+   num_nodes = 500
+   num_queries = 300
+   generator_lr = 1e-6
+   surrogate_lr = 0.001
+   n_generator_steps = 2
+   n_surrogate_steps = 5
+   ```
+
+### Attack Types
+
+1. Type I Attack: Basic model extraction attack
+2. Type II Attack: Enhanced extraction with improved query strategy
+3. Type III Attack: Advanced extraction with additional model architecture considerations
+
+Choose the attack type by entering the corresponding number (1, 2, or 3) when prompted.
+
+### Sample Output
+
+```
+Epoch 10/200, Train Loss: 1.7342, Val Loss: 1.8183, Val Acc: 0.7460
+Epoch 20/200, Train Loss: 1.3186, Val Loss: 1.5902, Val Acc: 0.7860
+Epoch 30/200, Train Loss: 0.8908, Val Loss: 1.3175, Val Acc: 0.7880
+Epoch 40/200, Train Loss: 0.5930, Val Loss: 1.0948, Val Acc: 0.7860
+Epoch 50/200, Train Loss: 0.4184, Val Loss: 0.9633, Val Acc: 0.7940
+Epoch 60/200, Train Loss: 0.3414, Val Loss: 0.8969, Val Acc: 0.7900
+Epoch 70/200, Train Loss: 0.2943, Val Loss: 0.8568, Val Acc: 0.7900
+Epoch 80/200, Train Loss: 0.2577, Val Loss: 0.8343, Val Acc: 0.7940
+Epoch 90/200, Train Loss: 0.2487, Val Loss: 0.8058, Val Acc: 0.7960
+Epoch 100/200, Train Loss: 0.2310, Val Loss: 0.7731, Val Acc: 0.7880
+Epoch 110/200, Train Loss: 0.2129, Val Loss: 0.7825, Val Acc: 0.7900
+Epoch 120/200, Train Loss: 0.2092, Val Loss: 0.7696, Val Acc: 0.7920
+Epoch 130/200, Train Loss: 0.1865, Val Loss: 0.7548, Val Acc: 0.7940
+Epoch 140/200, Train Loss: 0.1748, Val Loss: 0.7522, Val Acc: 0.7960
+Epoch 150/200, Train Loss: 0.1769, Val Loss: 0.7385, Val Acc: 0.7940
+Epoch 160/200, Train Loss: 0.1682, Val Loss: 0.7552, Val Acc: 0.7920
+Epoch 170/200, Train Loss: 0.1557, Val Loss: 0.7254, Val Acc: 0.7880
+Epoch 180/200, Train Loss: 0.1608, Val Loss: 0.7346, Val Acc: 0.7940
+Epoch 190/200, Train Loss: 0.1517, Val Loss: 0.7433, Val Acc: 0.7860
+Epoch 200/200, Train Loss: 0.1482, Val Loss: 0.7290, Val Acc: 0.7940
+Victim Model Accuracy: 0.8070
+
+Choose attack type (1, 2, or 3): 2
+
+Running Type II Attack...
+Attacking: 100%|██████████████████████████████| 300/300 [01:09<00:00, 4.29it/s, Gen Loss=-0.3422, Surr Loss=0.4532] 
+Type II Attack - Surrogate Model Accuracy: 0.8090
+```
+
+The script will display:
+1. Training progress of the victim model, showing loss and validation accuracy
+2. Final victim model accuracy
+3. Progress bar during the attack
+4. Final surrogate model accuracy
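The README above lists the default attack parameters and the interactive prompt. One way to keep those defaults in a single place and validate the prompt answer is sketched below. `AttackDefaults` and `parse_attack_choice` are hypothetical names that do not appear in this PR; only the parameter names and values are taken from the README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackDefaults:
    """Default values copied from the README; the class itself is hypothetical."""
    noise_dim: int = 32
    num_nodes: int = 500
    num_queries: int = 300
    generator_lr: float = 1e-6
    surrogate_lr: float = 0.001
    n_generator_steps: int = 2
    n_surrogate_steps: int = 5


def parse_attack_choice(raw):
    """Validate the answer to 'Choose attack type (1, 2, or 3): '."""
    choice = raw.strip()
    if choice not in {"1", "2", "3"}:
        raise ValueError(f"expected 1, 2, or 3, got {raw!r}")
    return int(choice)


if __name__ == "__main__":
    print(AttackDefaults())
    print(parse_attack_choice(" 2 "))
```

Rejecting anything other than `1`, `2`, or `3` up front avoids training the victim model and then failing on a mistyped choice.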
diff --git a/pygip/data_free_attack/attacks/attack1.py b/pygip/data_free_attack/attacks/attack1.py
@@ -0,0 +1,120 @@
+import torch
+import torch.nn as nn
+import torch.optim as optim
+from tqdm import tqdm
+
+class TypeIAttack:
+    def __init__(self, generator, surrogate_model, victim_model, device, 
+                 noise_dim, num_nodes, feature_dim,
+                 generator_lr=1e-6, surrogate_lr=0.001,
+                 n_generator_steps=2, n_surrogate_steps=5):
+        self.generator = generator
+        self.surrogate_model = surrogate_model
+        self.victim_model = victim_model
+        self.device = device
+        self.noise_dim = noise_dim
+        self.num_nodes = num_nodes
+        self.feature_dim = feature_dim
+
+        self.generator_optimizer = optim.Adam(self.generator.parameters(), lr=generator_lr)
+        self.surrogate_optimizer = optim.Adam(self.surrogate_model.parameters(), lr=surrogate_lr)
+
+        self.criterion = nn.CrossEntropyLoss()
+        self.n_generator_steps = n_generator_steps
+        self.n_surrogate_steps = n_surrogate_steps
+
+    def generate_graph(self):
+        z = torch.randn(1, self.noise_dim).to(self.device)
+        features, adj = self.generator(z)
+        edge_index = self.generator.adj_to_edge_index(adj)
+        return features, edge_index
+
+    def train_generator(self):
+        self.generator.train()
+        self.surrogate_model.eval()
+
+        total_loss = 0
+        for _ in range(self.n_generator_steps):
+            self.generator_optimizer.zero_grad()
+
+            features, edge_index = self.generate_graph()
+
+            with torch.no_grad():
+                victim_output = self.victim_model(features, edge_index)
+            surrogate_output = self.surrogate_model(features, edge_index)
+
+            loss = -self.criterion(surrogate_output, victim_output.argmax(dim=1))
+
+            # Zeroth-order optimization with multiple random directions
+            epsilon = 1e-6
+            num_directions = 2
+            estimated_gradient = torch.zeros_like(features)
+
+            for _ in range(num_directions):
+                u = torch.randn_like(features)
+                perturbed_features = features + epsilon * u
+
+                with torch.no_grad():
+                    perturbed_victim_output = self.victim_model(perturbed_features, edge_index)
+                perturbed_surrogate_output = self.surrogate_model(perturbed_features, edge_index)
+                perturbed_loss = -self.criterion(perturbed_surrogate_output, perturbed_victim_output.argmax(dim=1))
+
+                estimated_gradient += (perturbed_loss - loss) / epsilon * u
+
+            estimated_gradient /= num_directions
+            features.backward(estimated_gradient.detach())  # propagate the estimate into the generator's parameters
+
+            self.generator_optimizer.step()
+            total_loss += loss.item()
+
+        return total_loss / self.n_generator_steps
+
+    def train_surrogate(self):
+        self.generator.eval()
+        self.surrogate_model.train()
+
+        total_loss = 0
+        for _ in range(self.n_surrogate_steps):
+            self.surrogate_optimizer.zero_grad()
+
+            features, edge_index = self.generate_graph()
+
+            with torch.no_grad():
+                victim_output = self.victim_model(features, edge_index)
+            surrogate_output = self.surrogate_model(features, edge_index)
+
+            loss = self.criterion(surrogate_output, victim_output.argmax(dim=1))
+
+            loss.backward()
+            torch.nn.utils.clip_grad_norm_(self.surrogate_model.parameters(), max_norm=1.0)
+            self.surrogate_optimizer.step()
+
+            total_loss += loss.item()
+
+        return total_loss / self.n_surrogate_steps
+
+    def attack(self, num_queries, log_interval=10):
+        generator_losses = []
+        surrogate_losses = []
+
+        pbar = tqdm(range(num_queries), desc="Attacking")
+        for query in pbar:
+            gen_loss = self.train_generator()
+            surr_loss = self.train_surrogate()
+
+            generator_losses.append(gen_loss)
+            surrogate_losses.append(surr_loss)
+
+            if (query + 1) % log_interval == 0:
+                pbar.set_postfix({
+                    'Gen Loss': f"{gen_loss:.4f}",
+                    'Surr Loss': f"{surr_loss:.4f}"
+                })
+
+        return self.surrogate_model, generator_losses, surrogate_losses
+
+def run_attack(generator, surrogate_model, victim_model, num_queries, device, 
+               noise_dim, num_nodes, feature_dim):
+    attack = TypeIAttack(generator, surrogate_model, victim_model, device, 
+                         noise_dim, num_nodes, feature_dim)
+    return attack.attack(num_queries)
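`train_generator` in `attack1.py` estimates a gradient with respect to the generated features from loss values alone: it perturbs the features along random Gaussian directions and averages the forward differences. The sketch below isolates that estimator on plain Python lists instead of tensors. The function name `estimate_gradient` and the linear test function are illustrative assumptions, not code from this PR.

```python
import random


def estimate_gradient(f, x, epsilon=1e-4, num_directions=2, rng=None):
    """Forward-difference estimate of grad f(x) from function values only.

    Averages (f(x + eps*u) - f(x)) / eps * u over Gaussian directions u.
    """
    rng = rng or random.Random()
    base = f(x)
    grad = [0.0] * len(x)
    for _ in range(num_directions):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + epsilon * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - base) / epsilon
        for i, ui in enumerate(u):
            grad[i] += slope * ui
    return [g / num_directions for g in grad]


if __name__ == "__main__":
    weights = [1.0, -2.0, 0.5]  # true gradient of the linear test function

    def linear(x):
        return sum(w * xi for w, xi in zip(weights, x))

    estimate = estimate_gradient(linear, [0.3, 0.1, -0.7],
                                 num_directions=20000, rng=random.Random(0))
    print([round(g, 2) for g in estimate])
```

The estimate is unbiased for smooth functions as `epsilon` shrinks, but its variance falls only with the number of directions, so the `num_directions = 2` used in `train_generator` gives a very noisy signal per step.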