PyGIP is a Python library designed for experimenting with graph-based model extraction attacks and defenses. It provides a modular framework to implement and test attack and defense strategies on graph datasets.
To get started with PyGIP, set up your environment by installing the required dependencies:
```bash
pip install -r reqs.txt
```
Ensure you have Python installed (version 3.8 or higher recommended) along with the necessary libraries listed in reqs.txt.
Specifically, use the following command to install DGL 2.2.1, and make sure you have pytorch==2.3.0:
```bash
pip install dgl==2.2.1 -f https://data.dgl.ai/wheels/torch-2.3/repo.html
```
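To confirm the environment is set up correctly, a quick sanity check (a minimal sketch; it only assumes the torch and dgl packages installed above):
```python
# Verify that the pinned versions are installed.
import torch
import dgl

print(torch.__version__)  # expected: 2.3.0
print(dgl.__version__)    # expected: 2.2.1
```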
Here’s a simple example of launching a model extraction attack with PyGIP:
```python
from datasets import Cora
from models.attack import ModelExtractionAttack0

# Load the Cora dataset
dataset = Cora()

# Initialize the attack with a sampling ratio of 0.25
mea = ModelExtractionAttack0(dataset, 0.25)

# Execute the attack
mea.attack()
```
This code loads the Cora dataset, initializes a basic model extraction attack (`ModelExtractionAttack0`), and runs the attack with the specified sampling ratio.
PyGIP is built to be modular and extensible, allowing contributors to implement their own attack and defense strategies. Below, we detail how to extend the framework by implementing custom attack and defense classes, with a focus on how to leverage the provided dataset structure.
To create a custom attack, you need to extend the abstract base class BaseAttack. Here’s the structure
of BaseAttack:
```python
class BaseAttack(ABC):
    def __init__(self, dataset: Dataset, attack_node_fraction: float, model_path: str = None):
        """Base class for all attack implementations."""
        self.dataset = dataset
        self.graph = dataset.graph  # Access the underlying graph object directly
        # Additional initialization can go here

    @abstractmethod
    def attack(self):
        raise NotImplementedError

    def _train_target_model(self):
        raise NotImplementedError

    def _train_attack_model(self):
        raise NotImplementedError

    def _load_model(self, model_path):
        raise NotImplementedError
```
To implement your own attack:
- Inherit from `BaseAttack`: Create a new class that inherits from `BaseAttack`. You’ll need to provide the following required parameters in the constructor:
  - `dataset`: An instance of the `Dataset` class (see below for details).
  - `attack_node_fraction`: A float between 0 and 1 representing the fraction of nodes to attack.
  - `model_path` (optional): A string specifying the path to a pre-trained model (defaults to `None`).
You need to implement the following methods:
  - `attack()`: Add the main attack logic here. If multiple attack types are supported, define the attack type as an optional argument to this function. For each specific attack type, implement a corresponding helper such as `_attack_type1()` or `_attack_type2()`, and call the appropriate helper inside `attack()` based on the given method name (see the dispatch sketch after the example below).
  - `_load_model()`: Load the victim model.
  - `_train_target_model()`: Train the victim model.
  - `_train_attack_model()`: Train the attack model.
  - `_helper_func()` (optional): Add helper functions based on your needs, but keep these methods private.
- Implement the `attack()` Method: Override the abstract `attack()` method with your attack logic, and return a dict of results. For example:
```python
class MyCustomAttack(BaseAttack):
    def __init__(self, dataset: Dataset, attack_node_fraction: float, model_path: str = None):
        super().__init__(dataset, attack_node_fraction, model_path)
        # Additional initialization if needed

    def attack(self):
        # Example: Access the graph and perform an attack
        print(f"Attacking {self.attack_node_fraction * 100}% of nodes")
        num_nodes = self.graph.num_nodes()
        print(f"Graph has {num_nodes} nodes")
        # Add your attack logic here
        return {
            'metric1': 'metric1 here',
            'metric2': 'metric2 here'
        }

    def _load_model(self):
        # add your logic here
        pass

    def _train_target_model(self):
        # add your logic here
        pass

    def _train_attack_model(self):
        # add your logic here
        pass
```
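As mentioned in the method list above, when an attack supports multiple types, `attack()` should dispatch to private helpers based on the given method name. A minimal sketch of that pattern (the `attack_type` argument and the helper bodies are illustrative, not part of the PyGIP API):
```python
class MyMultiTypeAttack(BaseAttack):
    def attack(self, attack_type="type1"):
        # Dispatch to the matching private helper based on the given name.
        if attack_type == "type1":
            return self._attack_type1()
        elif attack_type == "type2":
            return self._attack_type2()
        raise ValueError(f"Unknown attack type: {attack_type}")

    def _attack_type1(self):
        # Logic for the first attack variant goes here.
        return {'attack_type': 'type1'}

    def _attack_type2(self):
        # Logic for the second attack variant goes here.
        return {'attack_type': 'type2'}
```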
of BaseDefense:
```python
class BaseDefense(ABC):
    def __init__(self, dataset: Dataset, attack_node_fraction: float):
        """Base class for all defense implementations."""
        # add initialization here

    @abstractmethod
    def defend(self):
        raise NotImplementedError

    def _load_model(self):
        raise NotImplementedError

    def _train_target_model(self):
        raise NotImplementedError

    def _train_defense_model(self):
        raise NotImplementedError

    def _train_surrogate_model(self):
        raise NotImplementedError
```
To implement your own defense:
- Inherit from `BaseDefense`: Create a new class that inherits from `BaseDefense`. You’ll need to provide the following required parameters in the constructor:
  - `dataset`: An instance of the `Dataset` class (see below for details).
  - `attack_node_fraction`: A float between 0 and 1 representing the fraction of nodes to attack.
You need to implement the following methods:
  - `defend()`: Add the main defense logic here. If multiple defense types are supported, define the defense type as an optional argument to this function. For each specific defense type, implement a corresponding helper such as `_defense_type1()` or `_defense_type2()`, and call the appropriate helper inside `defend()` based on the given method name.
  - `_load_model()`: Load the victim model.
  - `_train_target_model()`: Train the victim model.
  - `_train_defense_model()`: Train the defense model.
  - `_train_surrogate_model()`: Train the surrogate (attack) model.
  - `_helper_func()` (optional): Add helper functions based on your needs, but keep these methods private.
- Implement the `defend()` Method: Override the abstract `defend()` method with your defense logic, and return a dict of results. For example:
```python
class MyCustomDefense(BaseDefense):
    def defend(self):
        # Step 1: Train target model
        target_model = self._train_target_model()

        # Step 2: Attack target model
        attack = MyCustomAttack(self.dataset, attack_node_fraction=0.3)
        attack.attack(target_model)

        # Step 3: Train defense model
        defense_model = self._train_defense_model()

        # Step 4: Test defense against attack
        attack = MyCustomAttack(self.dataset, attack_node_fraction=0.3)
        attack.attack(defense_model)

        # Report performance metrics as a dict
        return {
            'metric1': 'metric1 here',
            'metric2': 'metric2 here'
        }

    def _load_model(self):
        # add your logic here
        pass

    def _train_target_model(self):
        # add your logic here
        pass

    def _train_defense_model(self):
        # add your logic here
        pass

    def _train_surrogate_model(self):
        # add your logic here
        pass
```
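Once implemented, running a defense follows the same pattern as the attack quick-start above (a hedged usage sketch; `MyCustomDefense` is the illustrative class from the example):
```python
from datasets import Cora

# Load a dataset and run the custom defense end to end.
dataset = Cora()
defense = MyCustomDefense(dataset, attack_node_fraction=0.25)
results = defense.defend()
print(results)
```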
The Dataset class standardizes the data format across PyGIP. Here’s its structure:
```python
class Dataset(object):
    def __init__(self, api_type='pyg', path='./downloads/'):
        self.api_type = api_type  # Set to 'pyg' for torch_geometric-based graphs
        self.path = path  # Directory for dataset storage
        self.dataset_name = ""  # Name of the dataset (e.g., "Cora")

        # Graph properties
        self.node_number = 0  # Number of nodes
        self.feature_number = 0  # Number of features per node
        self.label_number = 0  # Number of label classes

        # Core data
        self.graph = None  # PyG graph object
        self.features = None  # Node features
        self.labels = None  # Node labels

        # Data splits
        self.train_mask = None  # Boolean mask for training nodes
        self.val_mask = None  # Boolean mask for validation nodes
        self.test_mask = None  # Boolean mask for test nodes
```
- Importance: We currently use the default `api_type='pyg'` to load data. Note that when `api_type='pyg'`, `self.graph` should be an instance of `torch_geometric.data.Data`. In your implementation, make sure to use our defined `Dataset` class to build your code.
- Additional attributes like `self.dataset.features` (node features), `self.dataset.labels` (node labels), and `self.dataset.train_mask` (training split) are also available if your logic requires them (see the sampling sketch after this list).
- Reference Implementation: The `ModelExtractionAttack0` class is a fully implemented attack example. Study it for inspiration or as a template.
- Flexibility: Add as many helper functions as needed within your class to keep your code clean and modular.
- Backbone Models: We provide several basic backbone models such as GCN and GraphSAGE. You can import them via `from models.nn import GraphSAGE`, or add your own.
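For instance, a custom attack might use these `Dataset` attributes to select its attack nodes. A minimal sketch, assuming only the documented fields above (the helper name is illustrative):
```python
import torch

def _sample_attack_nodes(dataset, attack_node_fraction):
    """Randomly pick the subset of nodes the attack is allowed to query."""
    num_attack = int(dataset.node_number * attack_node_fraction)
    perm = torch.randperm(dataset.node_number)  # random node ordering
    return perm[:num_attack]
```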
By following these guidelines, you can seamlessly integrate your custom attack or defense strategies into PyGIP. Happy coding!
For internal team members with write access to the repository:
- Always Use Feature/Fix Branches
  - Never commit directly to the main or develop branch.
  - Create a new branch for each feature or bug fix:
    ```bash
    git checkout -b feat/your-feature-name
    git checkout -b fix/your-fix-name
    ```
- Keep Commits Clean & Meaningful
  - Use clear commit messages following the format `<type>: <summary>`, for example:
    - feat: add data loader for graph dataset
    - fix: resolve crash on edge cases
- Test Before Pushing
  - Test your implementation in example.py, and compare its performance against the results in the original paper.
- Push to Internal Branch
  - Always run `git pull origin pygip-release` before pushing your changes.
  - Push to the remote feature branch:
    ```bash
    git push origin feat/your-feature-name
    ```
  - Submit a pull request targeting the pygip-release branch.
  - Write a brief summary describing the features you’ve added, how to run your method, and how to evaluate its performance.
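Putting these steps together, a typical contribution flow might look like this (a sketch; the branch name and commit message are illustrative):
```bash
git checkout -b feat/my-new-attack       # work on a dedicated feature branch
# ...implement your method and verify it in example.py...
git add .
git commit -m "feat: add my new attack"  # <type>: <summary> format
git pull origin pygip-release            # sync before pushing
git push origin feat/my-new-attack       # then open a PR against pygip-release
```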
MIT License
For questions or contributions, please contact blshen@fsu.edu.