Skip to content

phantom2810/IBM-Differential-Privacy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IBM Differential Privacy

A comprehensive implementation and exploration of differential privacy techniques using IBM's Differential Privacy Library (Diffprivlib).

Overview

This project demonstrates the implementation of differential privacy mechanisms for machine learning and data analytics tasks. Differential privacy provides mathematical guarantees for protecting individual privacy while still allowing meaningful statistical analysis and machine learning on datasets.

Differential privacy adds carefully calibrated noise to data or algorithms to ensure that the output doesn't reveal information about any individual in the dataset, while maintaining the overall utility of the data for analysis.

Features

  • Privacy Mechanisms: Implementation of core differential privacy mechanisms including Laplace, Gaussian, and Exponential mechanisms
  • Machine Learning Models: Differentially private versions of common ML algorithms (classification, regression, clustering)
  • Privacy Budget Management: Tools for tracking and managing privacy budget across multiple operations
  • Data Analysis Tools: Differentially private statistical functions and data exploration utilities
  • Educational Examples: Step-by-step tutorials and examples for learning differential privacy concepts

Installation

Prerequisites

  • Python 3.7 or higher
  • pip or conda package manager

Install Dependencies

# Install the IBM Differential Privacy Library
pip install diffprivlib

# Install additional dependencies
pip install numpy pandas scikit-learn matplotlib jupyter
# Libraries for private language model fine-tuning
pip install torch transformers datasets opacus

Clone and Setup

git clone https://github.com/phantom2810/IBM-Differential-Privacy.git
cd IBM-Differential-Privacy
pip install -r requirements.txt

Quick Start

Here's a simple example of training a differentially private classifier:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from diffprivlib.models import GaussianNB

# Load dataset
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2
)

# Train differentially private model
dp_clf = GaussianNB(epsilon=1.0)
dp_clf.fit(X_train, y_train)

# Evaluate
accuracy = dp_clf.score(X_test, y_test)
print(f"Differentially private accuracy: {accuracy}")

Private LLM Fine-Tuning

You can also fine-tune Hugging Face language models with differential privacy using the train_dp_llm function:

from dp_llm import train_dp_llm

model_dir = train_dp_llm(
    model_name="gpt2",
    dataset_path="data/my_corpus.txt",
    epsilon=5.0,
    delta=1e-5,
    max_grad_norm=1.0,
)
print("Model saved to", model_dir)

Core Components

1. Privacy Mechanisms

The building blocks of differential privacy:

  • Laplace Mechanism: Adds Laplace noise for ε-differential privacy
  • Gaussian Mechanism: Adds Gaussian noise for (ε,δ)-differential privacy
  • Exponential Mechanism: Selects outputs based on a utility function
  • Sparse Vector Technique: Efficiently answers many queries

2. Machine Learning Models

Differentially private implementations of popular algorithms:

  • Classification: Naive Bayes, Logistic Regression, Random Forest
  • Regression: Linear Regression with objective perturbation
  • Clustering: K-means with privacy guarantees
  • Dimensionality Reduction: Principal Component Analysis (PCA)

3. Privacy Budget Management

  • BudgetAccountant: Track privacy loss across multiple operations
  • Composition Theorems: Calculate total privacy loss using advanced composition
  • Privacy Analysis: Tools for analyzing privacy guarantees

Project Structure

IBM-Differential-Privacy/
├── examples/                  # Tutorial notebooks and examples
│   ├── basic_mechanisms.ipynb # Introduction to DP mechanisms
│   ├── ml_models.ipynb       # Differentially private ML
│   └── privacy_budget.ipynb  # Budget management examples
├── src/                      # Source code
│   ├── mechanisms/           # Custom mechanism implementations
│   ├── models/              # Extended ML model implementations
│   ├── utils/               # Utility functions
│   └── analysis/            # Privacy analysis tools
├── datasets/                # Sample datasets for examples
├── tests/                   # Unit tests
├── docs/                    # Documentation
├── requirements.txt         # Python dependencies
└── README.md               # This file

Examples and Tutorials

Basic Privacy Mechanisms

Explore fundamental differential privacy mechanisms:

from diffprivlib.mechanisms import Laplace

# Add Laplace noise to a query result
mechanism = Laplace(epsilon=1.0, sensitivity=1.0)
private_result = mechanism.randomise(actual_count)

Differentially Private Data Analysis

Perform statistical analysis with privacy guarantees:

from diffprivlib.tools import histogram, mean

# Compute differentially private histogram
dp_hist = histogram(data, epsilon=1.0, bins=10)

# Calculate differentially private mean
dp_mean = mean(data, epsilon=0.5, bounds=(0, 100))

Privacy Budget Management

Track privacy expenditure across multiple operations:

from diffprivlib.accountant import BudgetAccountant

accountant = BudgetAccountant(epsilon=1.0, delta=1e-5)

# Spend budget on operations
with accountant:
    result1 = private_operation_1(data, epsilon=0.3)
    result2 = private_operation_2(data, epsilon=0.7)

print(f"Remaining budget: {accountant.remaining()}")

Privacy Parameters

Understanding key differential privacy parameters:

  • ε (epsilon): Privacy parameter - smaller values mean stronger privacy
  • δ (delta): Failure probability - typically set to 1/n where n is dataset size
  • Sensitivity: Maximum change in output when one record is added/removed
  • Privacy Budget: Total amount of privacy loss allowed across all operations

Use Cases

This project demonstrates differential privacy applications in:

  • Healthcare Data: Analyzing medical records while protecting patient privacy
  • Financial Analytics: Processing transaction data with privacy guarantees
  • Survey Research: Publishing statistics from sensitive surveys
  • Machine Learning: Training models on private data for deployment
  • Government Statistics: Creating public datasets from census/administrative data

Research and References

This project is based on research and implementations from:

  • IBM Research: Diffprivlib library and associated research papers
  • Academic Literature: Foundational papers on differential privacy
  • Industry Applications: Real-world use cases and best practices

Key papers:

  • Dwork, C. (2006). Differential Privacy. ICALP
  • Holohan, N. et al. (2019). Diffprivlib: The IBM Differential Privacy Library. arXiv:1907.02444

Contributing

We welcome contributions to this project! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-mechanism)
  3. Commit your changes (git commit -am 'Add new privacy mechanism')
  4. Push to the branch (git push origin feature/new-mechanism)
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • IBM Research for developing and open-sourcing the Differential Privacy Library
  • Differential Privacy Research Community for foundational work and ongoing research
  • Contributors who have helped improve and extend this project

Contact

For questions, suggestions, or collaboration opportunities:

  • GitHub Issues: Use the issue tracker for bug reports and feature requests
  • Email: [Contact information]
  • Academic Collaboration: Open to research partnerships and academic projects

Additional Resources


This project is for educational and research purposes. For production use of differential privacy, please consult with privacy experts and consider formal privacy audits.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published