A comprehensive implementation and exploration of differential privacy techniques using IBM's Differential Privacy Library (Diffprivlib).
This project demonstrates the implementation of differential privacy mechanisms for machine learning and data analytics tasks. Differential privacy provides mathematical guarantees for protecting individual privacy while still allowing meaningful statistical analysis and machine learning on datasets.
Differential privacy adds carefully calibrated noise to data or algorithms to ensure that the output doesn't reveal information about any individual in the dataset, while maintaining the overall utility of the data for analysis.
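As a toy sketch of this calibration (plain NumPy, separate from Diffprivlib), a count query with sensitivity 1 can be privatized by adding Laplace noise with scale sensitivity/ε:

import numpy as np

epsilon = 1.0      # privacy parameter
sensitivity = 1.0  # a count changes by at most 1 when one record is added or removed
true_count = 42    # hypothetical query result

# Laplace noise with scale sensitivity/epsilon gives epsilon-differential privacy
noisy_count = true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
print(noisy_count)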
- Privacy Mechanisms: Implementation of core differential privacy mechanisms including Laplace, Gaussian, and Exponential mechanisms
- Machine Learning Models: Differentially private versions of common ML algorithms (classification, regression, clustering)
- Privacy Budget Management: Tools for tracking and managing privacy budget across multiple operations
- Data Analysis Tools: Differentially private statistical functions and data exploration utilities
- Educational Examples: Step-by-step tutorials and examples for learning differential privacy concepts
- Python 3.7 or higher
- pip or conda package manager
# Install the IBM Differential Privacy Library
pip install diffprivlib
# Install additional dependencies
pip install numpy pandas scikit-learn matplotlib jupyter
# Libraries for private language model fine-tuning
pip install torch transformers datasets opacus

# Clone the repository
git clone https://github.com/phantom2810/IBM-Differential-Privacy.git
cd IBM-Differential-Privacy
pip install -r requirements.txt

Here's a simple example of training a differentially private classifier:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from diffprivlib.models import GaussianNB
# Load dataset
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2
)
# Train a differentially private model; epsilon controls the privacy/utility
# trade-off (without explicit bounds, diffprivlib infers them from the data
# and emits a PrivacyLeakWarning)
dp_clf = GaussianNB(epsilon=1.0)
dp_clf.fit(X_train, y_train)
# Evaluate
accuracy = dp_clf.score(X_test, y_test)
print(f"Differentially private accuracy: {accuracy}")You can also fine-tune Hugging Face language models with differential privacy using the
train_dp_llm function:
from dp_llm import train_dp_llm

model_dir = train_dp_llm(
    model_name="gpt2",                  # Hugging Face model to fine-tune
    dataset_path="data/my_corpus.txt",  # plain-text training corpus
    epsilon=5.0,                        # target privacy budget (epsilon)
    delta=1e-5,                         # target failure probability (delta)
    max_grad_norm=1.0,                  # per-sample gradient clipping for DP-SGD
)
print("Model saved to", model_dir)The building blocks of differential privacy:
- Laplace Mechanism: Adds Laplace noise for ε-differential privacy
- Gaussian Mechanism: Adds Gaussian noise for (ε,δ)-differential privacy (see the example after this list)
- Exponential Mechanism: Selects outputs based on a utility function
- Sparse Vector Technique: Efficiently answers many queries
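For example, a minimal use of Diffprivlib's Gaussian mechanism (the classical Gaussian mechanism requires 0 < ε ≤ 1, hence ε = 0.5 here; the value 42.0 is just a placeholder query result):

from diffprivlib.mechanisms import Gaussian

# (epsilon, delta)-differentially private noise for a query with sensitivity 1
mech = Gaussian(epsilon=0.5, delta=1e-5, sensitivity=1.0)
private_value = mech.randomise(42.0)
print(private_value)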
Differentially private implementations of popular algorithms:
- Classification: Naive Bayes, Logistic Regression, Random Forest (see the sketch after this list)
- Regression: Linear Regression with objective perturbation
- Clustering: K-means with privacy guarantees
- Dimensionality Reduction: Principal Component Analysis (PCA)
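As a sketch, a differentially private logistic regression follows the familiar scikit-learn API (the data_norm value below is an assumed bound on the L2 norm of each Iris feature row, not computed from the data):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from diffprivlib.models import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# data_norm bounds each row's L2 norm; rows exceeding it are clipped (with a warning)
dp_lr = LogisticRegression(epsilon=1.0, data_norm=12.0)
dp_lr.fit(X_train, y_train)
print(f"DP logistic regression accuracy: {dp_lr.score(X_test, y_test)}")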
- BudgetAccountant: Track privacy loss across multiple operations (see the sketch after this list)
- Composition Theorems: Calculate total privacy loss using advanced composition
- Privacy Analysis: Tools for analyzing privacy guarantees
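A minimal sketch of manual budget tracking (the context-manager style appears in the budget example further below):

from diffprivlib.accountant import BudgetAccountant

acc = BudgetAccountant(epsilon=1.0, delta=0)

# Record two operations against the budget
acc.spend(epsilon=0.3, delta=0)
acc.spend(epsilon=0.5, delta=0)

print(acc.total())      # privacy budget spent so far
print(acc.remaining())  # budget left for future operations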
IBM-Differential-Privacy/
├── examples/ # Tutorial notebooks and examples
│ ├── basic_mechanisms.ipynb # Introduction to DP mechanisms
│ ├── ml_models.ipynb # Differentially private ML
│ └── privacy_budget.ipynb # Budget management examples
├── src/ # Source code
│ ├── mechanisms/ # Custom mechanism implementations
│ ├── models/ # Extended ML model implementations
│ ├── utils/ # Utility functions
│ └── analysis/ # Privacy analysis tools
├── datasets/ # Sample datasets for examples
├── tests/ # Unit tests
├── docs/ # Documentation
├── requirements.txt # Python dependencies
└── README.md # This file
Explore fundamental differential privacy mechanisms:
from diffprivlib.mechanisms import Laplace

actual_count = 42  # true (non-private) query result

# Add Laplace noise to the query result
mechanism = Laplace(epsilon=1.0, sensitivity=1.0)
private_result = mechanism.randomise(actual_count)

Perform statistical analysis with privacy guarantees:
import numpy as np
from diffprivlib.tools import histogram, mean

data = np.random.randint(0, 100, size=1000)  # example data

# Compute a differentially private histogram
dp_hist = histogram(data, epsilon=1.0, bins=10)

# Calculate a differentially private mean within known bounds
# (bounds limit the sensitivity; without them diffprivlib warns)
dp_mean = mean(data, epsilon=0.5, bounds=(0, 100))

Track privacy expenditure across multiple operations:
from diffprivlib.accountant import BudgetAccountant
from diffprivlib.tools import mean, var

accountant = BudgetAccountant(epsilon=1.0, delta=1e-5)

# Operations run inside the context draw from this accountant's budget
# (reusing the data array from the previous example)
with accountant:
    dp_mean = mean(data, epsilon=0.3, bounds=(0, 100))
    dp_var = var(data, epsilon=0.7, bounds=(0, 100))

print(f"Remaining budget: {accountant.remaining()}")

Understanding key differential privacy parameters:
- ε (epsilon): Privacy parameter; smaller values mean stronger privacy (see the worked example after this list)
- δ (delta): Failure probability; typically set well below 1/n, where n is the dataset size
- Sensitivity: Maximum change in the output when one record is added or removed
- Privacy Budget: Total privacy loss allowed across all operations
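As a quick worked example of the trade-off: for the Laplace mechanism the noise scale is b = sensitivity/ε, so for a counting query (sensitivity 1):

sensitivity = 1.0  # one record changes a count by at most 1
for epsilon in (0.1, 1.0, 10.0):
    print(f"epsilon={epsilon}: Laplace noise scale b = {sensitivity / epsilon}")
# epsilon=0.1 -> b=10.0 (strong privacy, noisy answers)
# epsilon=10  -> b=0.1  (weak privacy, accurate answers)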
This project demonstrates differential privacy applications in:
- Healthcare Data: Analyzing medical records while protecting patient privacy
- Financial Analytics: Processing transaction data with privacy guarantees
- Survey Research: Publishing statistics from sensitive surveys
- Machine Learning: Training models on private data for deployment
- Government Statistics: Creating public datasets from census/administrative data
This project is based on research and implementations from:
- IBM Research: Diffprivlib library and associated research papers
- Academic Literature: Foundational papers on differential privacy
- Industry Applications: Real-world use cases and best practices
Key papers:
- Dwork, C. (2006). Differential Privacy. ICALP
- Holohan, N. et al. (2019). Diffprivlib: The IBM Differential Privacy Library. arXiv:1907.02444
We welcome contributions to this project! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/new-mechanism)
- Commit your changes (git commit -am 'Add new privacy mechanism')
- Push to the branch (git push origin feature/new-mechanism)
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- IBM Research for developing and open-sourcing the Differential Privacy Library
- Differential Privacy Research Community for foundational work and ongoing research
- Contributors who have helped improve and extend this project
For questions, suggestions, or collaboration opportunities:
- GitHub Issues: Use the issue tracker for bug reports and feature requests
- Email: [Contact information]
- Academic Collaboration: Open to research partnerships and academic projects
- IBM Differential Privacy Library Documentation
- Differential Privacy Explained
- OpenDP: Open Source Differential Privacy
- Google's Differential Privacy Library
This project is for educational and research purposes. For production use of differential privacy, please consult with privacy experts and consider formal privacy audits.