- Overview
- System Architecture
- Quick Start
- Installation & Setup
- Configuration
- Usage Examples
- Blockchain Integration
- API Reference
- Troubleshooting
- Best Practices
- Contributing
This system provides blockchain-enabled provenance tracking for machine learning, with support for multiple storage networks (IPFS, Ethereum, Bitcoin). It records Merkle root hashes before and after each training run, so that later changes to data, model, or results can be detected and the audit trail is tamper-evident.
- 🔗 Multi-Blockchain Support: IPFS, Ethereum, Bitcoin
- 📊 Merkle Tree Integration: Cryptographic verification of ML pipeline
- 🔒 Immutable Provenance: Tamper-evident audit trails
- ⚡ Auto Mode: Fully automated blockchain integration
- 🛠️ Developer Friendly: Easy setup and configuration
┌─────────────────────────────────────────────────────────────┐
│                    ML Training Pipeline                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │     Data     │  │    Model     │  │     Training     │   │
│  │  Provenance  │  │  Provenance  │  │    Provenance    │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      ProvenanceTracker                      │
│ ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐  │
│ │   Merkle Tree   │  │   Blockchain    │  │  Provenance  │  │
│ │   Generation    │  │   Integration   │  │     Data     │  │
│ └─────────────────┘  └─────────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      BlockchainManager                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │   Ethereum   │  │   Bitcoin    │  │       IPFS       │   │
│  │  Interface   │  │  Interface   │  │    Interface     │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
- Pre-Training: Data + Model → Merkle Tree → Root Hash → Blockchain Storage
- Training: Training Process → Epoch Updates → Merkle Tree Updates
- Post-Training: Final Model + Results → Merkle Tree → Root Hash → Blockchain Storage
- Verification: Stored Hashes → Blockchain Verification → Integrity Report
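The "Merkle Tree → Root Hash" step in the flow above can be sketched as follows. This is a minimal illustration, not the project's implementation: it uses SHA-256 from the standard library (the training config elsewhere in this guide names `blake3`), and the function name `merkle_root`, the leaf payloads, and the domain-separation prefixes are assumptions made for the example.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> str:
    """Return the hex Merkle root of the given leaf payloads."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # Prefix leaves with 0x00 and interior nodes with 0x01 so that a leaf
    # can never be confused with an interior node.
    level = [sha256(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Hypothetical payloads; real leaves would be hashes of files or tensors.
pre_training_root = merkle_root([b"train-data", b"test-data", b"model-weights"])
print(pre_training_root)
```

Changing any single leaf changes the root, which is why storing only the root before and after training is enough to detect later tampering with any tracked artifact.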
git clone <repository-url>
cd mnist_provenance
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Start local Geth node for Ethereum development
bash scripts/setup_local_geth.sh
# Or use IPFS only (default)
# No additional setup required

python3 scripts/demo_blockchain_provenance.py

python3 src/ml_provenance/training/train.py

- Python 3.8+
- Git
- Homebrew (for macOS)
# Core ML dependencies
torch>=2.0.0
numpy==1.26.4
scikit-learn==1.4.1.post1
# Blockchain dependencies
web3>=6.0.0
requests>=2.31.0
ipfshttpclient>=0.8.0
gitpython==3.1.42
# Other dependencies
opacus==1.1.3
pandas==2.2.1
matplotlib==3.8.3
- Create Virtual Environment
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies
  pip install -r requirements.txt
- Verify Installation
  python3 -c "from ml_provenance.provenance.blockchain import ProvenanceBlockchainTracker; print('✅ Installation successful')"
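If the verification command fails, it can help to see which dependency is missing. The sketch below uses only the standard library; the helper name `check_modules` is hypothetical, and the module list is an assumption derived from the requirements above (note that `gitpython` is imported as `git`).

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names assumed from requirements.txt; adjust to your environment.
REQUIRED = ["torch", "numpy", "web3", "requests", "ipfshttpclient", "git"]

for name, found in check_modules(REQUIRED).items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```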
Create or update configs/blockchain_config.json:
{
  "blockchain": {
    "enabled": true,
    "networks": ["ipfs", "ethereum"],
    "ipfs": {
      "enabled": true,
      "url": "http://localhost:5001",
      "timeout": 30,
      "retry_attempts": 3
    },
    "ethereum": {
      "enabled": true,
      "rpc_url": "http://127.0.0.1:8545",
      "private_key": "your_private_key_here",
      "contract_address": null,
      "gas_limit": 300000,
      "gas_price": "auto"
    },
    "storage_options": {
      "store_before_training": true,
      "store_after_training": true,
      "store_epoch_checkpoints": false
    }
  }
}
Update your training config to include blockchain settings:
config = {
    "epochs": 5,
    "batch_size": 64,
    "learning_rate": 0.001,
    "hash_algorithm": "blake3",
    "blockchain": {
        "networks": ["ipfs", "ethereum"],
        "ipfs": {"url": "http://localhost:5001"},
        "ethereum": {
            "rpc_url": "http://127.0.0.1:8545",
            "private_key": "your_private_key"
        }
    }
}

from ml_provenance.provenance.tracker import ProvenanceTracker
import json
# Load configuration
with open('configs/blockchain_config.json', 'r') as f:
    config = json.load(f)
# Initialize tracker with blockchain support
provenance_tracker = ProvenanceTracker(config=config)
# Track data and model
provenance_tracker.track_data(train_data, test_data)
provenance_tracker.track_model(model)
# Store pre-training hash on blockchain
training_config = {"epochs": 5, "batch_size": 32}
before_transactions = provenance_tracker.store_merkle_on_blockchain_before_training(training_config)
# ... training process ...
# Store post-training hash on blockchain
training_results = {"accuracy": 0.95, "loss": 0.1}
after_transactions = provenance_tracker.store_merkle_on_blockchain_after_training(training_results)
# Verify blockchain provenance
verification_results = provenance_tracker.verify_blockchain_provenance()
# Save reports
provenance_tracker.save_blockchain_report()
provenance_tracker.save()

# Get blockchain status
status = provenance_tracker.get_blockchain_status()
print(f"Blockchain enabled: {status['blockchain_enabled']}")
print(f"Stored hashes: {status['stored_hashes']}")
# Verify provenance across the configured networks
verification = provenance_tracker.verify_blockchain_provenance()
if verification['chain_integrity']:
    print("✅ Provenance chain integrity verified!")
else:
    print("❌ Provenance chain integrity failed!")
# Custom blockchain configuration
custom_config = {
    "blockchain": {
        "networks": ["ipfs"],
        "ipfs": {"url": "http://custom-ipfs-node:5001"}
    }
}
provenance_tracker = ProvenanceTracker(config=custom_config)

IPFS advantages:
- Decentralized storage
- Content-addressed
- No transaction fees
- High availability while content stays pinned on at least one reachable node
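"Content-addressed" means the identifier of a stored record is derived from its bytes, so a modified record cannot keep the same identifier. The sketch below illustrates only that property with a plain SHA-256 digest; real IPFS CIDs wrap the digest in additional encoding and may chunk large files, so `content_address` here is a simplified, hypothetical stand-in and does not produce a valid CID.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in content identifier."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical provenance records differing in a single character.
record = b'{"merkle_root": "abc123", "stage": "pre_training"}'
tampered = b'{"merkle_root": "abc124", "stage": "pre_training"}'

# Identical bytes always map to the same identifier...
assert content_address(record) == content_address(record)
# ...and any change to the bytes yields a different one.
assert content_address(record) != content_address(tampered)
```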
Setup:
# Install IPFS
brew install ipfs # macOS
# or download from https://ipfs.io/docs/install/
# Start IPFS daemon
ipfs daemon
Configuration:
{
  "ipfs": {
    "enabled": true,
    "url": "http://localhost:5001",
    "timeout": 30
  }
}
Ethereum advantages:
- Smart contract support
- Immutable blockchain
- Programmable verification
- Global consensus
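Programmable verification on Ethereum typically operates on fixed-size values: a SHA-256 or default-length BLAKE3 digest is 32 bytes, which matches Solidity's `bytes32` type. A minimal sketch of normalizing a hex-encoded Merkle root before handing it to a contract call is shown below; the helper name `to_bytes32` is an assumption for illustration and is not part of this project's API.

```python
def to_bytes32(hex_digest: str) -> bytes:
    """Convert a hex digest (optionally 0x-prefixed) into exactly 32 bytes."""
    cleaned = hex_digest[2:] if hex_digest.lower().startswith("0x") else hex_digest
    raw = bytes.fromhex(cleaned)  # raises ValueError on non-hex input
    if len(raw) != 32:
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    return raw


# Hypothetical root value, used only to exercise the helper.
root = "0x" + "ab" * 32
print(len(to_bytes32(root)))
```

Rejecting digests of the wrong length up front avoids silently padded or truncated values ending up on-chain.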
Setup:
# Install Geth
brew install ethereum
# Start local dev node
bash scripts/setup_local_geth.sh
Configuration:
{
  "ethereum": {
    "enabled": true,
    "rpc_url": "http://127.0.0.1:8545",
    "private_key": "your_private_key",
    "contract_address": null
  }
}
Bitcoin advantages:
- Secured by the largest proof-of-work network
- OP_RETURN outputs for embedding small payloads such as a 32-byte Merkle root
- Global consensus
- Long-term stability
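To make the OP_RETURN point concrete, the sketch below builds the raw output script that would carry a 32-byte Merkle root. It only assembles script bytes and does not create, sign, or broadcast a transaction. The function name and the 80-byte cap are assumptions for illustration: 80 bytes is a long-standing default relay limit, but node policy may differ.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
MAX_PAYLOAD = 80  # assumed limit; actual relay policy depends on the node


def op_return_script(payload: bytes) -> bytes:
    """Build the raw script bytes for an OP_RETURN output carrying payload."""
    if not payload:
        raise ValueError("payload must not be empty")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {MAX_PAYLOAD}")
    if len(payload) <= 75:
        push = bytes([len(payload)])  # direct push: the opcode is the length
    else:
        push = bytes([OP_PUSHDATA1, len(payload)])  # one-byte length prefix
    return bytes([OP_RETURN]) + push + payload


# Hypothetical 32-byte root, used only to exercise the helper.
merkle_root = bytes.fromhex("11" * 32)
script = op_return_script(merkle_root)
print(script.hex())
```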
Setup:
# Install Bitcoin Core
brew install bitcoin
# Configure bitcoin.conf (default Linux path shown; on macOS the file lives in
# ~/Library/Application Support/Bitcoin/bitcoin.conf)
mkdir -p ~/.bitcoin
echo "rpcuser=your_username" >> ~/.bitcoin/bitcoin.conf
echo "rpcpassword=your_password" >> ~/.bitcoin/bitcoin.conf
echo "rpcallowip=127.0.0.1" >> ~/.bitcoin/bitcoin.conf
# Start Bitcoin node
bitcoind

For production use on Ethereum, you can deploy a smart contract to store hashes:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract MLProvenance {
    mapping(bytes32 => bool) public storedHashes;
    mapping(bytes32 => uint256) public timestamps;
    mapping(bytes32 => string) public metadata;

    event HashStored(bytes32 indexed merkleRoot, string metadata, uint256 timestamp);

    function storeHash(bytes32 merkleRoot, string memory metadataStr) public {
        storedHashes[merkleRoot] = true;
        timestamps[merkleRoot] = block.timestamp;
        metadata[merkleRoot] = metadataStr;
        emit HashStored(merkleRoot, metadataStr, block.timestamp);
    }

    function verifyHash(bytes32 merkleRoot) public view returns (bool) {
        return storedHashes[merkleRoot];
    }

    function getHashInfo(bytes32 merkleRoot) public view returns (bool, uint256, string memory) {
        return (storedHashes[merkleRoot], timestamps[merkleRoot], metadata[merkleRoot]);
    }
}

ProvenanceTracker(base_dir="artifacts", config: Optional[Dict[str, Any]] = None)
- track_data(train_data, test_data): Track data provenance and generate hashes.
- track_model(model): Track model architecture and parameters.
- store_merkle_on_blockchain_before_training(training_config): Store the Merkle root hash on the blockchain before training begins.
  Returns: Dictionary mapping networks to transaction IDs
- store_merkle_on_blockchain_after_training(training_results): Store the Merkle root hash on the blockchain after training completes.
  Returns: Dictionary mapping networks to transaction IDs
- verify_blockchain_provenance(): Verify the complete provenance chain on the blockchain.
  Returns: Dictionary containing verification results
- save_blockchain_report(): Save the blockchain report to a file.
  Returns: Path to saved report
- get_blockchain_status(): Get current blockchain status and configuration.
  Returns: Dictionary containing blockchain status

BlockchainManager(config: Dict[str, Any])
- Store Merkle root hash on multiple blockchain networks.
- Verify Merkle root hash on multiple blockchain networks.
- Get transaction information from multiple blockchain networks.
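Storing one root on several networks and returning a network-to-transaction-ID dictionary can be pictured as a simple fan-out. The sketch below is an assumption about the general pattern, not the actual `BlockchainManager` code: `store_on_all`, the `StoreFn` signature, and the two fake backends are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical backend signature: takes a hex Merkle root, returns a transaction ID.
StoreFn = Callable[[str], str]


def store_on_all(backends: Dict[str, StoreFn], merkle_root: str) -> Dict[str, Optional[str]]:
    """Call every backend and map its name to a transaction ID, or None on failure."""
    results: Dict[str, Optional[str]] = {}
    for name, store in backends.items():
        try:
            results[name] = store(merkle_root)
        except Exception as exc:  # one failing network should not block the others
            print(f"{name}: storage failed ({exc})")
            results[name] = None
    return results


def fake_ipfs(root: str) -> str:
    return f"ipfs-{root[:8]}"


def fake_ethereum(root: str) -> str:
    raise ConnectionError("node unreachable")


transactions = store_on_all(
    {"ipfs": fake_ipfs, "ethereum": fake_ethereum}, "deadbeefcafe0000"
)
print(transactions)
```

Recording `None` for a failed network, rather than raising, lets callers decide whether partial storage is acceptable for a given run.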
Problem: ModuleNotFoundError: No module named 'git'
Solution:
pip install gitpython
Problem: Connection refused when connecting to Geth
Solution:
# Check if Geth is running
lsof -i :8545
# Start Geth if not running
bash scripts/setup_local_geth.sh
Problem: Connection refused when connecting to IPFS
Solution:
# Start IPFS daemon
ipfs daemon
Problem: Invalid private key error
Solution:
# Extract private key from Geth dev node
echo 'eth.accounts' | geth attach http://127.0.0.1:8545
# Then extract private key from keystore file
Problem: Out of gas error on Ethereum
Solution:
{
  "ethereum": {
    "gas_limit": 500000,
    "gas_price": "auto"
  }
}
Enable debug logging:
import logging
logging.basicConfig(level=logging.DEBUG)
# Or in configuration
config["blockchain"]["debug"] = True

# Check Geth logs
tail -f geth_dev.log
# Check account balance
echo 'eth.getBalance(eth.accounts[0])' | geth attach http://127.0.0.1:8545

# Check IPFS status
ipfs id
# Check if content is available
ipfs cat <CID>
- Store sensitive data (private keys) in environment variables
- Use different configurations for development and production
- Version control your configuration templates
import os
config = {
    "ethereum": {
        "private_key": os.getenv("ETH_PRIVATE_KEY"),
        "rpc_url": os.getenv("ETH_RPC_URL", "http://127.0.0.1:8545")
    }
}

try:
    transactions = provenance_tracker.store_merkle_on_blockchain_before_training(config)
    if transactions:
        print("✅ Blockchain storage successful")
    else:
        print("⚠️ Blockchain storage failed")
except Exception as e:
    print(f"❌ Error: {e}")
    # Fallback to local storage only
- Use IPFS for development (faster, no fees)
- Use Ethereum for production (immutable, verifiable)
- Cache verification results
- Batch operations when possible
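The caching advice above can be sketched with a small wrapper around whatever verification call is in use. The class below is a hypothetical illustration, not part of the project: it assumes a verifier taking `(network, merkle_root)` and returning a boolean, and it caches only successful results because a failed lookup may succeed later, once a transaction has been confirmed or content has propagated.

```python
from typing import Callable, Set, Tuple


class VerificationCache:
    """Remember successful verifications per (network, merkle_root) pair."""

    def __init__(self, verify: Callable[[str, str], bool]) -> None:
        self._verify = verify
        self._verified: Set[Tuple[str, str]] = set()

    def is_verified(self, network: str, merkle_root: str) -> bool:
        key = (network, merkle_root)
        if key in self._verified:
            return True  # served from cache, no network round trip
        ok = self._verify(network, merkle_root)
        if ok:
            self._verified.add(key)
        return ok


calls = []


def slow_verify(network: str, merkle_root: str) -> bool:
    # Stand-in for a real network lookup; records each invocation.
    calls.append((network, merkle_root))
    return True


cache = VerificationCache(slow_verify)
cache.is_verified("ipfs", "abc")
cache.is_verified("ipfs", "abc")
print(len(calls))
```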
- Never commit private keys to version control
- Use test networks for development
- Validate all blockchain responses
- Implement proper access controls
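As one example of validating blockchain responses, a returned transaction ID can be checked for a plausible shape before it is written into a provenance report. The sketch below is an assumption about what such a check might look like; the function name is hypothetical. Ethereum transaction hashes and Bitcoin transaction IDs are both 32 bytes, written as 64 hex characters (with a `0x` prefix on Ethereum).

```python
import re

_ETH_TX_HASH = re.compile(r"0x[0-9a-fA-F]{64}")
_BTC_TXID = re.compile(r"[0-9a-fA-F]{64}")


def is_valid_tx_id(network: str, tx_id: object) -> bool:
    """Return True if tx_id has a plausible format for the given network."""
    if not isinstance(tx_id, str):
        return False
    if network == "ethereum":
        return _ETH_TX_HASH.fullmatch(tx_id) is not None
    if network == "bitcoin":
        return _BTC_TXID.fullmatch(tx_id) is not None
    # IPFS identifiers come in several encodings; only reject empty values here.
    return bool(tx_id.strip())


print(is_valid_tx_id("ethereum", "0x" + "a" * 64))
print(is_valid_tx_id("ethereum", "0x123"))
```

A format check like this catches truncated or malformed responses early, but it does not prove the transaction exists; that still requires a lookup on the network.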
import logging
logger = logging.getLogger(__name__)
# Monitor blockchain status
status = provenance_tracker.get_blockchain_status()
if not status['blockchain_enabled']:
    logger.warning("Blockchain tracking disabled")
# Monitor verification results
verification = provenance_tracker.verify_blockchain_provenance()
if not verification['chain_integrity']:
    logger.error("Provenance chain integrity failed")

- Fork the repository
- Create a feature branch
git checkout -b feature/blockchain-integration
- Make your changes
- Add tests
- Update documentation
- Submit a pull request
# Run unit tests
python -m pytest tests/
# Run integration tests
python scripts/demo_blockchain_provenance.py
# Run full training test
python src/ml_provenance/training/train.py

- Follow PEP 8
- Use type hints
- Add docstrings
- Write unit tests
- Update this guide for new features
- Add examples for new functionality
- Update API documentation
- Include troubleshooting steps
For issues and questions:
- Check the troubleshooting section
- Review the API documentation
- Check existing issues on GitHub
- Create a new issue with detailed information
This developer guide covers the blockchain-enabled ML provenance system. For more information, see the other documentation files in the docs/ directory.