Complete Azure ML orchestration from Google Colab - Deploy TerraMind geospatial model in one run
This repository provides a comprehensive Google Colab notebook that orchestrates the entire Azure Machine Learning workflow for deploying IBM's TerraMind geospatial foundation model. Run everything from conversion to deployment in a single notebook execution.
Steps:
- Click the badge above to open in Google Colab
- Fill in your Azure credentials in the configuration section
- Run all cells sequentially
- Wait for deployment to complete (~20-30 minutes)
- Your model is deployed and ready for inference!
The TerraMind Azure ML Complete notebook automates the entire deployment pipeline:
```
┌───────────────────────────────────────────────────────────────┐
│                  Google Colab Orchestration                   │
└───────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────┐
│  1. Azure Authentication                │
│     - Service Principal / Interactive   │
│     - Workspace connection              │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  2. Environment Setup                   │
│     - Create Azure ML environment       │
│     - Configure dependencies            │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  3. ONNX Conversion Job                 │
│     - Submit to Azure ML compute        │
│     - Stream job logs                   │
│     - Monitor completion                │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  4. Model Registration                  │
│     - Register with MLflow              │
│     - Version management                │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  5. Endpoint Creation                   │
│     - Create managed online endpoint    │
│     - Configure authentication          │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  6. Model Deployment                    │
│     - Deploy to endpoint                │
│     - Configure traffic routing         │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  7. Testing & Validation                │
│     - Test with sample data             │
│     - Verify inference                  │
└─────────────────────────────────────────┘
```
- **Flexible Authentication**: Service Principal, Default Credential, or Interactive Browser
- **Automated Environment Setup**: Creates an Azure ML environment with all dependencies
- **ONNX Conversion**: Submits the conversion job to an Azure ML compute cluster
- **Real-time Monitoring**: Streams job logs and tracks progress
- **Model Registration**: Automatically registers the model with MLflow
- **One-Click Deployment**: Creates the endpoint and deploys the model
- **Built-in Testing**: Tests the deployed endpoint with sample data
- **Comprehensive Logging**: Detailed progress tracking and error handling
This workflow is designed for:
- Geospatial Analysis: Land use and land cover classification from satellite imagery
- Environmental Monitoring: Track changes in vegetation, urban areas, and natural resources
- Agricultural Applications: Crop classification and monitoring
- Urban Planning: Analyze urban expansion and infrastructure development
- Climate Research: Study environmental changes over time
- Disaster Response: Rapid assessment of affected areas using satellite imagery
Before running the notebook, ensure you have:
| Requirement | Description |
|---|---|
| Azure Subscription | Active subscription with sufficient credits |
| Azure ML Workspace | Pre-created workspace in your resource group |
| Compute Cluster | Provisioned compute cluster for training jobs |
| Permissions | Contributor or Owner role on the workspace |
Option 1: Service Principal (Recommended for automation)
- Tenant ID
- Client ID
- Client Secret
Option 2: Interactive Browser (Easiest for first-time users)
- Azure account credentials
- Browser access for authentication
Option 3: Default Credential (For Azure-hosted environments)
- Managed Identity or environment variables
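Whichever option you choose, the notebook resolves it to a credential and an `MLClient` from the `azure-identity` and `azure-ai-ml` packages. A minimal sketch — the subscription, resource group, and workspace values are placeholders:

```python
from azure.identity import (
    ClientSecretCredential,
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)
from azure.ai.ml import MLClient

# Option 1: Service Principal (values from your app registration)
# credential = ClientSecretCredential(tenant_id, client_id, client_secret)

# Option 2: Interactive Browser (opens a sign-in page)
# credential = InteractiveBrowserCredential()

# Option 3: Default Credential (Managed Identity / environment variables)
credential = DefaultAzureCredential()

ml_client = MLClient(
    credential=credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace-name>",
)
```

The notebook tries the configured option first and falls back to interactive sign-in if it fails.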
If you don't have an Azure ML workspace:
- Create Azure Account: https://azure.microsoft.com/free/
- Create Resource Group: In Azure Portal → Resource Groups → Create
- Create ML Workspace: Azure Portal → Machine Learning → Create
- Create Compute Cluster: In ML Workspace → Compute → Create
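If you prefer the command line, the same resources can be provisioned with the Azure CLI and its `ml` extension; the names and region below are placeholders:

```shell
# Sign in and install the ML extension
az login
az extension add -n ml

# Resource group and workspace
az group create --name my-rg --location eastus
az ml workspace create --name my-workspace --resource-group my-rg

# CPU compute cluster for the conversion job
az ml compute create --name cpu-cluster --type AmlCompute \
  --size Standard_DS3_v2 --min-instances 0 --max-instances 2 \
  --resource-group my-rg --workspace-name my-workspace
```

Setting `--min-instances 0` lets the cluster scale to zero between jobs so it only bills while the conversion runs.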
| Property | Value |
|---|---|
| Model | IBM TerraMind-1.0-base |
| Architecture | Geospatial Foundation Model |
| Input | Sentinel-2 L2A imagery (6 bands, 224×224) |
| Output | Land Use Land Cover classifications |
| Format | ONNX (optimized for inference) |
| Opset | 14 |
The model expects Sentinel-2 Level 2A imagery with the following bands:
| Band | Name | Wavelength | Resolution | Description |
|---|---|---|---|---|
| B2 | Blue | 490 nm | 10m | Blue visible light |
| B3 | Green | 560 nm | 10m | Green visible light |
| B4 | Red | 665 nm | 10m | Red visible light |
| B8 | NIR | 842 nm | 10m | Near-infrared |
| B11 | SWIR1 | 1610 nm | 20m | Short-wave infrared 1 |
| B12 | SWIR2 | 2190 nm | 20m | Short-wave infrared 2 |
- Input shape: `(batch_size, 6, 224, 224)`
- Data type: `float32`
- Normalization: standardized (zero mean, unit variance)
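As a rough illustration of the expected preprocessing — the per-band mean and standard deviation values below are placeholders, not the model's actual training statistics:

```python
import numpy as np

def preprocess_s2(bands: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Standardize a (6, 224, 224) Sentinel-2 stack and add a batch axis."""
    assert bands.shape == (6, 224, 224)
    # Broadcast per-band statistics over the spatial dimensions
    normalized = (bands - mean[:, None, None]) / std[:, None, None]
    return normalized[None].astype(np.float32)  # -> (1, 6, 224, 224)

# Placeholder per-band statistics (substitute the model's training stats)
mean = np.array([1369.0, 1597.0, 1741.0, 2858.0, 2303.0, 1612.0])
std = np.array([2026.0, 2011.0, 2146.0, 1917.0, 1679.0, 1476.0])

raw = np.random.rand(6, 224, 224) * 10000  # fake reflectance values
x = preprocess_s2(raw, mean, std)
print(x.shape, x.dtype)  # (1, 6, 224, 224) float32
```

The band order must match the table above (B2, B3, B4, B8, B11, B12).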
The notebook uses the following Azure ML components:
- Environment: Custom conda environment with transformers, ONNX, MLflow
- Compute: User-specified compute cluster for conversion job
- Endpoint: Managed online endpoint with key-based authentication
- Deployment: Configurable instance type and count
```python
ENDPOINT_NAME = "terramind-onnx-endpoint"
DEPLOYMENT_NAME = "terramind-onnx-deploy"
INSTANCE_TYPE = "Standard_DS3_v2"
INSTANCE_COUNT = 1
MODEL_NAME = "terramind-onnx-model"
```

After deployment, use the endpoint for inference:
```python
import requests
import numpy as np
import base64

# Endpoint configuration (from notebook output)
scoring_uri = "https://your-endpoint.azureml.ms/score"
api_key = "your-api-key"

# Prepare Sentinel-2 data (6 bands, 224x224)
sentinel_data = np.random.rand(1, 6, 224, 224).astype(np.float32)

# Encode as base64
tensor_b64 = base64.b64encode(sentinel_data.tobytes()).decode("utf-8")

# Create request payload
payload = {
    "input_data": {
        "columns": ["S2L2A"],
        "data": [[tensor_b64]]
    }
}

# Set headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

# Send request
response = requests.post(scoring_uri, json=payload, headers=headers)
predictions = response.json()
print(f"Predictions: {predictions}")
```

For batch processing, send one request per image:

```python
# Process multiple images
batch_data = np.random.rand(8, 6, 224, 224).astype(np.float32)

for i, image in enumerate(batch_data):
    tensor_b64 = base64.b64encode(image.tobytes()).decode("utf-8")
    payload = {
        "input_data": {
            "columns": ["S2L2A"],
            "data": [[tensor_b64]]
        }
    }
    response = requests.post(scoring_uri, json=payload, headers=headers)
    result = response.json()
    print(f"Image {i}: {result}")
```

The traditional approach required:
- ❌ Multiple separate scripts to run
- ❌ Manual environment configuration
- ❌ Separate job submission and monitoring
- ❌ Manual model registration
- ❌ Separate endpoint creation and deployment
- ❌ Complex error handling and debugging
- ❌ No integrated testing

The new notebook provides:

- ✅ Single notebook execution - Run everything in one go
- ✅ Automated orchestration - All steps handled automatically
- ✅ Real-time monitoring - See progress as it happens
- ✅ Error handling - Clear error messages and troubleshooting tips
- ✅ Built-in testing - Automatic endpoint validation
- ✅ Comprehensive logging - Track every step
- ✅ Google Colab ready - No local setup required
```
QCX-TERRA/
├── TerraMind_Azure_ML_Complete.ipynb   # Main orchestration notebook
├── README.md                           # This file
├── requirements.txt                    # Python dependencies (legacy)
├── convert_to_onnx.py                  # Conversion script (legacy)
├── deploy.py                           # Deployment script (legacy)
├── deploy.bash                         # Setup script (legacy)
├── Test.py                             # Test script (legacy)
├── terramind_config.yaml               # Model configuration (legacy)
├── conda.yaml                          # Environment file (legacy)
├── Dockerfile.dockerfile               # Docker configuration (legacy)
└── finetune_job.py                     # Fine-tuning placeholder (legacy)
```
Note: Legacy files are kept for reference. The notebook is now the recommended approach.
Customize the deployment by modifying the configuration in the notebook:
```python
class AzureConfig:
    # Azure Subscription and Workspace
    SUBSCRIPTION_ID = "<your-subscription-id>"
    RESOURCE_GROUP = "<your-resource-group>"
    WORKSPACE_NAME = "<your-workspace-name>"
    COMPUTE_NAME = "<your-compute-cluster>"

    # Service Principal (optional)
    TENANT_ID = ""
    CLIENT_ID = ""
    CLIENT_SECRET = ""

    # Deployment Configuration
    ENDPOINT_NAME = "terramind-onnx-endpoint"
    DEPLOYMENT_NAME = "terramind-onnx-deploy"
    INSTANCE_TYPE = "Standard_DS3_v2"
    INSTANCE_COUNT = 1

    # Model Configuration
    MODEL_ID = "ibm-esa-geospatial/TerraMind-1.0-base"
    MODEL_NAME = "terramind-onnx-model"
    ONNX_OPSET = 14
```

Running this workflow incurs Azure costs:
| Resource | Estimated Cost | Duration |
|---|---|---|
| Compute Cluster (conversion) | ~$0.50-2.00 | 10-20 min |
| Endpoint (Standard_DS3_v2) | ~$0.20/hour | Ongoing |
| Storage | ~$0.01/GB | Ongoing |
Total deployment cost: ~$2-5 (one-time)
Monthly running cost: ~$150 (if endpoint runs 24/7)
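The monthly figure follows directly from the hourly endpoint rate — a back-of-the-envelope check, ignoring storage:

```python
hourly_rate = 0.20          # $/hour for Standard_DS3_v2 (approximate)
hours_per_month = 24 * 30   # endpoint running 24/7
monthly_cost = hourly_rate * hours_per_month
print(f"${monthly_cost:.0f}/month")  # $144/month, i.e. roughly $150
```

Actual rates vary by region and change over time; check the Azure pricing page for your subscription.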
- Delete endpoint when not in use: Avoid ongoing charges
- Use autoscaling: Scale down during low usage
- Choose appropriate instance type: Don't over-provision
- Monitor usage: Set up cost alerts in Azure Portal
```python
# Delete deployment
ml_client.online_deployments.begin_delete(
    name="terramind-onnx-deploy",
    endpoint_name="terramind-onnx-endpoint"
).result()

# Delete endpoint
ml_client.online_endpoints.begin_delete(
    name="terramind-onnx-endpoint"
).result()
```

Problem: Cannot authenticate with Azure
Solutions:
- Verify subscription ID, resource group, and workspace name
- Check service principal credentials
- Try interactive browser authentication
- Run `az login` if using the Azure CLI
Problem: Specified compute cluster doesn't exist
Solutions:
- Create compute cluster in Azure ML workspace
- Verify compute cluster name spelling
- Check if cluster is in the same workspace
Problem: ONNX conversion job fails
Solutions:
- Check job logs in Azure Portal
- Verify model ID is correct
- Ensure sufficient compute resources
- Check for quota limits
Problem: Model deployment fails
Solutions:
- Verify instance type is available in your region
- Check quota limits for the instance type
- Ensure model was registered successfully
- Review deployment logs in Azure Portal
Problem: Inference requests fail
Solutions:
- Wait a few minutes for endpoint to warm up
- Verify scoring URI and API key
- Check endpoint status in Azure Portal
- Review endpoint logs for errors
Contributions are welcome! Please feel free to submit a Pull Request.
```bash
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/QCX-TERRA.git
cd QCX-TERRA

# Create a new branch
git checkout -b feature/your-feature-name

# Make your changes to the notebook

# Commit and push
git add .
git commit -m "Add your feature"
git push origin feature/your-feature-name

# Open a Pull Request
```

This project is part of the QueueLab organization. Please refer to the organization's licensing terms.
- TerraMind Model: HuggingFace
- Sentinel-2 Data: Copernicus Open Access Hub
- Terratorch Library: GitHub
- Azure ML Overview: Documentation
- Managed Endpoints: Guide
- MLflow Integration: Documentation
- ONNX Runtime: Documentation
- Geospatial ML: IBM Research Blog
- Sentinel-2 Processing: ESA Documentation
- Azure ML SDK v2: Tutorial
- Repository: QueueLab/QCX-TERRA
- Issues: GitHub Issues
- Organization: QueueLab
- IBM Research for developing the TerraMind foundation model
- ESA Copernicus for providing Sentinel-2 satellite imagery
- HuggingFace for model hosting and distribution
- Microsoft Azure for cloud infrastructure and ML platform
- ONNX Runtime team for inference optimization