
🌍 QCX-TERRA: TerraMind Azure ML Deployment

Complete Azure ML orchestration from Google Colab - Deploy TerraMind geospatial model in one run

This repository provides a comprehensive Google Colab notebook that orchestrates the entire Azure Machine Learning workflow for deploying IBM's TerraMind geospatial foundation model. Run everything from conversion to deployment in a single notebook execution.


🚀 Quick Start

Run in Google Colab (Recommended)

Open In Colab

Steps:

  1. Click the badge above to open in Google Colab
  2. Fill in your Azure credentials in the configuration section
  3. Run all cells sequentially
  4. Wait for deployment to complete (~20-30 minutes)
  5. Your model is deployed and ready for inference!

📋 What This Does

The TerraMind Azure ML Complete notebook automates the entire deployment pipeline:

🔄 Complete Workflow

```text
┌──────────────────────────────────────────────────────────────┐
│              Google Colab Orchestration                      │
└──────────────────────────────────────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────┐
        │   1. Azure Authentication             │
        │   - Service Principal / Interactive   │
        │   - Workspace connection              │
        └───────────────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────┐
        │   2. Environment Setup                │
        │   - Create Azure ML environment       │
        │   - Configure dependencies            │
        └───────────────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────┐
        │   3. ONNX Conversion Job              │
        │   - Submit to Azure ML compute        │
        │   - Stream job logs                   │
        │   - Monitor completion                │
        └───────────────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────┐
        │   4. Model Registration               │
        │   - Register with MLflow              │
        │   - Version management                │
        └───────────────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────┐
        │   5. Endpoint Creation                │
        │   - Create managed online endpoint    │
        │   - Configure authentication          │
        └───────────────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────┐
        │   6. Model Deployment                 │
        │   - Deploy to endpoint                │
        │   - Configure traffic routing         │
        └───────────────────────────────────────┘
                            │
                            ▼
        ┌───────────────────────────────────────┐
        │   7. Testing & Validation             │
        │   - Test with sample data             │
        │   - Verify inference                  │
        └───────────────────────────────────────┘
```

✨ Key Features

  • πŸ” Flexible Authentication: Service Principal, Default Credential, or Interactive Browser
  • πŸ“¦ Automated Environment Setup: Creates Azure ML environment with all dependencies
  • πŸ”§ ONNX Conversion: Submits conversion job to Azure ML compute cluster
  • πŸ“Š Real-time Monitoring: Streams job logs and tracks progress
  • πŸ“ Model Registration: Automatically registers model with MLflow
  • πŸš€ One-Click Deployment: Creates endpoint and deploys model
  • πŸ§ͺ Built-in Testing: Tests deployed endpoint with sample data
  • πŸ“ˆ Comprehensive Logging: Detailed progress tracking and error handling

🎯 Use Cases

This workflow is designed for:

  • Geospatial Analysis: Land use and land cover classification from satellite imagery
  • Environmental Monitoring: Track changes in vegetation, urban areas, and natural resources
  • Agricultural Applications: Crop classification and monitoring
  • Urban Planning: Analyze urban expansion and infrastructure development
  • Climate Research: Study environmental changes over time
  • Disaster Response: Rapid assessment of affected areas using satellite imagery

📋 Prerequisites

Before running the notebook, ensure you have:

Azure Requirements

| Requirement | Description |
| --- | --- |
| Azure Subscription | Active subscription with sufficient credits |
| Azure ML Workspace | Pre-created workspace in your resource group |
| Compute Cluster | Provisioned compute cluster for training jobs |
| Permissions | Contributor or Owner role on the workspace |

Authentication Options

Option 1: Service Principal (Recommended for automation)

  • Tenant ID
  • Client ID
  • Client Secret

Option 2: Interactive Browser (Easiest for first-time users)

  • Azure account credentials
  • Browser access for authentication

Option 3: Default Credential (For Azure-hosted environments)

  • Managed Identity or environment variables

Getting Started with Azure

If you don't have an Azure ML workspace:

  1. Create Azure Account: https://azure.microsoft.com/free/
  2. Create Resource Group: In Azure Portal → Resource Groups → Create
  3. Create ML Workspace: Azure Portal → Machine Learning → Create
  4. Create Compute Cluster: In ML Workspace → Compute → Create

🔧 Technical Details

Model Information

| Property | Value |
| --- | --- |
| Model | IBM TerraMind-1.0-base |
| Architecture | Geospatial foundation model |
| Input | Sentinel-2 L2A imagery (6 bands, 224×224) |
| Output | Land use / land cover classifications |
| Format | ONNX (optimized for inference) |
| Opset | 14 |

Input Specifications

The model expects Sentinel-2 Level 2A imagery with the following bands:

| Band | Name | Wavelength | Resolution | Description |
| --- | --- | --- | --- | --- |
| B2 | Blue | 490 nm | 10 m | Blue visible light |
| B3 | Green | 560 nm | 10 m | Green visible light |
| B4 | Red | 665 nm | 10 m | Red visible light |
| B8 | NIR | 842 nm | 10 m | Near-infrared |
| B11 | SWIR1 | 1610 nm | 20 m | Short-wave infrared 1 |
| B12 | SWIR2 | 2190 nm | 20 m | Short-wave infrared 2 |

Input shape: (batch_size, 6, 224, 224)
Data type: float32
Normalization: Standardized (zero mean, unit variance)
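As a concrete sketch of that normalization step, band-wise standardization in NumPy looks like the following. Note the per-band means and standard deviations below are illustrative placeholders, not TerraMind's published statistics; substitute the values used during pre-training before real inference.

```python
import numpy as np

# Placeholder per-band statistics -- NOT TerraMind's real pre-training values.
BAND_MEANS = np.array([0.12, 0.11, 0.10, 0.25, 0.20, 0.15], dtype=np.float32)
BAND_STDS = np.array([0.05, 0.05, 0.06, 0.10, 0.08, 0.07], dtype=np.float32)

def standardize(chips: np.ndarray) -> np.ndarray:
    """Standardize a (batch, 6, 224, 224) Sentinel-2 tensor band by band."""
    means = BAND_MEANS.reshape(1, 6, 1, 1)
    stds = BAND_STDS.reshape(1, 6, 1, 1)
    return ((chips - means) / stds).astype(np.float32)

chips = np.random.rand(1, 6, 224, 224).astype(np.float32)
normalized = standardize(chips)
print(normalized.shape, normalized.dtype)  # (1, 6, 224, 224) float32
```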

Azure ML Configuration

The notebook uses the following Azure ML components:

  • Environment: Custom conda environment with transformers, ONNX, MLflow
  • Compute: User-specified compute cluster for conversion job
  • Endpoint: Managed online endpoint with key-based authentication
  • Deployment: Configurable instance type and count

Default Configuration

```python
ENDPOINT_NAME = "terramind-onnx-endpoint"
DEPLOYMENT_NAME = "terramind-onnx-deploy"
INSTANCE_TYPE = "Standard_DS3_v2"
INSTANCE_COUNT = 1
MODEL_NAME = "terramind-onnx-model"
```

📊 Usage Example

After deployment, use the endpoint for inference:

Python Example

```python
import base64

import numpy as np
import requests

# Endpoint configuration (from notebook output)
scoring_uri = "https://your-endpoint.azureml.ms/score"
api_key = "your-api-key"

# Prepare Sentinel-2 data (6 bands, 224x224)
sentinel_data = np.random.rand(1, 6, 224, 224).astype(np.float32)

# Encode the raw float32 bytes as base64
tensor_b64 = base64.b64encode(sentinel_data.tobytes()).decode("utf-8")

# Create request payload
payload = {
    "input_data": {
        "columns": ["S2L2A"],
        "data": [[tensor_b64]],
    }
}

# Set headers (managed online endpoints accept the key as a Bearer token)
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

# Send request and fail loudly on HTTP errors
response = requests.post(scoring_uri, json=payload, headers=headers)
response.raise_for_status()
predictions = response.json()

print(f"Predictions: {predictions}")
```
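Because the payload carries raw `float32` bytes, a quick local round-trip (mirroring how a scoring script would presumably decode the tensor) helps catch shape or dtype mismatches before calling the endpoint:

```python
import base64

import numpy as np

original = np.random.rand(1, 6, 224, 224).astype(np.float32)
encoded = base64.b64encode(original.tobytes()).decode("utf-8")

# Decode exactly as a scoring script would: raw bytes -> float32 -> reshape
decoded = np.frombuffer(base64.b64decode(encoded), dtype=np.float32)
decoded = decoded.reshape(1, 6, 224, 224)

assert np.array_equal(original, decoded)
```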

Batch Processing

```python
# Process multiple images (reuses scoring_uri and headers from the example above)
batch_data = np.random.rand(8, 6, 224, 224).astype(np.float32)

for i, image in enumerate(batch_data):
    tensor_b64 = base64.b64encode(image.tobytes()).decode("utf-8")
    payload = {
        "input_data": {
            "columns": ["S2L2A"],
            "data": [[tensor_b64]],
        }
    }

    response = requests.post(scoring_uri, json=payload, headers=headers)
    result = response.json()
    print(f"Image {i}: {result}")
```
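Looping one request per image is simple but pays HTTP overhead on every call. If the scoring script accepts batched tensors (an assumption worth verifying against your deployment), the whole batch can be encoded once:

```python
import base64

import numpy as np

batch = np.random.rand(8, 6, 224, 224).astype(np.float32)

# One base64 string for the whole (8, 6, 224, 224) tensor -- assumes the
# scoring script reshapes by whatever batch size it receives.
payload = {
    "input_data": {
        "columns": ["S2L2A"],
        "data": [[base64.b64encode(batch.tobytes()).decode("utf-8")]],
    }
}

# Sanity check: the encoded payload decodes back to the identical batch
restored = np.frombuffer(
    base64.b64decode(payload["input_data"]["data"][0][0]), dtype=np.float32
).reshape(8, 6, 224, 224)
assert np.array_equal(batch, restored)
```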

🔄 Comparison with Manual Deployment

Before (Manual Process)

The traditional approach required:

  • ❌ Multiple separate scripts to run
  • ❌ Manual environment configuration
  • ❌ Separate job submission and monitoring
  • ❌ Manual model registration
  • ❌ Separate endpoint creation and deployment
  • ❌ Complex error handling and debugging
  • ❌ No integrated testing

After (This Implementation)

The new notebook provides:

  • ✅ Single notebook execution - Run everything in one go
  • ✅ Automated orchestration - All steps handled automatically
  • ✅ Real-time monitoring - See progress as it happens
  • ✅ Error handling - Clear error messages and troubleshooting tips
  • ✅ Built-in testing - Automatic endpoint validation
  • ✅ Comprehensive logging - Track every step
  • ✅ Google Colab ready - No local setup required

πŸ“ Repository Structure

```text
QCX-TERRA/
├── TerraMind_Azure_ML_Complete.ipynb  # Main orchestration notebook
├── README.md                          # This file
├── requirements.txt                   # Python dependencies (legacy)
├── convert_to_onnx.py                 # Conversion script (legacy)
├── deploy.py                          # Deployment script (legacy)
├── deploy.bash                        # Setup script (legacy)
├── Test.py                            # Test script (legacy)
├── terramind_config.yaml              # Model configuration (legacy)
├── conda.yaml                         # Environment file (legacy)
├── Dockerfile.dockerfile              # Docker configuration (legacy)
└── finetune_job.py                    # Fine-tuning placeholder (legacy)
```

Note: Legacy files are kept for reference. The notebook is now the recommended approach.


βš™οΈ Configuration

Customize the deployment by modifying the configuration in the notebook:

```python
class AzureConfig:
    # Azure Subscription and Workspace
    SUBSCRIPTION_ID = "<your-subscription-id>"
    RESOURCE_GROUP = "<your-resource-group>"
    WORKSPACE_NAME = "<your-workspace-name>"
    COMPUTE_NAME = "<your-compute-cluster>"

    # Service Principal (optional)
    TENANT_ID = ""
    CLIENT_ID = ""
    CLIENT_SECRET = ""

    # Deployment Configuration
    ENDPOINT_NAME = "terramind-onnx-endpoint"
    DEPLOYMENT_NAME = "terramind-onnx-deploy"
    INSTANCE_TYPE = "Standard_DS3_v2"
    INSTANCE_COUNT = 1

    # Model Configuration
    MODEL_ID = "ibm-esa-geospatial/TerraMind-1.0-base"
    MODEL_NAME = "terramind-onnx-model"
    ONNX_OPSET = 14
```
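One easy mistake is running the notebook with placeholder values still in place. A small helper along these lines (not part of the notebook, just a suggested guard) catches that before any Azure call is made; the trimmed `AzureConfig` here is only for illustration:

```python
class AzureConfig:
    # Trimmed illustration of the notebook's config class
    SUBSCRIPTION_ID = "<your-subscription-id>"
    RESOURCE_GROUP = "<your-resource-group>"
    WORKSPACE_NAME = "<your-workspace-name>"
    COMPUTE_NAME = "<your-compute-cluster>"

def unfilled_settings(config: type) -> list[str]:
    """Return names of string settings still holding <placeholder> values."""
    return [
        name
        for name, value in vars(config).items()
        if not name.startswith("_")
        and isinstance(value, str)
        and value.startswith("<")
    ]

print(unfilled_settings(AzureConfig))
# ['SUBSCRIPTION_ID', 'RESOURCE_GROUP', 'WORKSPACE_NAME', 'COMPUTE_NAME']
```

Calling `unfilled_settings(AzureConfig)` at the top of the notebook and raising if the list is non-empty fails fast instead of surfacing a confusing Azure authentication error later.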

💰 Cost Considerations

Running this workflow incurs Azure costs:

Estimated Costs

| Resource | Estimated Cost | Duration |
| --- | --- | --- |
| Compute cluster (conversion) | ~$0.50-2.00 | 10-20 min |
| Endpoint (Standard_DS3_v2) | ~$0.20/hour | Ongoing |
| Storage | ~$0.01/GB | Ongoing |

Total deployment cost: ~$2-5 (one-time)
Monthly running cost: ~$150 (if endpoint runs 24/7)
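The monthly figure is straightforward hourly-rate arithmetic, using the rough table estimate above rather than official Azure pricing:

```python
HOURLY_RATE = 0.20          # approximate Standard_DS3_v2 endpoint cost, USD/hour
HOURS_PER_MONTH = 24 * 30   # endpoint left running around the clock

monthly_cost = HOURLY_RATE * HOURS_PER_MONTH
print(f"~${monthly_cost:.0f}/month")  # ~$144/month, in line with the ~$150 estimate
```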

Cost Optimization Tips

  1. Delete endpoint when not in use: Avoid ongoing charges
  2. Use autoscaling: Scale down during low usage
  3. Choose appropriate instance type: Don't over-provision
  4. Monitor usage: Set up cost alerts in Azure Portal

Cleanup Resources

```python
# Delete deployment
ml_client.online_deployments.begin_delete(
    name="terramind-onnx-deploy",
    endpoint_name="terramind-onnx-endpoint",
).result()

# Delete endpoint
ml_client.online_endpoints.begin_delete(
    name="terramind-onnx-endpoint",
).result()
```

πŸ› Troubleshooting

Common Issues

Authentication Failed

Problem: Cannot authenticate with Azure

Solutions:

  • Verify subscription ID, resource group, and workspace name
  • Check service principal credentials
  • Try interactive browser authentication
  • Run `az login` if using the Azure CLI

Compute Cluster Not Found

Problem: Specified compute cluster doesn't exist

Solutions:

  • Create compute cluster in Azure ML workspace
  • Verify compute cluster name spelling
  • Check if cluster is in the same workspace

Job Failed

Problem: ONNX conversion job fails

Solutions:

  • Check job logs in Azure Portal
  • Verify model ID is correct
  • Ensure sufficient compute resources
  • Check for quota limits

Deployment Failed

Problem: Model deployment fails

Solutions:

  • Verify instance type is available in your region
  • Check quota limits for the instance type
  • Ensure model was registered successfully
  • Review deployment logs in Azure Portal

Endpoint Not Responding

Problem: Inference requests fail

Solutions:

  • Wait a few minutes for endpoint to warm up
  • Verify scoring URI and API key
  • Check endpoint status in Azure Portal
  • Review endpoint logs for errors

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Workflow

```shell
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/QCX-TERRA.git
cd QCX-TERRA

# Create a new branch
git checkout -b feature/your-feature-name

# Make your changes to the notebook

# Commit and push
git add .
git commit -m "Add your feature"
git push origin feature/your-feature-name

# Open a Pull Request
```

πŸ“ License

This project is part of the QueueLab organization. Please refer to the organization's licensing terms.


🔗 Resources

Model & Data

Azure ML Documentation

Tutorials


📧 Contact & Support


🎉 Acknowledgments

  • IBM Research for developing the TerraMind foundation model
  • ESA Copernicus for providing Sentinel-2 satellite imagery
  • HuggingFace for model hosting and distribution
  • Microsoft Azure for cloud infrastructure and ML platform
  • ONNX Runtime team for inference optimization

Made with ❤️ by QueueLab


Deploy TerraMind to Azure ML in one click!
