Complete Azure ML orchestration from Google Colab - Deploy TerraMind geospatial model in one run
This repository provides a comprehensive Google Colab notebook that orchestrates the entire Azure Machine Learning workflow for deploying IBM's TerraMind geospatial foundation model. Run everything from conversion to deployment in a single notebook execution.
Steps:
- Click the badge above to open in Google Colab
- Fill in your Azure credentials in the configuration section
- Run all cells sequentially
- Wait for deployment to complete (~20-30 minutes)
- Your model is deployed and ready for inference!
The TerraMind Azure ML Complete notebook automates the entire deployment pipeline:
```
┌───────────────────────────────────────────────────────────────┐
│                  Google Colab Orchestration                   │
└───────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────┐
│  1. Azure Authentication                │
│     - Service Principal / Interactive   │
│     - Workspace connection              │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  2. Environment Setup                   │
│     - Create Azure ML environment       │
│     - Configure dependencies            │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  3. ONNX Conversion Job                 │
│     - Submit to Azure ML compute        │
│     - Stream job logs                   │
│     - Monitor completion                │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  4. Model Registration                  │
│     - Register with MLflow              │
│     - Version management                │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  5. Endpoint Creation                   │
│     - Create managed online endpoint    │
│     - Configure authentication          │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  6. Model Deployment                    │
│     - Deploy to endpoint                │
│     - Configure traffic routing         │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  7. Testing & Validation                │
│     - Test with sample data             │
│     - Verify inference                  │
└─────────────────────────────────────────┘
```
- **Flexible Authentication**: Service Principal, Default Credential, or Interactive Browser
- **Automated Environment Setup**: Creates an Azure ML environment with all dependencies
- **ONNX Conversion**: Submits the conversion job to an Azure ML compute cluster
- **Real-time Monitoring**: Streams job logs and tracks progress
- **Model Registration**: Automatically registers the model with MLflow
- **One-Click Deployment**: Creates the endpoint and deploys the model
- **Built-in Testing**: Tests the deployed endpoint with sample data
- **Comprehensive Logging**: Detailed progress tracking and error handling
This workflow is designed for:
- Geospatial Analysis: Land use and land cover classification from satellite imagery
- Environmental Monitoring: Track changes in vegetation, urban areas, and natural resources
- Agricultural Applications: Crop classification and monitoring
- Urban Planning: Analyze urban expansion and infrastructure development
- Climate Research: Study environmental changes over time
- Disaster Response: Rapid assessment of affected areas using satellite imagery
Before running the notebook, ensure you have:
| Requirement | Description |
|---|---|
| Azure Subscription | Active subscription with sufficient credits |
| Azure ML Workspace | Pre-created workspace in your resource group |
| Compute Cluster | Provisioned compute cluster for training jobs |
| Permissions | Contributor or Owner role on the workspace |
Option 1: Service Principal (Recommended for automation)
- Tenant ID
- Client ID
- Client Secret
Option 2: Interactive Browser (Easiest for first-time users)
- Azure account credentials
- Browser access for authentication
Option 3: Default Credential (For Azure-hosted environments)
- Managed Identity or environment variables
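Whichever option you choose, the notebook resolves it to a credential and an `MLClient` from the `azure-identity` and `azure-ai-ml` packages. A minimal sketch — the subscription, resource group, and workspace values are placeholders:

```python
from azure.identity import (
    ClientSecretCredential,
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)
from azure.ai.ml import MLClient

# Option 1: Service Principal (values from your app registration)
# credential = ClientSecretCredential(tenant_id, client_id, client_secret)

# Option 2: Interactive Browser (opens a sign-in page)
# credential = InteractiveBrowserCredential()

# Option 3: Default Credential (Managed Identity / environment variables)
credential = DefaultAzureCredential()

ml_client = MLClient(
    credential=credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace-name>",
)
```

The notebook tries the configured option first and falls back to interactive sign-in if it fails.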
If you don't have an Azure ML workspace:
- Create Azure Account: https://azure.microsoft.com/free/
- Create Resource Group: In Azure Portal → Resource Groups → Create
- Create ML Workspace: Azure Portal → Machine Learning → Create
- Create Compute Cluster: In ML Workspace → Compute → Create
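If you prefer the command line, the same resources can be provisioned with the Azure CLI and its `ml` extension; the names and region below are placeholders:

```shell
# Sign in and install the ML extension
az login
az extension add -n ml

# Resource group and workspace
az group create --name my-rg --location eastus
az ml workspace create --name my-workspace --resource-group my-rg

# CPU compute cluster for the conversion job
az ml compute create --name cpu-cluster --type AmlCompute \
  --size Standard_DS3_v2 --min-instances 0 --max-instances 2 \
  --resource-group my-rg --workspace-name my-workspace
```

Setting `--min-instances 0` lets the cluster scale to zero between jobs so it only bills while the conversion runs.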
| Property | Value |
|---|---|
| Model | IBM TerraMind-1.0-base |
| Architecture | Geospatial Foundation Model |
| Input | Sentinel-2 L2A imagery (6 bands, 224×224) |
| Output | Land Use Land Cover classifications |
| Format | ONNX (optimized for inference) |
| Opset | 14 |
The model expects Sentinel-2 Level 2A imagery with the following bands:
| Band | Name | Wavelength | Resolution | Description |
|---|---|---|---|---|
| B2 | Blue | 490 nm | 10m | Blue visible light |
| B3 | Green | 560 nm | 10m | Green visible light |
| B4 | Red | 665 nm | 10m | Red visible light |
| B8 | NIR | 842 nm | 10m | Near-infrared |
| B11 | SWIR1 | 1610 nm | 20m | Short-wave infrared 1 |
| B12 | SWIR2 | 2190 nm | 20m | Short-wave infrared 2 |
- Input shape: `(batch_size, 6, 224, 224)`
- Data type: `float32`
- Normalization: standardized (zero mean, unit variance)
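As a rough illustration of the expected preprocessing — the per-band mean and standard deviation values below are placeholders, not the model's actual training statistics:

```python
import numpy as np

def preprocess_s2(bands: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Standardize a (6, 224, 224) Sentinel-2 stack and add a batch axis."""
    assert bands.shape == (6, 224, 224)
    # Broadcast per-band statistics over the spatial dimensions
    normalized = (bands - mean[:, None, None]) / std[:, None, None]
    return normalized[None].astype(np.float32)  # -> (1, 6, 224, 224)

# Placeholder per-band statistics (substitute the model's training stats)
mean = np.array([1369.0, 1597.0, 1741.0, 2858.0, 2303.0, 1612.0])
std = np.array([2026.0, 2011.0, 2146.0, 1917.0, 1679.0, 1476.0])

raw = np.random.rand(6, 224, 224) * 10000  # fake reflectance values
x = preprocess_s2(raw, mean, std)
print(x.shape, x.dtype)  # (1, 6, 224, 224) float32
```

The band order must match the table above (B2, B3, B4, B8, B11, B12).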
The notebook uses the following Azure ML components:
- Environment: Custom conda environment with transformers, ONNX, MLflow
- Compute: User-specified compute cluster for conversion job
- Endpoint: Managed online endpoint with key-based authentication
- Deployment: Configurable instance type and count
```python
ENDPOINT_NAME = "terramind-onnx-endpoint"
DEPLOYMENT_NAME = "terramind-onnx-deploy"
INSTANCE_TYPE = "Standard_DS3_v2"
INSTANCE_COUNT = 1
MODEL_NAME = "terramind-onnx-model"
```

After deployment, use the endpoint for inference:
```python
import requests
import numpy as np
import base64

# Endpoint configuration (from notebook output)
scoring_uri = "https://your-endpoint.azureml.ms/score"
api_key = "your-api-key"

# Prepare Sentinel-2 data (6 bands, 224x224)
sentinel_data = np.random.rand(1, 6, 224, 224).astype(np.float32)

# Encode as base64
tensor_b64 = base64.b64encode(sentinel_data.tobytes()).decode("utf-8")

# Create request payload
payload = {
    "input_data": {
        "columns": ["S2L2A"],
        "data": [[tensor_b64]]
    }
}

# Set headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

# Send request
response = requests.post(scoring_uri, json=payload, headers=headers)
predictions = response.json()
print(f"Predictions: {predictions}")
```

For batch processing, send one request per image:

```python
# Process multiple images
batch_data = np.random.rand(8, 6, 224, 224).astype(np.float32)

for i, image in enumerate(batch_data):
    tensor_b64 = base64.b64encode(image.tobytes()).decode("utf-8")
    payload = {
        "input_data": {
            "columns": ["S2L2A"],
            "data": [[tensor_b64]]
        }
    }
    response = requests.post(scoring_uri, json=payload, headers=headers)
    result = response.json()
    print(f"Image {i}: {result}")
```

The traditional approach required:
- ❌ Multiple separate scripts to run
- ❌ Manual environment configuration
- ❌ Separate job submission and monitoring
- ❌ Manual model registration
- ❌ Separate endpoint creation and deployment
- ❌ Complex error handling and debugging
- ❌ No integrated testing

The new notebook provides:

- ✅ Single notebook execution - Run everything in one go
- ✅ Automated orchestration - All steps handled automatically
- ✅ Real-time monitoring - See progress as it happens
- ✅ Error handling - Clear error messages and troubleshooting tips
- ✅ Built-in testing - Automatic endpoint validation
- ✅ Comprehensive logging - Track every step
- ✅ Google Colab ready - No local setup required
```
QCX-TERRA/
├── TerraMind_Azure_ML_Complete.ipynb   # Main orchestration notebook
├── README.md                           # This file
├── requirements.txt                    # Python dependencies (legacy)
├── convert_to_onnx.py                  # Conversion script (legacy)
├── deploy.py                           # Deployment script (legacy)
├── deploy.bash                         # Setup script (legacy)
├── Test.py                             # Test script (legacy)
├── terramind_config.yaml               # Model configuration (legacy)
├── conda.yaml                          # Environment file (legacy)
├── Dockerfile.dockerfile               # Docker configuration (legacy)
└── finetune_job.py                     # Fine-tuning placeholder (legacy)
```
Note: Legacy files are kept for reference. The notebook is now the recommended approach.
Customize the deployment by modifying the configuration in the notebook:
```python
class AzureConfig:
    # Azure Subscription and Workspace
    SUBSCRIPTION_ID = "<your-subscription-id>"
    RESOURCE_GROUP = "<your-resource-group>"
    WORKSPACE_NAME = "<your-workspace-name>"
    COMPUTE_NAME = "<your-compute-cluster>"

    # Service Principal (optional)
    TENANT_ID = ""
    CLIENT_ID = ""
    CLIENT_SECRET = ""

    # Deployment Configuration
    ENDPOINT_NAME = "terramind-onnx-endpoint"
    DEPLOYMENT_NAME = "terramind-onnx-deploy"
    INSTANCE_TYPE = "Standard_DS3_v2"
    INSTANCE_COUNT = 1

    # Model Configuration
    MODEL_ID = "ibm-esa-geospatial/TerraMind-1.0-base"
    MODEL_NAME = "terramind-onnx-model"
    ONNX_OPSET = 14
```

Running this workflow incurs Azure costs:
| Resource | Estimated Cost | Duration |
|---|---|---|
| Compute Cluster (conversion) | ~$0.50-2.00 | 10-20 min |
| Endpoint (Standard_DS3_v2) | ~$0.20/hour | Ongoing |
| Storage | ~$0.01/GB | Ongoing |
Total deployment cost: ~$2-5 (one-time)
Monthly running cost: ~$150 (if endpoint runs 24/7)
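The monthly figure follows directly from the hourly endpoint rate — a back-of-the-envelope check, ignoring storage:

```python
hourly_rate = 0.20          # $/hour for Standard_DS3_v2 (approximate)
hours_per_month = 24 * 30   # endpoint running 24/7
monthly_cost = hourly_rate * hours_per_month
print(f"${monthly_cost:.0f}/month")  # $144/month, i.e. roughly $150
```

Actual rates vary by region and change over time; check the Azure pricing page for your subscription.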
- Delete endpoint when not in use: Avoid ongoing charges
- Use autoscaling: Scale down during low usage
- Choose appropriate instance type: Don't over-provision
- Monitor usage: Set up cost alerts in Azure Portal
```python
# Delete deployment
ml_client.online_deployments.begin_delete(
    name="terramind-onnx-deploy",
    endpoint_name="terramind-onnx-endpoint"
).result()

# Delete endpoint
ml_client.online_endpoints.begin_delete(
    name="terramind-onnx-endpoint"
).result()
```

Problem: Cannot authenticate with Azure
Solutions:
- Verify subscription ID, resource group, and workspace name
- Check service principal credentials
- Try interactive browser authentication
- Run `az login` if using the Azure CLI
Problem: Specified compute cluster doesn't exist
Solutions:
- Create compute cluster in Azure ML workspace
- Verify compute cluster name spelling
- Check if cluster is in the same workspace
Problem: ONNX conversion job fails
Solutions:
- Check job logs in Azure Portal
- Verify model ID is correct
- Ensure sufficient compute resources
- Check for quota limits
Problem: Model deployment fails
Solutions:
- Verify instance type is available in your region
- Check quota limits for the instance type
- Ensure model was registered successfully
- Review deployment logs in Azure Portal
Problem: Inference requests fail
Solutions:
- Wait a few minutes for endpoint to warm up
- Verify scoring URI and API key
- Check endpoint status in Azure Portal
- Review endpoint logs for errors
Contributions are welcome! Please feel free to submit a Pull Request.
```bash
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/QCX-TERRA.git
cd QCX-TERRA

# Create a new branch
git checkout -b feature/your-feature-name

# Make your changes to the notebook

# Commit and push
git add .
git commit -m "Add your feature"
git push origin feature/your-feature-name

# Open a Pull Request
```

This project is part of the QueueLab organization. Please refer to the organization's licensing terms.
- TerraMind Model: HuggingFace
- Sentinel-2 Data: Copernicus Open Access Hub
- Terratorch Library: GitHub
- Azure ML Overview: Documentation
- Managed Endpoints: Guide
- MLflow Integration: Documentation
- ONNX Runtime: Documentation
- Geospatial ML: IBM Research Blog
- Sentinel-2 Processing: ESA Documentation
- Azure ML SDK v2: Tutorial
- Repository: QueueLab/QCX-TERRA
- Issues: GitHub Issues
- Organization: QueueLab
- IBM Research for developing the TerraMind foundation model
- ESA Copernicus for providing Sentinel-2 satellite imagery
- HuggingFace for model hosting and distribution
- Microsoft Azure for cloud infrastructure and ML platform
- ONNX Runtime team for inference optimization