This repository was archived by the owner on Jul 28, 2025. It is now read-only.
Home
Matt McLoughlin edited this page Jan 23, 2025 · 6 revisions
The Microsoft Genomics team has created a method to execute the original binary files used in the Microsoft Genomics service. Before running it, pre-create a user-assigned managed identity in a resource group and give it `Storage Blob Data Contributor` access to your storage account. When you execute the script, it creates a new VM in a new resource group and uses that identity to access storage. When the work is done, the VM automatically deletes its own resource group.
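As a rough illustration of the pre-creation step, the sketch below uses the Azure CLI to create a user-assigned managed identity and grant it `Storage Blob Data Contributor` on a storage account. The function name `precreate_identity`, its argument order, and the resource names in the usage note are hypothetical and not taken from this repository; it also assumes you are already signed in with `az login`.

```shell
#!/usr/bin/env bash
# Sketch only: precreate_identity is a hypothetical helper, not part of the repository.
# Arguments: location, identity resource group, identity name,
#            storage account resource group, storage account name.
precreate_identity() {
  local location="$1" identity_rg="$2" identity_name="$3"
  local storage_rg="$4" storage_account="$5"
  local principal_id storage_id

  # Create the resource group that holds the identity, then the identity itself.
  az group create --name "$identity_rg" --location "$location" --output none
  az identity create --name "$identity_name" --resource-group "$identity_rg" --output none

  # Look up the identity's principal ID and the storage account's resource ID.
  principal_id=$(az identity show --name "$identity_name" \
    --resource-group "$identity_rg" --query principalId --output tsv)
  storage_id=$(az storage account show --name "$storage_account" \
    --resource-group "$storage_rg" --query id --output tsv)

  # Grant the identity data-plane access to blobs in the storage account.
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Contributor" \
    --scope "$storage_id"
}
```

A call might look like `precreate_identity westus msgen2025-identity msgen2025-uamidentity <YOUR-STORAGE-RG> <YOUR-STORAGE-ACCOUNT>`, where the last two values are placeholders for your own storage resource group and account.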
- Defines variables (location, names, VM size, URLs, etc.).
- Generates a secure random password for the virtual machine (VM).
- Sets a cleanup function to delete the resource group if there's an error.
- Checks if the identity resource group exists; creates it if not.
- Checks if the main resource group exists; creates it if not.
- Checks if the managed identity exists; creates it if not.
- Assigns the `Contributor` role to the managed identity for the main resource group.
- Fetches the latest image for `MicrosoftWindowsServer:WindowsServer:2022-datacenter-azure-edition`.
- Creates a virtual machine (VM) in Azure with the specified configuration, enabling features like secure boot and vTPM.
- Assigns the managed identity to the VM.
- Constructs a PowerShell command (`msgen.ps1`) to run on the VM.
- Installs and configures the Azure VM Custom Script Extension to execute the PowerShell script, passing in relevant input parameters.
- Creates necessary directories for processing, like `inputs`, `outputs`, `logs`, and `references`.
- Downloads and installs the AzCopy utility to manage file transfers.
- Downloads genomics reference files and other necessary resources.
- Authenticates with Azure using the managed identity and downloads input files from the provided URLs.
- Downloads and extracts the `msgen-oss` toolset.
- Runs the genomics tool to process the input files and generate output and logs.
- Uploads the generated output files to the specified Azure Blob Storage prefix.
- Compresses and uploads log files for troubleshooting.
- Installs Azure CLI on the VM.
- Authenticates with the managed identity to delete the VM's resource group, effectively cleaning up resources.
- Prints the completion status and highlights that the VM will delete itself after the process completes.
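The password-generation step in the list above could be approached roughly as follows. This is an assumption about one reasonable technique (random bytes from `/dev/urandom`, base64-encoded, plus a fixed suffix so that every character class is present), not the code the repository actually uses; `generate_password` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: generate_password is a hypothetical helper, not taken from the repository.
generate_password() {
  local random_part
  # 48 random bytes become 64 base64 characters; drop '/', '+', '=' and the
  # trailing newline, then keep the first 20 characters.
  random_part=$(head -c 48 /dev/urandom | base64 | tr -d '/+=\n' | head -c 20)
  # The fixed suffix guarantees an uppercase letter, a lowercase letter,
  # a digit, and a symbol; it adds no entropy of its own.
  printf '%sAa1!' "$random_part"
}
```

The result can be captured with `VM_PASSWORD=$(generate_password)` and passed to the VM creation step. Because the VM deletes itself when the work is done, the password does not need to be stored anywhere long-lived.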
- Resource Management: Handles Azure resource groups, VMs, and identities dynamically.
- Automation: Automates genomics workflows using Azure resources and tools.
- Cleanup: Ensures resource cleanup after task completion to avoid lingering costs.
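The check-then-create pattern for resource groups and the cleanup-on-error behavior described above can be sketched with the Azure CLI as shown below. The function names are hypothetical, and the sketch assumes `RESOURCE_GROUP` holds the name of the main resource group; the actual script may structure this differently.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical and not taken from run-on-azure-vm.sh.

# Create a resource group only when it does not exist yet.
# `az group exists` prints "true" or "false".
ensure_resource_group() {
  local name="$1" location="$2"
  if [ "$(az group exists --name "$name")" = "true" ]; then
    echo "Resource group '$name' already exists."
  else
    echo "Creating resource group '$name' in '$location'."
    az group create --name "$name" --location "$location" --output none
  fi
}

# Delete the main resource group without waiting for completion.
# Assumes RESOURCE_GROUP is set by the caller.
cleanup_on_error() {
  echo "An error occurred; deleting resource group '$RESOURCE_GROUP'." >&2
  az group delete --name "$RESOURCE_GROUP" --yes --no-wait
}
```

A script could register the cleanup with `trap cleanup_on_error ERR` near the top, then call `ensure_resource_group "$RESOURCE_GROUP" "$AZURE_LOCATION"`. Deleting the whole resource group is what keeps a failed run from leaving a billable VM behind.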
- Pre-create a user-assigned managed identity in a separate resource group for re-use between executions. Grant the identity the `Storage Blob Data Contributor` role on your storage account.
- Clone this repository.
- Execute the `run-on-azure-vm.sh` script, replacing the script parameters accordingly:
`./run-on-azure-vm.sh westus msgen2025 Standard_D64d_v5 msgen2025-vm msgen2025-identity msgen2025-uamidentity https://<YOUR-STORAGE-ACCOUNT>.blob.core.windows.net/inputs/1.fq.gz https://<YOUR-STORAGE-ACCOUNT>.blob.core.windows.net/inputs/2.fq.gz https://<YOUR-STORAGE-ACCOUNT>.blob.core.windows.net/outputs`

| Argument | Description | Default Value | Purpose |
|---|---|---|---|
| `AZURE_LOCATION` | Specifies the Azure region/location where resources will be created (e.g., `westus`, `eastus`). | `westus` | Determines the geographic region for the Azure resources, which can impact performance and cost. |
| `STEM_NAME` | A base name used as a prefix for naming resources. | `msgen2025` | Ensures consistent and identifiable naming of all resources created by the script. |
| `RESOURCE_GROUP` | The name of the Azure resource group where the VM and associated resources will be created. | `${STEM_NAME}-vm` | Groups Azure resources (VM, networking, storage, etc.) into a logical container for management. |
| `IDENTITY_RESOURCE_GROUP` | The name of the Azure resource group where the managed identity is created. | `${STEM_NAME}-identity` | Separates identity resources from other resources for organizational or security purposes. |
| `IDENTITY_NAME` | The name of the Azure managed identity to be created or used. | `${STEM_NAME}-identity` | Provides the managed identity for authenticating the VM and performing operations like file transfers. |
| `VM_SIZE` | Specifies the size of the VM to be created (e.g., CPU cores, memory). | `Standard_D64d_v5` | Customizes the VM's capacity based on workload requirements. |
| `INPUT_URL1` | URL of the first input file to be downloaded and processed. | None (mandatory) | Provides the location of the primary input dataset. |
| `INPUT_URL2` | URL of the second input file (optional). | None (optional) | Allows for a secondary input dataset to be downloaded and processed if available. |
| `OUTPUT_URL_PREFIX` | The URL prefix for uploading the output files generated by the workflow. | None (mandatory) | Specifies the destination where processed results will be stored (e.g., Azure Blob Storage). |
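To make the defaults in the table concrete, here is a minimal sketch of how positional arguments with these defaults could be read in bash. The positional order mirrors the example command (location, stem name, VM size, resource group, identity resource group, identity name, input URLs, output prefix) and is an assumption, as are the function name `parse_args` and the error message; check `run-on-azure-vm.sh` for the actual handling.

```shell
#!/usr/bin/env bash
# Sketch only: parse_args is a hypothetical helper. The positional order is
# assumed from the example command, not confirmed against the script.
parse_args() {
  AZURE_LOCATION="${1:-westus}"
  STEM_NAME="${2:-msgen2025}"
  VM_SIZE="${3:-Standard_D64d_v5}"
  # These defaults derive from STEM_NAME, so it must be assigned first.
  RESOURCE_GROUP="${4:-${STEM_NAME}-vm}"
  IDENTITY_RESOURCE_GROUP="${5:-${STEM_NAME}-identity}"
  IDENTITY_NAME="${6:-${STEM_NAME}-identity}"
  INPUT_URL1="${7:-}"
  INPUT_URL2="${8:-}"          # optional; pass "" to skip a second input
  OUTPUT_URL_PREFIX="${9:-}"

  # The table marks the first input and the output prefix as mandatory.
  if [ -z "$INPUT_URL1" ] || [ -z "$OUTPUT_URL_PREFIX" ]; then
    echo "error: INPUT_URL1 and OUTPUT_URL_PREFIX are mandatory" >&2
    return 1
  fi
}
```

In this sketch, passing an empty string for a positional argument selects its default, so `parse_args "" demo "" "" "" "" "$in1" "" "$out"` yields `RESOURCE_GROUP=demo-vm` and `IDENTITY_NAME=demo-identity`.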