Launch an AWS deep learning desktop with Amazon DCV for developing, training, testing, and visualizing deep learning and generative AI models.
Supported AMIs:
- Ubuntu Server Pro 24.04 LTS, Version 20251212 (Default)
Supported EC2 Instance Types:
- Trainium/Inferentia: trn1, trn2, inf2
- GPU: g4, g5, g6, p3, p4, p5
- General Purpose: Selected m5, c5, r5
Key Features:
- Generative AI Inference and Training
- Amazon SageMaker AI integration
Deployment Options:
- Quick Start (Basic) — Minimal configuration. Automatically uses the default VPC and public subnets. No EC2 key pair or S3 bucket required. Recommended for most users.
- Advanced Setup — Full control over VPC, subnets, security groups, EFS, FSx for Lustre, S3, and SSH key pair. Recommended for advanced users.
Requirements:
- AWS Account with Administrator job function access
Supported AWS Regions: us-east-1, us-east-2, us-west-2, eu-west-1, eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, ap-south-1
The basic template automatically discovers the default VPC and its public subnets, uses an Auto Scaling Group for capacity resilience across Availability Zones, and uses AWS Systems Manager (SSM) Session Manager instead of SSH key pairs.
- Select your AWS Region from the supported regions above
- Get Your Public IP: Use AWS check ip to find your public IP address (needed for the `DesktopAccessCIDR` parameter; append `/32` to your IP)
- Clone Repository: Clone this repository to your laptop: `git clone https://github.com/aws-samples/aws-deep-learning-ami-ubuntu-dcv-desktop.git`
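
The `/32` step above can be sketched as a small shell helper. This is only an illustration: `to_cidr` is a hypothetical function name, not part of this repository, and the usage line assumes that "AWS check ip" refers to the public `https://checkip.amazonaws.com` endpoint.

```shell
# Convert a single IPv4 address into a /32 CIDR block for DesktopAccessCIDR.
# to_cidr is a hypothetical helper, not something shipped in this repository.
to_cidr() {
  ip="$1"
  # Reject empty input, characters other than digits and dots, and empty groups.
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*)
      echo "not an IPv4 address: $ip" >&2
      return 1
      ;;
  esac
  # Split on dots and require exactly four octets.
  old_ifs="$IFS"
  IFS=.
  set -- $ip
  IFS="$old_ifs"
  if [ "$#" -ne 4 ]; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  # Each octet must be at most three digits and no larger than 255.
  for octet in "$@"; do
    if [ "${#octet}" -gt 3 ] || [ "$octet" -gt 255 ]; then
      echo "octet out of range: $octet" >&2
      return 1
    fi
  done
  printf '%s/32\n' "$ip"
}

# Usage (needs network access):
#   to_cidr "$(curl -s https://checkip.amazonaws.com)"
```

The helper only checks the shape of the address. It does not confirm that the address is actually the one your traffic will come from, for example when you are behind a VPN.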
Create a CloudFormation stack from the `deep-learning-ubuntu-desktop-basic.yaml` template (see Basic Template Parameters) using the AWS Management Console or, if you prefer the CLI, run the following commands in a terminal window:

    cd ~/aws-deep-learning-ami-ubuntu-dcv-desktop
    bash quick-start.sh

Important: The template creates IAM resources:
- Console: Check "I acknowledge that AWS CloudFormation might create IAM resources" during review
- CLI: Uses the `--capabilities CAPABILITY_NAMED_IAM` flag
Note: The stack waits for UserData to complete before marking as CREATE_COMPLETE. This can take 30-60 minutes depending on the instance type and software installations. Do not proceed until the stack status shows CREATE_COMPLETE.
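
For readers who want to see roughly what a CLI-based stack creation involves, here is a minimal sketch using standard AWS CLI commands. It is not a copy of `quick-start.sh`, which may work differently. The function name `create_basic_stack` is hypothetical, and the sketch assumes that the template sits in the current directory and that every parameter other than `DesktopAccessCIDR` has a usable default.

```shell
# Hypothetical wrapper around the AWS CLI; quick-start.sh may do this differently.
# Assumes the basic template is in the current directory and that all parameters
# except DesktopAccessCIDR have defaults in the template.
create_basic_stack() {
  stack_name="$1"
  access_cidr="$2"

  # CAPABILITY_NAMED_IAM acknowledges that the template creates IAM resources.
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body file://deep-learning-ubuntu-desktop-basic.yaml \
    --parameters "ParameterKey=DesktopAccessCIDR,ParameterValue=$access_cidr" \
    --capabilities CAPABILITY_NAMED_IAM || return 1

  # Poll until the stack reaches CREATE_COMPLETE; returns non-zero on failure
  # or if the waiter runs out of attempts.
  aws cloudformation wait stack-create-complete --stack-name "$stack_name"
}

# Usage:
#   create_basic_stack my-dl-desktop 203.0.113.7/32
```

The `wait stack-create-complete` waiter polls every 30 seconds and gives up after 120 checks, which is about one hour. Because stack creation can take 30-60 minutes, the waiter may time out on slower builds; in that case, run the `wait` command again or check the status in the console.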
- Wait for the stack status to show `CREATE_COMPLETE` in the CloudFormation console
- Find your desktop instance in the EC2 console (tagged with your stack name)
- Select the instance and click Connect → Session Manager → Start session
- Set a password for the `ubuntu` user: `sudo passwd ubuntu`
- Download and install the Amazon DCV client on your laptop
- Find the public IP of your instance in the EC2 console
- Connect to `https://<public-ip>:8443` using the DCV client
- Log in as user `ubuntu` with the password you set via SSM
- Do not upgrade the OS version when prompted on first login
- Configure Software Updater to only apply security updates automatically (avoid non-security updates unless you're an advanced user)
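
If you prefer the CLI to the EC2 console for looking up the public IP, the lookup might be sketched as follows. `dcv_url_for_stack` is a hypothetical helper. It assumes the instance carries the `aws:cloudformation:stack-name` tag, which CloudFormation normally applies to the resources it creates, and that the stack has a single running desktop instance.

```shell
# Hypothetical helper: print the DCV URL for the running instance of a stack.
# Assumes the instance is tagged with aws:cloudformation:stack-name=<stack name>.
dcv_url_for_stack() {
  stack_name="$1"

  ip="$(aws ec2 describe-instances \
    --filters "Name=tag:aws:cloudformation:stack-name,Values=$stack_name" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PublicIpAddress' \
    --output text)" || return 1

  # An empty result (or the literal "None") means no public IP was found.
  if [ -z "$ip" ] || [ "$ip" = "None" ]; then
    echo "no running instance with a public IP found for stack $stack_name" >&2
    return 1
  fi

  # Amazon DCV listens on port 8443 on the desktop.
  printf 'https://%s:8443\n' "$ip"
}

# Usage:
#   dcv_url_for_stack my-dl-desktop
```

Paste the printed URL into the DCV client. The public IP can change when the instance is stopped and started or replaced, so run the lookup again after a restart.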
| Parameter Name | Description |
|---|---|
| AWSUbuntuAMIType | Required. Selects the AMI type. |
| DesktopAccessCIDR | Required. Public IP CIDR range for DCV desktop access. Use AWS check ip to find your public IP address, append /32. |
| DesktopInstanceType | Required. Amazon EC2 instance type. |
| EBSOptimized | Required. Enable network optimization for EBS (default is true). |
| EbsVolumeSize | Required. Size of EBS volume in GB (default is 1000 GB). |
| EbsVolumeType | Required. EBS volume type (default is gp3). |
| Output | Description |
|---|---|
| Ami | AMI used for the desktop instance |
| VpcId | Default VPC ID discovered by the template |
| SubnetIds | Public subnet IDs used by the Auto Scaling Group |
| InstanceProfileArn | IAM instance profile ARN |
| SecurityGroupId | Desktop security group ID |
The desktop comes with several development tools pre-configured:
- Visual Studio Code - Full-featured code editor with extensions
- Kiro - AI-powered IDE for assisted development
- Claude Code CLI - Command-line interface for Claude AI
- Miniconda3 - Python environment manager at `/home/ubuntu/miniconda3`
- Docker - Container runtime for inference and training workloads
- AWS CLI - Pre-configured with IAM role credentials
- JupyterLab - Interactive notebook environment
The desktop provides comprehensive inference testing frameworks for LLMs and embedding models. See Inference Testing Guide for complete documentation.
Note: Once you have successfully connected to the Deep Learning Desktop with the DCV client, perform the following steps:
- Clone the project's git repository to your home directory: `cd ~ && git clone <repository-url>`
- Open the cloned repository in your preferred IDE: Kiro (recommended), Visual Studio Code, or Claude Code CLI (all are pre-installed).
Supported Inference Servers:
- Triton Inference Server - NVIDIA's production inference server
- DJL Serving - Deep Java Library with LMI
- OpenAI-compatible Server - Standard OpenAI API interface
Supported Backends:
- vLLM - High-performance inference (GPU and Neuron)
- TensorRT-LLM - Optimized for NVIDIA GPUs
- Custom Python backends for embeddings
Key Features:
- Docker containers for all server/backend combinations
- Locust-based load testing with configurable concurrency
- Automatic model caching to EFS
- Hardware auto-detection (CUDA GPUs or Neuron devices)
- Performance metrics with latency, throughput, and error rates
The desktop provides four frameworks for fine-tuning LLMs with PEFT (LoRA) or full fine-tuning. See Training Testing Guide for complete documentation.
Available Frameworks:
| Framework | Key Features |
|---|---|
| NeMo 2.0 | Tensor/pipeline parallelism, Megatron-LM optimizations |
| PyTorch Lightning | Full control, flexible callbacks |
| Accelerate | Simple API, minimal code |
| Ray Train | Distributed orchestration, auto-recovery |
Common Features:
- Generalized HuggingFace dataset pipeline with flexible templates
- Multi-node, multi-GPU distributed training with FSDP
- LoRA and full fine-tuning support
- Automatic checkpoint conversion to HuggingFace format
- Comprehensive testing and evaluation scripts
- Docker containers for reproducibility
The desktop is pre-configured for Amazon SageMaker AI.
Clone SageMaker AI Examples GitHub Repository:

    mkdir ~/sagemaker-ai
    cd ~/sagemaker-ai
    git clone -b distributed-training-pipeline https://github.com/aws/amazon-sagemaker-examples.git

Install the Python extension in Visual Studio Code, and open the cloned amazon-sagemaker-examples repository in Visual Studio Code.
Inference Examples:
- Navigate to: `amazon-sagemaker-examples/advanced_functionality/large-model-inference-testing/large_model_inference.ipynb`
- Use conda `base` environment as kernel
- Skip to Initialize Notebook
Training Examples (requires FSx for Lustre, see Advanced Setup):
- Navigate to: `amazon-sagemaker-examples/advanced_functionality/distributed-training-pipeline/dist_training_pipeline.ipynb`
- Use conda `base` environment as kernel
- Skip to Initialize Notebook
You can safely reboot, stop, and restart the desktop instance anytime. For the basic template, the instance is managed by an Auto Scaling Group. To stop the instance without it being replaced, set the ASG desired capacity to 0 in the EC2 Auto Scaling console. For the advanced template, EFS (and FSx for Lustre, if enabled) automatically remount on restart.
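
For the basic template, changing the desired capacity from the CLI might look like the sketch below. `set_desktop_capacity` is a hypothetical helper that looks up the Auto Scaling Group created by the stack and then sets its desired capacity (0 to stop, 1 to start again). It assumes the stack contains exactly one Auto Scaling Group.

```shell
# Hypothetical helper: set the desired capacity of the stack's Auto Scaling Group.
# Assumes the stack contains exactly one AWS::AutoScaling::AutoScalingGroup.
set_desktop_capacity() {
  stack_name="$1"
  capacity="$2"

  # Only accept a non-negative integer as the capacity.
  case "$capacity" in
    ''|*[!0-9]*)
      echo "capacity must be a non-negative integer: $capacity" >&2
      return 1
      ;;
  esac

  # Find the physical name of the Auto Scaling Group created by the stack.
  asg_name="$(aws cloudformation describe-stack-resources \
    --stack-name "$stack_name" \
    --query "StackResources[?ResourceType=='AWS::AutoScaling::AutoScalingGroup'].PhysicalResourceId" \
    --output text)" || return 1

  if [ -z "$asg_name" ]; then
    echo "no Auto Scaling Group found in stack $stack_name" >&2
    return 1
  fi

  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$asg_name" \
    --desired-capacity "$capacity"
}

# Usage:
#   set_desktop_capacity my-dl-desktop 0   # scale in
#   set_desktop_capacity my-dl-desktop 1   # bring the desktop back
```

Auto Scaling rejects a desired capacity below the group's minimum size, so the call fails if the template sets a minimum above the value you request. An Auto Scaling Group normally scales in by terminating instances, so check how the template handles the root volume before relying on this to preserve local data.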
For distributed training workloads, launch a deep-learning cluster with EFA and Open MPI. See the EFA Cluster Guide.
Delete CloudFormation stacks from the AWS console when no longer needed.
What Gets Deleted:
- EC2 instances
- EBS root volumes
- FSx for Lustre file-systems (if enabled, advanced template only)
What Persists:
- EFS file-systems are NOT automatically deleted (advanced template only)
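
The same cleanup can be done from the CLI. The sketch below uses a hypothetical helper, `delete_desktop_stack`, that deletes the stack, waits for the deletion to finish, and then lists the EFS file systems that remain in the current region so you can decide whether to remove them by hand. The listing covers every EFS file system in the region, not only those created by this stack.

```shell
# Hypothetical helper: delete a stack, wait for completion, then list remaining EFS.
delete_desktop_stack() {
  stack_name="$1"

  aws cloudformation delete-stack --stack-name "$stack_name" || return 1

  # Returns non-zero if the deletion fails or the waiter times out.
  aws cloudformation wait stack-delete-complete --stack-name "$stack_name" || return 1

  # EFS file systems are not deleted with the stack. This prints the ID and
  # Name tag of all EFS file systems in the region, not just this stack's.
  aws efs describe-file-systems \
    --query 'FileSystems[].[FileSystemId,Name]' \
    --output text
}

# Usage:
#   delete_desktop_stack my-dl-desktop
```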
The advanced template provides full control over networking, storage, and security configuration. Use this if you need a custom VPC, EFS, FSx for Lustre, S3 bucket integration, or SSH access.
In addition to the common prerequisites:
- VPC and Subnets: Create a VPC or use an existing one. If needed, create three public subnets in three different Availability Zones
- EC2 Key Pair: Create an EC2 key pair if you don't have one (needed for the `KeyName` parameter)
- S3 Bucket: Create an S3 bucket in your selected region (can be empty initially)
- Get Your Public IP: Use AWS check ip to find your public IP address (needed for the `DesktopAccessCIDR` parameter)
- Clone Repository: Clone this repository to your laptop: `git clone <repository-url>`
Create a CloudFormation stack using the `deep-learning-ubuntu-desktop.yaml` template via the AWS Management Console or the AWS CLI.
See Advanced Template Parameters for template inputs and Advanced Template Stack Outputs for outputs.
Important: The template creates IAM resources:
- Console: Check "I acknowledge that AWS CloudFormation might create IAM resources" during review
- CLI: Use the `--capabilities CAPABILITY_NAMED_IAM` flag

- Wait for the stack status to show `CREATE_COMPLETE` in the CloudFormation console
- Find your desktop instance in the EC2 console
- Connect via SSH as user `ubuntu` using your key pair
First-time Setup:
- If you see `"Cloud init in progress! Logs: /var/log/cloud-init-output.log"`, disconnect and wait ~15 minutes. The desktop installs Amazon DCV server and reboots automatically.
- When you see `Deep Learning Desktop is Ready!`, set a password: `sudo passwd ubuntu`
Troubleshooting: The desktop uses EC2 user-data for automatic software installation. Check logs at /var/log/cloud-init-output.log. Most transient failures can be fixed by rebooting the instance.
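
A quick way to inspect the end of that log from an SSH or Session Manager shell might look like this. `show_setup_log` is a hypothetical helper; the default path is the one named above, and the default line count of 40 is an arbitrary choice.

```shell
# Hypothetical helper: print the last lines of the cloud-init output log.
# Arguments: [log file] [number of lines]; both are optional.
show_setup_log() {
  log_file="${1:-/var/log/cloud-init-output.log}"
  lines="${2:-40}"

  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file (try again with sudo)" >&2
    return 1
  fi

  tail -n "$lines" "$log_file"
}

# Usage:
#   show_setup_log            # last 40 lines of the default log
#   show_setup_log "" 200     # last 200 lines of the default log
```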
Follow the same DCV client connection steps as the basic template.
S3 Access: The desktop has access to your specified S3 bucket. Verify access with `aws s3 ls your-bucket-name`. No output means the bucket is empty (normal). An error indicates access issues.
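
The three outcomes of that check (empty bucket, bucket with objects, access error) can be told apart in a script. The sketch below is a minimal illustration; `check_bucket_access` is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the desktop can list the given bucket.
check_bucket_access() {
  bucket="$1"

  # Capture both stdout and stderr so an error message can be reported.
  if listing="$(aws s3 ls "s3://$bucket" 2>&1)"; then
    if [ -z "$listing" ]; then
      echo "access OK: bucket is empty"
    else
      echo "access OK: bucket has objects"
    fi
  else
    echo "access failed: $listing" >&2
    return 1
  fi
}

# Usage:
#   check_bucket_access your-bucket-name
```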
Storage Options:
- Amazon EBS: Root volume (deleted when instance terminates)
- Amazon EFS: Mounted at `/home/ubuntu/efs` by default (persists after termination)
- Amazon FSx for Lustre: Optional, mounted at `/home/ubuntu/fsx` by default (enable via the `FSxForLustre` parameter)
Important: EBS volumes are deleted on termination. EFS file-systems persist.
| Parameter Name | Description |
|---|---|
| AWSUbuntuAMIType | Required. Selects the AMI type. |
| CapacityReservationId | Optional. EC2 capacity reservation ID. |
| DesktopAccessCIDR | Public IP CIDR range for desktop access. Use AWS check ip to find your public IP address. Ignored if DesktopSecurityGroupId is specified. |
| DesktopHasPublicIpAddress | Required. Specify if desktop has a public IP address. Set to "true" unless you have AWS VPN or DirectConnect enabled. |
| DesktopInstanceType | Required. Amazon EC2 instance type. |
| DesktopSecurityGroupId | Optional advanced parameter. EC2 security group for desktop. Must allow ports 22 (SSH) and 8443 (DCV) from DesktopAccessCIDR, access to EFS and FSx for Lustre, and all traffic within the security group. Leave blank to auto-create. |
| DesktopVpcId | Required. Amazon VPC id. |
| DesktopVpcSubnetId | Required. Amazon VPC subnet. Must be public with Internet Gateway (for Internet access) or private with NAT gateway. |
| EBSOptimized | Required. Enable network optimization for EBS (default is true). |
| EFSFileSystemId | Optional advanced parameter. Existing EFS file-system id with network mount target accessible from DesktopVpcSubnetId. Use with DesktopSecurityGroupId. Leave blank to create new. |
| EFSMountPath | Absolute path where EFS file-system is mounted (default is /home/ubuntu/efs). |
| EbsVolumeSize | Required. Size of EBS volume (default is 500 GB). |
| EbsVolumeType | Required. EBS volume type (default is gp3). |
| FSxCapacity | Optional. Capacity of FSx for Lustre file-system in multiples of 1200 GB (default is 1200 GB). See FSxForLustre parameter. |
| FSxForLustre | Optional. Enable FSx for Lustre file-system (disabled by default). When enabled, automatically imports data from s3://S3bucket/S3Import. See S3Bucket and S3Import parameters. |
| FSxMountPath | FSx file-system mount path (default is /home/ubuntu/fsx). |
| KeyName | Required. EC2 key pair name for SSH access. You must have the private key. |
| S3Bucket | Required. S3 bucket name for data storage. May be empty at stack creation. |
| S3Import | Optional. S3 import prefix for FSx file-system. See FSxForLustre parameter. |
| UbuntuAMIOverride | Optional advanced parameter to override the AMI. Leave blank to use default AMIs. See AWSUbuntuAMIType. |
| Output | Description |
|---|---|
| Ami | AMI used for the desktop instance |
| VpcId | VPC ID |
| KeyPairName | EC2 key pair name |
| InstanceProfileArn | IAM instance profile ARN |
| SecurityGroupId | Desktop security group ID |
| EfsId | EFS file system ID |
| EfsMountPath | EFS mount path |
| FsxId | FSx file system ID (or "disabled") |
| FsxMountName | FSx mount name (or "disabled") |
| FsxMountPath | FSx mount path (or "disabled") |
See CONTRIBUTING for more information.
This project is licensed under the MIT-0 License.