AWS Deep Learning Desktop with Amazon DCV

Launch an AWS deep learning desktop with Amazon DCV for developing, training, testing, and visualizing deep learning and generative AI models.

Overview

Supported AMIs:

Ubuntu Server Pro 24.04 LTS, Version 20251212 (Default)

Supported EC2 Instance Types:

Trainium/Inferentia: trn1, trn2, inf2
GPU: g4, g5, g6, p3, p4, p5
General Purpose: Selected m5, c5, r5

Key Features:

Generative AI Inference and Training
Amazon SageMaker AI integration

Deployment Options:

Quick Start (Basic) — Minimal configuration. Automatically uses the default VPC and public subnets. No EC2 key pair or S3 bucket required. Recommended for most users.
Advanced Setup — Full control over VPC, subnets, security groups, EFS, FSx for Lustre, S3, and SSH key pair. Recommended for advanced users.

Getting Started

Prerequisites

Requirements:

AWS Account with Administrator job function access

Supported AWS Regions: us-east-1, us-east-2, us-west-2, eu-west-1, eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, ap-south-1

Quick Start (Basic)

The basic template automatically discovers the default VPC and its public subnets, uses an Auto Scaling Group for capacity resilience across Availability Zones, and uses AWS Systems Manager (SSM) Session Manager instead of SSH key pairs.

Setup Steps

Select your AWS Region from the supported regions above
Get Your Public IP: Use AWS check ip to find your public IP address, then append /32 to it (for example, 203.0.113.7/32) to form the DesktopAccessCIDR parameter value

Clone Repository: Clone this repository to your laptop:

git clone https://github.com/aws-samples/aws-deep-learning-ami-ubuntu-dcv-desktop.git
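
The public-IP step above can be sketched in shell. This is a minimal sketch and not part of the repository: `to_cidr` is a hypothetical helper, and the commented usage line assumes that "AWS check ip" refers to the `https://checkip.amazonaws.com` endpoint, which returns the caller's public IPv4 address as plain text.

```shell
# to_cidr (hypothetical helper): validate a dotted-quad IPv4 address
# and print it as a single-host /32 CIDR block.
to_cidr() {
  ip=$1
  # Reject empty input, non-digit characters, and misplaced dots.
  case $ip in
    ''|*[!0-9.]*|.*|*.|*..*)
      echo "invalid IPv4 address: $ip" >&2
      return 1 ;;
  esac
  # Split on dots into the positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  if [ "$#" -ne 4 ]; then
    echo "invalid IPv4 address: $ip" >&2
    return 1
  fi
  # Each octet must be in the range 0..255.
  for octet in "$@"; do
    [ "$octet" -le 255 ] 2>/dev/null || {
      echo "invalid IPv4 address: $ip" >&2
      return 1
    }
  done
  printf '%s/32\n' "$ip"
}

# Example (assumed endpoint, see above):
#   DESKTOP_CIDR=$(to_cidr "$(curl -s https://checkip.amazonaws.com)")
```

The resulting value is what the DesktopAccessCIDR parameter expects when access should be limited to a single address.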

Launch the Desktop

Create a CloudFormation stack from the deep-learning-ubuntu-desktop-basic.yaml template (see Basic Template Parameters) using the AWS Management Console or, if you prefer the CLI, run the following commands in a terminal window:

cd ~/aws-deep-learning-ami-ubuntu-dcv-desktop
bash quick-start.sh

Important: The template creates IAM resources:

Console: Check "I acknowledge that AWS CloudFormation might create IAM resources" during review
CLI: Uses --capabilities CAPABILITY_NAMED_IAM flag

Note: The stack waits for UserData to complete before marking as CREATE_COMPLETE. This can take 30-60 minutes depending on the instance type and software installations. Do not proceed until the stack status shows CREATE_COMPLETE.
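
For readers who want to see the CLI path spelled out, the launch and wait steps can be sketched as follows. This is a sketch under stated assumptions, not the contents of quick-start.sh: the helper names `create_basic_stack` and `wait_for_stack` are hypothetical, and only two parameters are passed on the assumption that the remaining ones fall back to their template defaults.

```shell
# Hypothetical helper: create the basic stack from the repository root.
# Arguments: stack name, DesktopAccessCIDR value, DesktopInstanceType value.
create_basic_stack() {
  aws cloudformation create-stack \
    --stack-name "$1" \
    --template-body file://deep-learning-ubuntu-desktop-basic.yaml \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameters \
      "ParameterKey=DesktopAccessCIDR,ParameterValue=$2" \
      "ParameterKey=DesktopInstanceType,ParameterValue=$3"
}

# Hypothetical helper: block until the stack reaches CREATE_COMPLETE.
# Exits non-zero if creation fails or the waiter gives up polling.
wait_for_stack() {
  aws cloudformation wait stack-create-complete --stack-name "$1"
}

# Example (stack name, CIDR, and instance type are placeholders):
#   create_basic_stack my-dl-desktop 203.0.113.7/32 g5.xlarge &&
#     wait_for_stack my-dl-desktop
```

Because the stack can take 30-60 minutes, the waiter may stop polling before the stack finishes, in which case it can simply be run again. If the template is larger than `--template-body` accepts, it would need to be uploaded to S3 and passed with `--template-url` instead.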

Connect via SSM Session Manager

Wait for stack status to show CREATE_COMPLETE in CloudFormation console
Find your desktop instance in EC2 console (tagged with your stack name)
Select the instance and click Connect → Session Manager → Start session
Set a password for the ubuntu user:
```
sudo passwd ubuntu
```
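
The same session can also be opened from a terminal instead of the EC2 console. The sketch below assumes that the Session Manager plugin for the AWS CLI is installed locally and that the instance carries the `aws:cloudformation:stack-name` tag; the source only says the instance is tagged with the stack name, so the tag key is an assumption. `find_desktop_instance` and `open_session` are hypothetical names.

```shell
# Hypothetical helper: print the IDs of running instances for a stack.
# Assumes the aws:cloudformation:stack-name tag is present on the instance.
find_desktop_instance() {
  aws ec2 describe-instances \
    --filters "Name=tag:aws:cloudformation:stack-name,Values=$1" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Hypothetical helper: start an interactive Session Manager session.
open_session() {
  aws ssm start-session --target "$1"
}

# Example (stack name is a placeholder):
#   open_session "$(find_desktop_instance my-dl-desktop)"
```

Once the session is open, the password step is the same as in the console.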

Connect via Amazon DCV Client

Download and install the Amazon DCV client on your laptop
Find the public IP of your instance in the EC2 console
Connect to https://<public-ip>:8443 using the DCV client
Log in as user ubuntu with the password you set via SSM
Do not upgrade the OS version when prompted on first login
Configure Software Updater to only apply security updates automatically (avoid non-security updates unless you're an advanced user)

Basic Template Parameters

Parameter Name	Description
AWSUbuntuAMIType	Required. Selects the AMI type.
DesktopAccessCIDR	Required. Public IP CIDR range for DCV desktop access. Use AWS check ip to find your public IP address, then append `/32`.
DesktopInstanceType	Required. Amazon EC2 instance type.
EBSOptimized	Required. Enable network optimization for EBS (default is true).
EbsVolumeSize	Required. Size of EBS volume in GB (default is 1000 GB).
EbsVolumeType	Required. EBS volume type (default is gp3).

Basic Template Stack Outputs

Output	Description
Ami	AMI used for the desktop instance
VpcId	Default VPC ID discovered by the template
SubnetIds	Public subnet IDs used by the Auto Scaling Group
InstanceProfileArn	IAM instance profile ARN
SecurityGroupId	Desktop security group ID
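
The outputs in the table above can also be read from the CLI once the stack is complete. A minimal sketch, assuming the output keys match the names in the table; `stack_output` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one CloudFormation stack output.
# Arguments: stack name, output key (for example SecurityGroupId).
stack_output() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query "Stacks[0].Outputs[?OutputKey=='$2'].OutputValue" \
    --output text
}

# Example (stack name is a placeholder):
#   stack_output my-dl-desktop SecurityGroupId
```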

Preinstalled Development Tools

The desktop comes with several development tools pre-configured:

Visual Studio Code - Full-featured code editor with extensions
Kiro - AI-powered IDE for assisted development
Claude Code CLI - Command-line interface for Claude AI
Miniconda3 - Python environment manager at /home/ubuntu/miniconda3
Docker - Container runtime for inference and training workloads
AWS CLI - Pre-configured with IAM role credentials
JupyterLab - Interactive notebook environment
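
A quick way to confirm that the tools listed above are on the PATH after first login is sketched below. `check_tools` is a hypothetical helper, and the command names in the example (`code`, `docker`, `aws`, `conda`, `jupyter`) are assumptions about how these tools are exposed on the desktop.

```shell
# Hypothetical helper: report which of the given commands are on the PATH.
# Returns non-zero if at least one command is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (command names are assumptions):
#   check_tools code docker aws conda jupyter
```

A tool reported as missing may still be installed but not yet on the PATH of the current shell, for example before the conda environment has been initialized.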

Using the Desktop

Generative AI Inference Testing

The desktop provides comprehensive inference testing frameworks for LLMs and embedding models. See Inference Testing Guide for complete documentation.

Note: Once you have successfully connected to the Deep Learning Desktop with the DCV client, perform the following steps:

Clone the project's git repository to your home directory:

   cd ~ && git clone <repository-url>

Open the cloned repository in your preferred IDE: Kiro (recommended), Visual Studio Code, or Claude Code CLI (all are pre-installed).

Supported Inference Servers:

Triton Inference Server - NVIDIA's production inference server
DJL Serving - Deep Java Library with LMI
OpenAI-compatible Server - Standard OpenAI API interface

Supported Backends:

vLLM - High-performance inference (GPU and Neuron)
TensorRT-LLM - Optimized for NVIDIA GPUs
Custom Python backends for embeddings

Key Features:

Docker containers for all server/backend combinations
Locust-based load testing with configurable concurrency
Automatic model caching to EFS
Hardware auto-detection (CUDA GPUs or Neuron devices)
Performance metrics with latency, throughput, and error rates
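
For the OpenAI-compatible server listed above, a single smoke-test request can be sketched with curl. This is an illustration rather than the repository's test harness: `chat_request` is a hypothetical helper, and the base URL, port, and model name in the example are placeholders that depend on how the container was started. The `/v1/chat/completions` path is the standard OpenAI chat endpoint.

```shell
# Hypothetical helper: send one chat completion request.
# Arguments: base URL, model name, prompt.
# The prompt is inserted into the JSON body as-is, so it must not contain
# double quotes or backslashes in this simple sketch.
chat_request() {
  curl -sS "$1/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"$2\", \"messages\": [{\"role\": \"user\", \"content\": \"$3\"}], \"max_tokens\": 64}"
}

# Example (URL and model name are placeholders):
#   chat_request http://localhost:8000 my-model 'Say hello'
```

For load testing, the Locust-based tooling described in the Inference Testing Guide is the intended path; a request like this is only useful to confirm that a server is up.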

Generative AI Training Testing

The desktop provides four frameworks for fine-tuning LLMs with PEFT (LoRA) or full fine-tuning. See Training Testing Guide for complete documentation.

Available Frameworks:

Framework	Key Features
NeMo 2.0	Tensor/pipeline parallelism, Megatron-LM optimizations
PyTorch Lightning	Full control, flexible callbacks
Accelerate	Simple API, minimal code
Ray Train	Distributed orchestration, auto-recovery

Common Features:

Generalized HuggingFace dataset pipeline with flexible templates
Multi-node, multi-GPU distributed training with FSDP
LoRA and full fine-tuning support
Automatic checkpoint conversion to HuggingFace format
Comprehensive testing and evaluation scripts
Docker containers for reproducibility

Amazon SageMaker AI

The desktop is pre-configured for Amazon SageMaker AI.

Clone SageMaker AI Examples GitHub Repository:

mkdir -p ~/sagemaker-ai
cd ~/sagemaker-ai
git clone -b distributed-training-pipeline https://github.com/aws/amazon-sagemaker-examples.git

Install the Python extension in Visual Studio Code, and open the cloned amazon-sagemaker-examples repository in Visual Studio Code.

Inference Examples:

Navigate to: amazon-sagemaker-examples/advanced_functionality/large-model-inference-testing/large_model_inference.ipynb
Use conda base environment as kernel
Skip to Initialize Notebook

Training Examples (requires FSx for Lustre, see Advanced Setup):

Navigate to: amazon-sagemaker-examples/advanced_functionality/distributed-training-pipeline/dist_training_pipeline.ipynb
Use conda base environment as kernel
Skip to Initialize Notebook

Managing the Desktop

Stopping and Restarting

You can safely reboot, stop, and restart the desktop instance anytime. For the basic template, the instance is managed by an Auto Scaling Group. To stop the instance without it being replaced, set the ASG desired capacity to 0 in the EC2 Auto Scaling console. For the advanced template, EFS (and FSx for Lustre, if enabled) automatically remount on restart.
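
For the basic template, the desired-capacity change described above can also be made from the CLI. A minimal sketch, assuming the stack contains a single Auto Scaling Group whose minimum size permits a desired capacity of 0; what happens to the instance at capacity 0 depends on how the template configures the group, which is not detailed here. `find_desktop_asg` and `set_desktop_capacity` are hypothetical names.

```shell
# Hypothetical helper: print the name of the Auto Scaling Group in a stack.
find_desktop_asg() {
  aws cloudformation describe-stack-resources \
    --stack-name "$1" \
    --query "StackResources[?ResourceType=='AWS::AutoScaling::AutoScalingGroup'].PhysicalResourceId" \
    --output text
}

# Hypothetical helper: set the desired capacity of an Auto Scaling Group.
# Arguments: group name, desired capacity (0 to scale in, 1 to bring it back).
set_desktop_capacity() {
  aws autoscaling set-desired-capacity \
    --auto-scaling-group-name "$1" \
    --desired-capacity "$2"
}

# Example (stack name is a placeholder):
#   set_desktop_capacity "$(find_desktop_asg my-dl-desktop)" 0
```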

Distributed Training

For distributed training workloads, launch a deep-learning cluster with EFA and Open MPI. See the EFA Cluster Guide.

Deleting Resources

Delete CloudFormation stacks from the AWS console when no longer needed.

What Gets Deleted:

EC2 instances
EBS root volumes
FSx for Lustre file-systems (if enabled, advanced template only)

What Persists:

EFS file-systems are NOT automatically deleted (advanced template only)
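
Deletion can also be done from the CLI. The sketch below deletes a stack, waits for the deletion to finish, and lists the EFS file systems left in the Region so that any persisted one can be reviewed. `delete_desktop_stack` and `list_efs_file_systems` are hypothetical names.

```shell
# Hypothetical helper: delete a stack and wait until deletion completes.
delete_desktop_stack() {
  aws cloudformation delete-stack --stack-name "$1" &&
    aws cloudformation wait stack-delete-complete --stack-name "$1"
}

# Hypothetical helper: list EFS file system IDs and names in the current Region.
list_efs_file_systems() {
  aws efs describe-file-systems \
    --query 'FileSystems[].[FileSystemId,Name]' \
    --output text
}

# Example (stack name is a placeholder):
#   delete_desktop_stack my-dl-desktop && list_efs_file_systems
```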

Advanced Setup

The advanced template provides full control over networking, storage, and security configuration. Use this if you need a custom VPC, EFS, FSx for Lustre, S3 bucket integration, or SSH access.

Prerequisites (Advanced)

In addition to the common prerequisites:

VPC and Subnets: Create a VPC or use an existing one. If needed, create three public subnets in three different Availability Zones
EC2 Key Pair: Create an EC2 key pair if you don't have one (needed for KeyName parameter)
S3 Bucket: Create an S3 bucket in your selected region (can be empty initially)
Get Your Public IP: Use AWS check ip to find your public IP address, then append /32 to it (needed for DesktopAccessCIDR parameter)
Clone Repository: Clone this repository to your laptop:
```
git clone <repository-url>
```
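
The key pair and bucket prerequisites above can be created from the CLI if they do not exist yet. A minimal sketch with hypothetical helper names `create_key_pair` and `create_bucket`; the key and bucket names in the example are placeholders.

```shell
# Hypothetical helper: create an EC2 key pair and save the private key
# to ./<name>.pem, readable only by the current user.
create_key_pair() {
  (
    umask 077
    aws ec2 create-key-pair --key-name "$1" \
      --query 'KeyMaterial' --output text > "$1.pem"
  )
}

# Hypothetical helper: create an S3 bucket in the given Region.
# Arguments: bucket name, Region.
create_bucket() {
  aws s3 mb "s3://$1" --region "$2"
}

# Example (names are placeholders):
#   create_key_pair my-dl-key
#   create_bucket my-dl-bucket us-west-2
```

The key pair name is what the KeyName parameter expects, and the bucket name is what the S3Bucket parameter expects.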

Launch the Desktop (Advanced)

Create a CloudFormation stack using the deep-learning-ubuntu-desktop.yaml template via:

AWS Management Console, or
AWS CLI

See Advanced Template Parameters for template inputs and Advanced Template Stack Outputs for outputs.

Important: The template creates IAM resources:

Console: Check "I acknowledge that AWS CloudFormation might create IAM resources" during review
CLI: Use --capabilities CAPABILITY_NAMED_IAM flag
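
With more parameters to supply, a parameter file can be easier to manage than inline arguments for the CLI path. The sketch below writes one in the JSON format accepted by `--parameters file://...`. `write_advanced_params` is a hypothetical helper, and it covers only six parameters on the assumption that the rest use their template defaults.

```shell
# Hypothetical helper: print a CloudFormation parameter file as JSON.
# Arguments: DesktopAccessCIDR, DesktopInstanceType, DesktopVpcId,
#            DesktopVpcSubnetId, KeyName, S3Bucket.
write_advanced_params() {
  cat <<EOF
[
  {"ParameterKey": "DesktopAccessCIDR", "ParameterValue": "$1"},
  {"ParameterKey": "DesktopInstanceType", "ParameterValue": "$2"},
  {"ParameterKey": "DesktopVpcId", "ParameterValue": "$3"},
  {"ParameterKey": "DesktopVpcSubnetId", "ParameterValue": "$4"},
  {"ParameterKey": "KeyName", "ParameterValue": "$5"},
  {"ParameterKey": "S3Bucket", "ParameterValue": "$6"}
]
EOF
}

# Example (all values are placeholders):
#   write_advanced_params 203.0.113.7/32 g5.xlarge vpc-0abc subnet-0abc \
#     my-dl-key my-dl-bucket > params.json
#   aws cloudformation create-stack --stack-name my-dl-desktop \
#     --template-body file://deep-learning-ubuntu-desktop.yaml \
#     --capabilities CAPABILITY_NAMED_IAM \
#     --parameters file://params.json
```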

Connect via SSH (Advanced)

Wait for stack status to show CREATE_COMPLETE in CloudFormation console
Find your desktop instance in EC2 console
Connect via SSH as user ubuntu using your key pair

First-time Setup:

If you see "Cloud init in progress! Logs: /var/log/cloud-init-output.log", disconnect and wait ~15 minutes. The desktop installs Amazon DCV server and reboots automatically.
When you see "Deep Learning Desktop is Ready!", set a password:
```
sudo passwd ubuntu
```

Troubleshooting: The desktop uses EC2 user-data for automatic software installation. Check logs at /var/log/cloud-init-output.log. Most transient failures can be fixed by rebooting the instance.
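
The SSH connection and the log check above can be combined into one command from your laptop. A minimal sketch: `desktop_ssh` is a hypothetical helper, and the key path and IP address in the example are placeholders.

```shell
# Hypothetical helper: connect to the desktop as user ubuntu, optionally
# running a remote command instead of opening an interactive shell.
# Arguments: private key file, public IP or DNS name, [remote command...].
desktop_ssh() {
  key=$1
  host=$2
  shift 2
  ssh -i "$key" "ubuntu@$host" "$@"
}

# Example: show the last lines of the cloud-init log (placeholders):
#   desktop_ssh ~/.ssh/my-dl-key.pem 203.0.113.7 \
#     tail -n 20 /var/log/cloud-init-output.log
```

Calling the helper with only the key and host opens a normal interactive session.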

Connect via Amazon DCV Client (Advanced)

Follow the same DCV client connection steps as the basic template.

Data Storage (Advanced)

S3 Access: The desktop has access to your specified S3 bucket. Verify access:

aws s3 ls your-bucket-name

No output means the bucket is empty (normal). An error indicates access issues.
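
Since an empty listing and a successful check look the same, a script may prefer a check that reports the result through its exit status. The sketch below uses `aws s3api head-bucket`, which succeeds only if the bucket exists and the caller may access it. `check_bucket` is a hypothetical helper name.

```shell
# Hypothetical helper: verify that a bucket exists and is accessible
# with the desktop's credentials. Returns non-zero on any failure.
check_bucket() {
  if aws s3api head-bucket --bucket "$1" >/dev/null 2>&1; then
    echo "bucket $1 is reachable"
  else
    echo "cannot access bucket $1 (missing or no permission)" >&2
    return 1
  fi
}

# Example (bucket name is a placeholder):
#   check_bucket your-bucket-name
```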

Storage Options:

Amazon EBS: Root volume (deleted when instance terminates)
Amazon EFS: Mounted at /home/ubuntu/efs by default (persists after termination)
Amazon FSx for Lustre: Optional, mounted at /home/ubuntu/fsx by default (enable via FSxForLustre parameter)

Important: EBS volumes are deleted on termination. EFS file-systems persist.

Advanced Template Parameters

Parameter Name	Description
AWSUbuntuAMIType	Required. Selects the AMI type.
CapacityReservationId	Optional. EC2 capacity reservation ID.
DesktopAccessCIDR	Public IP CIDR range for desktop access. Use AWS check ip to find your public IP address. Ignored if DesktopSecurityGroupId is specified.
DesktopHasPublicIpAddress	Required. Specify if desktop has a public IP address. Set to "true" unless you have AWS VPN or AWS Direct Connect enabled.
DesktopInstanceType	Required. Amazon EC2 instance type.
DesktopSecurityGroupId	Optional advanced parameter. EC2 security group for desktop. Must allow ports 22 (SSH) and 8443 (DCV) from DesktopAccessCIDR, access to EFS and FSx for Lustre, and all traffic within the security group. Leave blank to auto-create.
DesktopVpcId	Required. Amazon VPC ID.
DesktopVpcSubnetId	Required. Amazon VPC subnet. Must be public with Internet Gateway (for Internet access) or private with NAT gateway.
EBSOptimized	Required. Enable network optimization for EBS (default is true).
EFSFileSystemId	Optional advanced parameter. Existing EFS file-system ID with network mount target accessible from DesktopVpcSubnetId. Use with DesktopSecurityGroupId. Leave blank to create new.
EFSMountPath	Absolute path where EFS file-system is mounted (default is `/home/ubuntu/efs`).
EbsVolumeSize	Required. Size of EBS volume in GB (default is 500 GB).
EbsVolumeType	Required. EBS volume type (default is gp3).
FSxCapacity	Optional. Capacity of FSx for Lustre file-system in multiples of 1200 GB (default is 1200 GB). See FSxForLustre parameter.
FSxForLustre	Optional. Enable FSx for Lustre file-system (disabled by default). When enabled, automatically imports data from `s3://S3Bucket/S3Import`. See S3Bucket and S3Import parameters.
FSxMountPath	FSx file-system mount path (default is `/home/ubuntu/fsx`).
KeyName	Required. EC2 key pair name for SSH access. You must have the private key.
S3Bucket	Required. S3 bucket name for data storage. May be empty at stack creation.
S3Import	Optional. S3 import prefix for FSx file-system. See FSxForLustre parameter.
UbuntuAMIOverride	Optional advanced parameter to override the AMI. Leave blank to use default AMIs. See AWSUbuntuAMIType.

Advanced Template Stack Outputs

Output	Description
Ami	AMI used for the desktop instance
VpcId	VPC ID
KeyPairName	EC2 key pair name
InstanceProfileArn	IAM instance profile ARN
SecurityGroupId	Desktop security group ID
EfsId	EFS file system ID
EfsMountPath	EFS mount path
FsxId	FSx file system ID (or "disabled")
FsxMountName	FSx mount name (or "disabled")
FsxMountPath	FSx mount path (or "disabled")

Security

See CONTRIBUTING for more information.

License

This project is licensed under the MIT-0 License.
