This repository contains demonstration materials presented at the PEARC25 (Practice and Experience in Advanced Research Computing) conference, held July 20-24, 2025, in Columbus, Ohio.
These demos showcase advanced GPU computing capabilities using AMD Instinct MI300X GPUs, including:
- GPU Partitioning: Demonstrating AMD GPU partitioning modes (CPX/SPX) for optimized resource allocation
- Kubernetes AI Inference: Deploying and scaling vLLM inference services on Kubernetes with GPU acceleration
- Model Context Protocol (MCP): Integration examples using SGLang and MCP for enhanced AI tooling
- `ansible/device-config.yaml`: Kubernetes GPU Operator configuration
- `ansible/microk8s-full-install.yml`: MicroK8s installation playbook
- `ansible/microk8s-uninstall.yml`: MicroK8s removal playbook
- `k8s/device-config.yaml`: Kubernetes device plugin configuration
- `k8s/metallb-config.yaml`: MetalLB load balancer configuration
- `k8s/vllm-*.yaml`: vLLM deployment, service, and storage configurations
- `sglang/docker-compose.yml`: SGLang container orchestration
- `env.example`: Example `.env` file
- AMD Instinct GPUs
- System with ROCm-compatible hardware
- Instinct drivers and ROCm toolkit
- Docker and Docker Compose
- Ansible (for automated Kubernetes setup)
- Hugging Face CLI (`pip install huggingface_hub`)
- `npm` to use the MCP Inspector tool (`@modelcontextprotocol/inspector`)
- `jq` for JSON parsing (used in API testing examples)
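Before starting, it can help to confirm the prerequisite tooling is actually on your PATH. The check below is an optional, minimal sketch (this repository does not pin specific versions, and the tool names assume default installation paths):

```bash
# Optional sanity check: report any missing prerequisite tools
for tool in amd-smi docker ansible-playbook huggingface-cli npm jq; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "MISSING: $tool"
  fi
done
```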
This section demonstrates the configuration and usage of partitioning with AMD Instinct MI300X GPUs. Instinct GPUs support different partitioning modes:
- SPX (Single Partition X-celerator): Treats the entire GPU as a single device
- CPX (Core Partitioned X-celerator): Each XCD appears as a separate logical GPU (8 GPUs per MI300X)
Validate your environment and check current partitioning:
```bash
amd-smi version
amd-smi monitor
```

`amd-smi monitor` will display 8 available GPUs.
Change to CPX partitioning mode:
```bash
sudo amd-smi set -C cpx
amd-smi monitor
```

`amd-smi` will now display 64 available GPUs.
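If you prefer to confirm the switch programmatically rather than by scanning the monitor output, counting the devices reported by `amd-smi` is a quick check. This is only a sketch: it assumes each logical GPU appears in `amd-smi list` output on a line beginning with `GPU`, which may vary between ROCm releases.

```bash
# Count logical GPUs after the partition change.
# Expect 64 in CPX mode on an 8x MI300X system, 8 in SPX mode.
# Assumes one "GPU: N" line per device in `amd-smi list` output.
amd-smi list | grep -c '^GPU'
```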
This section demonstrates deploying and scaling vLLM (a high-performance LLM inference server) on Kubernetes with AMD GPU acceleration. The previous partitioning configuration is leveraged to support up to 64 GPU-enabled pods. This demo is based on the comprehensive three-part blog series on AI Inference Orchestration with Kubernetes on Instinct MI300X, but has been modified to use a pre-downloaded model for faster deployment.
Prerequisites
Before starting this demo, you must:
- Download the required model: Download the `amd/Llama-3.2-1B-Instruct-FP8-KV` model and save it to `/data/hf_home` using the Hugging Face CLI:

  ```bash
  # Create the directory if it doesn't exist
  sudo mkdir -p /data/hf_home
  sudo chown $USER:$USER /data/hf_home

  # Set Hugging Face cache directory
  export HF_HOME=/data/hf_home

  # Download the model (this may take some time)
  huggingface-cli download amd/Llama-3.2-1B-Instruct-FP8-KV
  ```

  The model will be stored in `/data/hf_home/hub/` and is referenced in the `vllm-deployment.yaml` configuration.

- Configure MetalLB IP range: Review Part 2 of the blog series to determine an appropriate IP address range for your network environment. Update the `metallb-config.yaml` file with IP addresses that are:

  - In the same subnet as your Kubernetes nodes
  - Not used by DHCP or other network services
  - Available for MetalLB to assign to LoadBalancer services

  Example IP range configuration in `metallb-config.yaml` (a fuller sketch follows this list):

  ```yaml
  # Update this range based on your network environment
  spec:
    addresses:
      - 192.168.1.200-192.168.1.210  # Adjust for your network
  ```
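For orientation, here is a hedged sketch of what a complete MetalLB Layer 2 configuration typically looks like: an `IPAddressPool` paired with an `L2Advertisement`. The resource names (`demo-pool`, `demo-l2`) are placeholders and not necessarily what the repository's `metallb-config.yaml` uses, and the manifest is shown inline only for illustration; in this demo the equivalent settings live in `metallb-config.yaml` and are applied during the deployment steps, after MetalLB itself has been installed.

```bash
# Illustrative only -- resource names are hypothetical; adjust addresses for your network.
kubectl apply -f - <<'EOF'
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: demo-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.210
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: demo-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - demo-pool
EOF
```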
Setup
First, edit `ansible/microk8s-full-install.yml` and change the `node_name:` line to the hostname of the server that will host the MicroK8s cluster. Then, run the Ansible playbook to install a local, single-node MicroK8s cluster, the AMD GPU Operator, and prerequisites.
```bash
cd ansible
ansible-playbook microk8s-full-install.yml -i localhost
cd ..
```

Deployment Steps
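Before working through the individual steps below, it can help to confirm the cluster itself is healthy. This is a minimal check, assuming MicroK8s was installed via snap and the playbook has pointed `kubectl` at the MicroK8s cluster:

```bash
# Wait until MicroK8s reports all core services are ready
microk8s status --wait-ready

# Confirm the node is Ready and the AMD GPU Operator pods are running
kubectl get nodes
kubectl get pods -A
```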
Verify GPU availability:
```bash
kubectl get nodes -L feature.node.kubernetes.io/amd-gpu
kubectl get nodes -o custom-columns=NAME:.metadata.name,"Total GPUs:.status.capacity.amd\.com/gpu","Allocatable GPUs:.status.allocatable.amd\.com/gpu"
```

Create persistent storage for vLLM:
```bash
cd k8s
kubectl apply -f vllm-pvc.yaml
```

Deploy a single vLLM inference service:
```bash
kubectl apply -f vllm-deployment.yaml
```
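Before checking GPU allocation, you may want to wait for the pod to come up. A hedged example, assuming the Deployment created by `vllm-deployment.yaml` is named `llama-3-2-1b` (the name used in the scaling step later in this demo):

```bash
# Block until the vLLM Deployment reports its pods as available
kubectl rollout status deployment/llama-3-2-1b -n default

# Inspect the pods directly if the rollout stalls
kubectl get pods -n default
```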
Monitor GPU allocation:

```bash
kubectl describe nodes | tr -d '\000' | sed -n -e '/^Name/,/Roles/p' -e '/^Capacity/,/Allocatable/p' -e '/^Allocated resources/,/Events/p' | grep -e Name -e amd.com | perl -pe 's/\n//' | perl -pe 's/Name:/\n/g' | sed 's/amd.com\/gpu:\?//g' | sed '1s/^/Node Available(GPUs) Used(GPUs)/' | sed 's/$/ 0 0 0/' | awk '{print $1, $2, $3}' | column -t
```

The output should indicate that a GPU has been allocated to the vLLM service.
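If you only need a rough per-node view rather than the formatted table above, a shorter check of the same information is:

```bash
# Quick view of the amd.com/gpu capacity and allocation lines per node
kubectl describe nodes | grep -E 'Name:|amd\.com/gpu'
```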
Expose the vLLM service:
```bash
kubectl apply -f vllm-service.yaml
kubectl get svc -n default
```

Note the external IP of the vLLM service. This will be used to send a request to the vLLM API.
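To avoid copying the address by hand, you can capture it into a shell variable once MetalLB has assigned one. The service name below is a placeholder; use the name defined in `vllm-service.yaml`:

```bash
# Store the LoadBalancer IP for the later curl calls
VLLM_IP=$(kubectl get svc <service-name> -n default \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "vLLM endpoint: http://${VLLM_IP}:8000"
```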
Next, install and configure MetalLB load balancer:
```bash
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/main/config/manifests/metallb-native.yaml
kubectl get pods -n metallb-system
```
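MetalLB's pods can take a little while to become Ready, so it is common practice to wait for them before applying the address pool (the label selector here is assumed from the upstream `metallb-native` manifests):

```bash
# Wait for the MetalLB controller and speaker pods to be Ready
kubectl wait --namespace metallb-system \
  --for=condition=ready pod \
  --selector=app=metallb \
  --timeout=120s
```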
```bash
kubectl apply -f metallb-config.yaml
kubectl get svc
```

Testing the API
Once deployed, test the inference API:
```bash
curl http://<EXTERNAL-IP>:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amd/Llama-3.2-1B-Instruct-FP8-KV",
    "prompt": "Explain quantum entanglement in simple terms",
    "max_tokens": 1024,
    "temperature": 0.5
  }' | jq .
```
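Besides completions, vLLM's OpenAI-compatible server also exposes a model listing endpoint, which is a quick way to confirm which model the service is actually serving:

```bash
# List the models served by the vLLM endpoint
curl http://<EXTERNAL-IP>:8000/v1/models | jq .
```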
Scaling

Scale the deployment to multiple replicas:
```bash
kubectl scale -n default deployment llama-3-2-1b --replicas=12
```

Send another query to the load balancer, which now has a pool of available vLLM servers.
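Scaling to 12 replicas can take a moment while pods are scheduled onto the GPU partitions; you can watch the rollout before re-sending the query:

```bash
# Wait for all 12 replicas to become available, then list them
kubectl rollout status deployment/llama-3-2-1b -n default
kubectl get pods -n default -o wide
```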
```bash
curl http://<EXTERNAL-IP>:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amd/Llama-3.2-1B-Instruct-FP8-KV",
    "prompt": "Explain quantum entanglement in simple terms",
    "max_tokens": 1024,
    "temperature": 0.5
  }' | jq .
```

Clean Up
Uninstall MicroK8s
```bash
cd ..
cd ansible
ansible-playbook microk8s-uninstall.yml -i localhost
```

Change back to SPX partitioning mode
```bash
cd ..
sudo amd-smi set -C spx
amd-smi monitor
```

The MCP demo showcases integration with SGLang and the AMD SMI MCP server for enhanced AI tooling capabilities. Do not proceed with this step until you have completed the "Clean Up" step from the previous section (uninstall MicroK8s and set the GPU back to SPX partitioning).
The setup requirements depend on which type of MCP client you plan to use:
For "Bring Your Own Model" MCP clients (e.g., Roo Code, Continue, etc.):
You'll need to download the moonshotai/Kimi-K2-Instruct model and run SGLang locally:
```bash
# Set Hugging Face cache directory (if not already set)
export HF_HOME=/data/hf_home

# Download the model for SGLang demo
huggingface-cli download moonshotai/Kimi-K2-Instruct
```

Note: You can also use a different or smaller model if preferred, but you'll need to download it yourself and update the `docker-compose.yml` file accordingly to reference your chosen model.
For hosted LLM services (e.g., GitHub Copilot, Claude, etc.): No model download or SGLang setup is required. You can skip directly to the MCP server testing section.
If you're using an MCP client that requires a local model (like Roo Code or Continue), configure the suggested environment variables. Create a new .env file based on the provided example:
```bash
cd sglang
cp env.example .env
```

Edit `.env` and add your Hugging Face API token on the `HF_TOKEN` line. Next, start SGLang using Docker Compose:
```bash
docker compose up -d
```

Run `docker compose logs -f` to monitor the SGLang logs. Once the logs indicate that the SGLang server has started, configure your MCP client to use the deployed model. Refer to your specific MCP client's documentation for configuration details.
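Before wiring up an MCP client, you can optionally sanity-check that the SGLang endpoint is responding. This sketch assumes the compose file publishes SGLang's default port 30000 on localhost; adjust the port to whatever `sglang/docker-compose.yml` actually maps:

```bash
# Basic liveness check against the SGLang server (port is an assumption)
curl http://localhost:30000/health

# List the model(s) the server exposes via its OpenAI-compatible API
curl http://localhost:30000/v1/models | jq .
```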
This demonstrates the mcp-amdsmi server integration.
Setup: Follow the instructions in the mcp-amdsmi repository README to download and configure the MCP server.
Important: When using hosted LLM services (GitHub Copilot, Claude, etc.), you'll typically need to start the MCP server in HTTP transport mode rather than the default stdio mode. Refer to the mcp-amdsmi documentation for specific configuration instructions.
Option 1: Use an MCP Client
You can test the MCP integration using various MCP clients. Visit https://modelcontextprotocol.io/clients to explore available clients and choose one that suits your needs.
Option 2: Use the MCP Inspector CLI Tool
If you prefer not to use a full MCP client, you can test the functionality using the MCP Inspector CLI tool.
```bash
# Install the MCP Inspector globally
npm i @modelcontextprotocol/inspector -g

# Test various AMD SMI MCP server capabilities

# Discover available GPUs in the system
npx @modelcontextprotocol/inspector --cli mcp-amdsmi --method tools/call --tool-name get_gpu_discovery

# Get current GPU status and utilization
npx @modelcontextprotocol/inspector --cli mcp-amdsmi --method tools/call --tool-name get_gpu_status

# Monitor GPU performance metrics
npx @modelcontextprotocol/inspector --cli mcp-amdsmi --method tools/call --tool-name get_gpu_performance

# Analyze GPU memory usage patterns
npx @modelcontextprotocol/inspector --cli mcp-amdsmi --method tools/call --tool-name analyze_gpu_memory

# Monitor power consumption and thermal data
npx @modelcontextprotocol/inspector --cli mcp-amdsmi --method tools/call --tool-name monitor_power_thermal

# Check overall GPU health status
npx @modelcontextprotocol/inspector --cli mcp-amdsmi --method tools/call --tool-name check_gpu_health
```

These commands demonstrate the MCP server's ability to provide comprehensive GPU monitoring and management capabilities through the standardized Model Context Protocol interface. Each command will return information about the AMD GPU's current state and performance characteristics.
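The Inspector's CLI mode can also enumerate the tools a server exposes, which is handy before calling a specific one (the `tools/list` method name is assumed from the Model Context Protocol specification):

```bash
# Enumerate all tools exposed by the AMD SMI MCP server
npx @modelcontextprotocol/inspector --cli mcp-amdsmi --method tools/list
```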
- GPU not detected: Ensure ROCm drivers are properly installed and GPUs are visible via `amd-smi`
- Partitioning fails: Verify you have appropriate permissions for `amd-smi set` commands
- Kubernetes pods stuck: Check GPU operator deployment and device plugin status
- Service not accessible: Verify MetalLB configuration and network policies
- Model not found: Ensure the required models are properly downloaded to `/data/hf_home/hub/`:
  - `amd/Llama-3.2-1B-Instruct-FP8-KV` for the Kubernetes vLLM demo
  - `moonshotai/Kimi-K2-Instruct` for the MCP/SGLang demo (or your chosen alternative model)
- MetalLB IP conflicts: Verify the IP range in `metallb-config.yaml` doesn't conflict with existing network assignments
- MCP server connection issues: Ensure the `mcp-amdsmi` server is properly installed and configured according to the repository instructions
- Display GPU information: `amd-smi monitor`
- Kubernetes events: `kubectl get events --sort-by='.lastTimestamp'`
- Pod logs: `kubectl logs <pod-name>`
- AI Inference Orchestration with Kubernetes on Instinct MI300X - Part 1
- AI Inference Orchestration with Kubernetes on Instinct MI300X - Part 2
- AI Inference Orchestration with Kubernetes on Instinct MI300X - Part 3
- mcp-amdsmi Repository
- Model Context Protocol Clients
- AMD Instinct MI300X GPU Partitioning Documentation
- vLLM Documentation
- SGLang Project
- Model Context Protocol
- PEARC25 Conference