feat: add GPU-enabled self-hosted runners for vLLM recording #5297
cdoern wants to merge 7 commits into llamastack:main
Conversation
Add GitHub Actions workflow and custom actions to support recording vLLM integration tests on GPU-enabled EC2 instances with the gpt-oss:20b model.

Key features:
- OIDC authentication for AWS (no long-lived credentials)
- Multi-region/AZ fallback for high availability (9 AZs across us-east-2 and us-east-1)
- Security hardened with zero permissions on test jobs
- Always-cleanup guarantee to prevent orphaned instances
- Support for multiple models and instance types via workflow_dispatch

Components:
- record-vllm-gpu-tests.yml: Main workflow with 3-job pattern (launch → test → cleanup)
- launch-gpu-runner: Wrapper action for machulav/ec2-github-runner
- setup-vllm-gpu: Installs vLLM with CUDA support and starts the server with AWQ quantization

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
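As a rough sketch of the 3-job pattern this commit describes (job names, secret names, and step bodies here are illustrative placeholders, not the actual contents of record-vllm-gpu-tests.yml):

```yaml
# Illustrative skeleton only; the real workflow differs in detail.
permissions:
  id-token: write        # required for OIDC federation with AWS
  contents: read

jobs:
  launch-runner:
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start.outputs.label }}
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}   # OIDC, no long-lived keys
          aws-region: us-east-2
      # ... start the EC2 GPU runner here (step id: start) ...

  record-tests:
    needs: launch-runner
    permissions: {}                                     # zero permissions on the test job
    runs-on: ${{ needs.launch-runner.outputs.label }}
    steps:
      - run: echo "run vLLM test recording here"

  cleanup:
    needs: [launch-runner, record-tests]
    if: ${{ always() }}                                 # runs even if tests fail or are cancelled
    runs-on: ubuntu-latest
    steps:
      - run: echo "terminate the EC2 instance here"
```

The `if: ${{ always() }}` condition on the cleanup job is what provides the always-cleanup guarantee against orphaned instances.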
Add test suite and CI matrix configuration for GPU-based vLLM testing with the gpt-oss:20b model. This enables recording integration tests on GPU runners.

Changes:
- Add vllm-gpu-gpt-oss setup in suites.py with the gpt-oss:20b model
- Add gpu-vllm matrix in ci_matrix.json for the base, responses, and reasoning suites

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Add comprehensive documentation for using GPU-enabled self-hosted runners.

- gpu-runners.md: User guide covering quick start, architecture, troubleshooting, cost estimates, and performance tuning
- AWS_SETUP_GUIDE.md: Step-by-step instructions for setting up AWS infrastructure with OIDC authentication, VPC/networking, GPU AMI creation, and GitHub configuration

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Add comprehensive implementation plan and status tracking documents for GPU runners.

- IMPLEMENTATION_PLAN.md: 4-phase roadmap with tasks, time estimates, dependencies, and success metrics
- IMPLEMENTATION_STATUS.md: Current status tracker with completed tasks, AWS setup requirements, and next actions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
(I will remove the annoying MD files once this is ready)
          - gpt-oss:20b
          - gpt-oss:latest
          - Qwen/Qwen3-0.6B
      instance_type:
It might be a little easier to keep a mapping of instance types and fallbacks for each supported model. That way users don't need to know or configure this when setting up jobs, and we can also take into account the different resource requirements of the models. For example, Qwen3 0.6B doesn't need an L4; it can probably run on a g4dn.xlarge and do reasonably well. To make your life easier for v1 of this PR, it's probably easiest to support just one model for now.
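A minimal sketch of such a mapping, as a testable shell function. The function name and the instance-type choices here are hypothetical guesses for illustration, not benchmarked recommendations:

```shell
#!/usr/bin/env bash
# Hypothetical model -> instance-type mapping with per-model fallbacks.
# The first entry is the preferred type; the rest are fallbacks to try
# when capacity is unavailable.
model_to_instances() {
  case "$1" in
    gpt-oss:20b)      echo "g6.2xlarge g5.2xlarge" ;;
    Qwen/Qwen3-0.6B)  echo "g4dn.xlarge g5.xlarge" ;;
    *)                echo "unsupported model: $1" >&2; return 1 ;;
  esac
}

model_to_instances "gpt-oss:20b"
```

With a table like this, the workflow_dispatch input would only need the model name, and the launch job could derive the instance type and its fallbacks internally.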
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Select region and AZ with fallback
You may want to move this into a script, so that it's testable. I don't think it's doing what we think it is. I'm pretty sure this only ever uses us-east-2 because the index is hardcoded to 0:

IFS='|' read -r REGION AZ SUBNET AMI SG <<< "${CONFIGS[0]}"

        uses: machulav/ec2-github-runner@v2.3.6
        with:
          mode: start
          github-token: ${{ inputs.github-token }}
          ec2-image-id: ${{ inputs.ec2-ami-id }}
          ec2-instance-type: ${{ inputs.instance-type }}
          subnet-id: ${{ inputs.subnet-id }}
          security-group-id: ${{ inputs.security-group-id }}
          aws-resource-tags: ${{ inputs.ec2-instance-tags }}
          runner-home-dir: ${{ inputs.runner-home-dir }}
          iam-role-name: ${{ inputs.iam-role-name }}
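Picking up the earlier point about the hardcoded `CONFIGS[0]`: extracted into a standalone script, the intended fallback loop might look like the sketch below. The config entries and the `try_launch` stub are placeholders for illustration (the stub "succeeds" only in us-east-1, just to exercise the fallback path):

```shell
#!/usr/bin/env bash
# Iterate over every region/AZ config instead of always reading CONFIGS[0].
CONFIGS=(
  "us-east-2|us-east-2a|subnet-aaa|ami-111|sg-111"
  "us-east-2|us-east-2b|subnet-bbb|ami-111|sg-111"
  "us-east-1|us-east-1a|subnet-ccc|ami-222|sg-222"
)

# Stub standing in for the real EC2 launch call.
try_launch() {
  [ "$1" = "us-east-1" ]
}

RESULT=""
for cfg in "${CONFIGS[@]}"; do
  IFS='|' read -r REGION AZ SUBNET AMI SG <<< "$cfg"
  if try_launch "$REGION" "$AZ"; then
    RESULT="$REGION/$AZ"
    break
  fi
done
echo "launched in $RESULT"
```

Keeping this in a script (rather than inline workflow YAML) means the fallback order can be covered by a plain shell test.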
Ah, this version of the GH action actually allows for setting different availability zones!
Let's use this approach so that we can try different availability zones in case the first one lacks availability. Then let me know which AWS regions you need the AMI ID to exist in.
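Assuming the action exposes an availability-zones-config input as discussed in this thread, the call might look something like the sketch below. The field names inside the config are my guess and should be verified against the machulav/ec2-github-runner documentation; all IDs are placeholders:

```yaml
# Sketch only: verify the exact availability-zones-config schema against
# the machulav/ec2-github-runner docs before using.
- uses: machulav/ec2-github-runner@v2.3.6
  with:
    mode: start
    github-token: ${{ inputs.github-token }}
    ec2-image-id: ${{ inputs.ec2-ami-id }}
    ec2-instance-type: ${{ inputs.instance-type }}
    availability-zones-config: |
      [
        { "subnetId": "subnet-aaa", "securityGroupId": "sg-111" },
        { "subnetId": "subnet-bbb", "securityGroupId": "sg-111" }
      ]
```

With this, the action itself retries across AZs when the first one lacks capacity, so no custom selection logic is needed in the workflow.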
Address feedback from code review:

1. Simplify to single model (gpt-oss:20b only)
   - Remove model dropdown - hardcode to gpt-oss:20b
   - Remove instance type selection - hardcode to g6.2xlarge
   - Make suite selection a dropdown for better UX

2. Fix multi-AZ fallback implementation
   - Use machulav/ec2-github-runner's built-in availability-zones-config
   - Remove broken custom region selection logic (was always using CONFIGS[0])
   - Simplify to us-east-2 only with 3 AZ fallback (2a, 2b, 2c)

3. Pin actions to commit hashes for security
   - Pin aws-actions/configure-aws-credentials@v4.0.2 to commit hash
   - Pin machulav/ec2-github-runner@v2.3.6 to commit hash

4. Remove emojis from workflow output
   - Clean up summary messages

Based on feedback from @iamemilio and @courtneypacheco

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Force-pushed from ce3f415 to ab8da29
Fix markdownlint issues and auto-generate workflow documentation:

- Add language tags to code blocks
- Fix markdown formatting (spacing, headings)
- Auto-generate .github/workflows/README.md

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Force-pushed from ab8da29 to 4837466
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Charlie Doern <cdoern@redhat.com>
Summary
Add support for recording vLLM integration tests on GPU-enabled EC2 instances with the gpt-oss:20b model (20B parameters). This enables testing larger models that don't fit on standard CPU runners while maintaining cost efficiency.

Key Features:
Components:

- .github/workflows/record-vllm-gpu-tests.yml - Main workflow with manual trigger
- .github/actions/launch-gpu-runner/ - EC2 instance launcher (wraps machulav/ec2-github-runner)
- .github/actions/setup-vllm-gpu/ - vLLM GPU installation and server setup with AWQ quantization
- vllm-gpu-gpt-oss setup

AWS Setup Required
This PR includes all code but requires AWS infrastructure setup before it can be used:

- Follow AWS_SETUP_GUIDE.md, Step 1
- Configure secrets: AWS_ROLE_ARN, subnet IDs, AMI IDs, etc.

See IMPLEMENTATION_STATUS.md for a detailed checklist.

Test Plan
Once AWS infrastructure is set up:
Documentation
- docs/gpu-runners.md - How to trigger workflows, troubleshooting, cost estimates
- AWS_SETUP_GUIDE.md - Step-by-step infrastructure setup with OIDC
- IMPLEMENTATION_PLAN.md - Roadmap and future optimizations
- IMPLEMENTATION_STATUS.md - Current status and next steps

References