---
feature-img: assets/img/2025-07-03/0.png
layout: post
subtitle: Building an MLOps CI environment
tags:
- MLOps
- Infra
title: Setting Up Actions Runner Controller
---


### Intro
| 13 | + |
As I’ve been enjoying AI-driven development lately, the importance of a solid test environment has really hit home.

A common approach is to build CI with GitHub Actions, but in MLOps you often need high-spec instances for CI.

GitHub Actions does offer [GPU instances (Linux, 4 cores)](https://docs.github.com/ko/billing/managing-billing-for-your-products/about-billing-for-github-actions), but at the time of writing they cost $0.07 per minute, which is quite expensive.

They’re also NVIDIA T4 GPUs, which can become a performance bottleneck as models keep growing.

A good alternative in this situation is a self-hosted runner.

As the name suggests, you set up the runner yourself and execute GitHub workflows on it.

You can configure one by following GitHub’s [Add self-hosted runners](https://docs.github.com/ko/actions/how-tos/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners) guide.

However, this approach requires the CI machine to always be online, which can be inefficient if CI/CD jobs run infrequently.

That’s where the Actions Runner Controller (ARC) shines as an excellent alternative.

[Actions Runner Controller](https://github.com/actions/actions-runner-controller) is an open-source controller that manages GitHub Actions runners in a Kubernetes environment.

With it, your own Kubernetes resources are consumed only while a GitHub Actions workflow is actually running.


### Install Actions Runner Controller

Installing ARC has two main steps:
1. Create a GitHub Personal Access Token for communication and authentication with GitHub
2. Install ARC via Helm and authenticate with the token you created

#### 1. Create a GitHub Personal Access Token

ARC needs to authenticate to the GitHub API to register and manage runners. Create a GitHub Personal Access Token (PAT) for this.

- Path: Settings > Developer settings > Personal access tokens > Tokens (classic) > Generate new token

When creating the token, choose the [appropriate permissions](https://github.com/actions/actions-runner-controller/blob/master/docs/authenticating-to-the-github-api.md#deploying-using-pat-authentication). (For convenience here, grant full permissions.)

> For security, use least privilege and set an expiration date.

Note that authenticating via a GitHub App is recommended over using a PAT.

Keep the PAT safe—you’ll need it to install ARC in the next step.

#### 2. Install ARC with Helm

ARC requires cert-manager. If cert-manager isn’t set up in your cluster, install it:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml
```

Now install ARC into your Kubernetes cluster with Helm.

Use the Personal Access Token you created earlier to install ARC, replacing YOUR_GITHUB_TOKEN below with your PAT value.

```bash
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller

helm repo update

helm pull actions-runner-controller/actions-runner-controller

tar -zxvf actions-runner-controller-*.tgz

export GITHUB_TOKEN=YOUR_GITHUB_TOKEN

helm upgrade --install actions-runner-controller ./actions-runner-controller \
  --namespace actions-runner-system \
  --create-namespace \
  --set authSecret.create=true \
  --set authSecret.github_token="${GITHUB_TOKEN}"
```

After installation, verify the ARC controller is running:

```bash
kubectl get pods -n actions-runner-system
```

If the command succeeds, you should see the ARC controller manager pod running in the actions-runner-system namespace.

ARC is now ready to talk to GitHub. Next, define the runner that will actually execute your workflows.

### 3. Configure a Runner

The ARC controller is installed, but there’s no runner yet to execute workflows. You need to create runner pods based on GitHub Actions jobs.

You’ll use two resources:
1. RunnerDeployment: Acts as a template for runner pods. Defines the container image, target GitHub repository, labels, etc.
2. HorizontalRunnerAutoscaler (HRA): Watches the RunnerDeployment and automatically adjusts its replicas based on the number of queued jobs in GitHub.

#### Define RunnerDeployment

Create a file named runner-deployment.yml as below. Change spec.template.spec.repository to your own GitHub repo.

> If you have permissions, you can also target an organization instead of a single repository.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - arc-runner
```
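
For MLOps workloads, the runner pods often need guaranteed CPU, memory, or a GPU. The runner template accepts standard Kubernetes resource requests; here is a minimal sketch (the gpu label and the nvidia.com/gpu resource name are assumptions that require the NVIDIA device plugin in your cluster):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: gpu-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - arc-runner
        - gpu
      # Resource requests/limits for the runner container.
      # nvidia.com/gpu assumes the NVIDIA device plugin is installed.
      resources:
        limits:
          nvidia.com/gpu: "1"
        requests:
          cpu: "2"
          memory: 4Gi
```

Pinning resources like this also makes the autoscaling behavior in the next step more predictable, since each runner pod has a known footprint.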

With this configured, you can check the self-hosted runner in your GitHub repo’s Actions settings.

<img src="/assets/img/2025-07-03/1.png">

Once the deployment is up, after a short while you’ll see a new runner with the labels self-hosted and arc-runner under Settings > Actions > Runners in your repository.

#### Define HorizontalRunnerAutoscaler

Next, define an HRA to autoscale the RunnerDeployment you just created. Create hra.yml:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 0
  maxReplicas: 5
```

By setting minReplicas and maxReplicas, you can scale the runner count up and down within your available resources.

You can also configure additional metrics so that pods are created whenever a workflow is triggered; many other metrics are supported as well.

> When using HorizontalRunnerAutoscaler, runners are created only when needed. During idle periods (when there are zero runners), you won’t see any runners in the GitHub UI.

<img src="/assets/img/2025-07-03/2.png">

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 0
  maxReplicas: 5
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames: ["<YOUR_NAME>/<YOUR_REPO_NAME>"]
```

The above is my preferred metric—it scales up when workflows are queued. As shown, you can choose the metrics that fit your needs.
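
Besides pull-based metrics like the one above, ARC also supports push-based scaling with scaleUpTriggers, which reacts to GitHub webhook events instead of polling the API. This is a sketch, assuming you have enabled ARC’s webhook server and pointed a repository webhook at it:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 0
  maxReplicas: 5
  # Add one runner for each workflow_job webhook event;
  # the extra capacity is reclaimed after the duration elapses.
  scaleUpTriggers:
  - githubEvent:
      workflowJob: {}
    amount: 1
    duration: "5m"
```

Webhook-driven scaling reacts faster than polling and avoids burning API rate limit, at the cost of the extra webhook-server setup.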

### 4. Use it in a GitHub Actions workflow

All set! Using the new ARC runner is simple: specify the labels you set in the RunnerDeployment under runs-on in your workflow.

Add a simple test workflow (test-arc.yml) under .github/workflows/ in your repo:

```yaml
name: ARC Runner Test

on:
  push:
    branches:
      - main

jobs:
  test-job:
    runs-on: [self-hosted, arc-runner]
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Test
        run: |
          echo "Hello from an ARC runner!"
          echo "This runner is running inside a Kubernetes pod."
          sleep 10
```

The key part is runs-on: [self-hosted, arc-runner]. When this workflow runs, GitHub assigns the job to a runner that has both labels. ARC detects this event and, per your HRA settings, creates a new runner pod if needed to process the job.

> With self-hosted runners, unlike GitHub-hosted runners, you may need to install some packages yourself within your workflow.

### Troubleshooting notes

For CI/CD, I often use Docker, and one recurring issue is Docker-in-Docker (DinD).

With ARC, by default the runner container and a Docker daemon container run as sidecars in the same pod.

To handle this, there’s a runner image that supports DinD.

If you specify the image and set dockerdWithinRunnerContainer as below, the Docker daemon runs inside the runner container itself, and the workflow runs on that runner.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - arc-runner
      image: "summerwind/actions-runner-dind:latest"
      dockerdWithinRunnerContainer: true
```
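
To verify the DinD runner works before running real builds, a small smoke-test job (hypothetical, reusing the labels above) can exercise the in-runner Docker daemon:

```yaml
jobs:
  dind-smoke-test:
    runs-on: [self-hosted, arc-runner]
    steps:
      - name: Verify the Docker daemon inside the runner
        run: |
          docker info
          docker run --rm hello-world
```

If docker info succeeds and the hello-world container runs, the daemon inside the runner container is healthy.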

For Docker tests that need GPUs, if your cluster has the NVIDIA Container Toolkit installed, using the DinD image above allows the GPU to be recognized.

Configure your workflow like this to confirm GPUs work even in a DinD setup. (Make sure your NVIDIA Container Toolkit and NVIDIA GPU driver plugin versions are compatible!)

```bash
# Check GPU devices
ls -la /dev/nvidia*

# Locate the nvidia-smi binary and driver library on the filesystem
smi_path=$(find / -name "nvidia-smi" 2>/dev/null | head -n 1)
lib_path=$(find / -name "libnvidia-ml.so" 2>/dev/null | head -n 1)
lib_dir=$(dirname "$lib_path")
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$lib_dir
export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Mount GPU devices and libraries directly without the nvidia runtime
docker run -it \
  --device=/dev/nvidia0:/dev/nvidia0 \
  --device=/dev/nvidiactl:/dev/nvidiactl \
  --device=/dev/nvidia-uvm:/dev/nvidia-uvm \
  --device=/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  -v "$lib_dir:$lib_dir:ro" \
  -v "$(dirname "$smi_path"):$(dirname "$smi_path"):ro" \
  -e LD_LIBRARY_PATH="$LD_LIBRARY_PATH" \
  -e NVIDIA_VISIBLE_DEVICES="$NVIDIA_VISIBLE_DEVICES" \
  -e NVIDIA_DRIVER_CAPABILITIES="$NVIDIA_DRIVER_CAPABILITIES" \
  pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime
```

### Wrapping up

We covered how to build a dynamically scalable self-hosted runner environment by deploying Actions Runner Controller in Kubernetes.

Using ARC solves both the high cost of GitHub-hosted runners and the inefficiency of keeping always-on VMs as runners. ARC is especially powerful when you need GPUs or have complex dependencies in an MLOps CI/CD setup.

The initial setup can feel a bit involved, but once in place, it can significantly cut CI/CD costs and reduce operational burden. If you’re working on MLOps, it’s well worth considering.