---
feature-img: assets/img/2025-07-03/0.png
layout: post
subtitle: Building an MLOps CI Environment
tags:
- MLOps
- Infra
title: Setting Up Actions Runner Controller
---

### Intro

As I've been enjoying building with AI lately, the importance of a solid test environment has become even clearer.

The most common approach is to build CI with GitHub Actions, but in MLOps you often need high-spec instances for CI.

GitHub Actions does offer [GPU instances (Linux, 4 cores)](https://docs.github.com/ko/billing/managing-billing-for-your-products/about-billing-for-github-actions), but at $0.07 per minute as of this writing, they are quite expensive to use.

They are also limited to NVIDIA T4 GPUs, which can be restrictive as model sizes keep growing.

A good alternative in this situation is a self-hosted runner.

As the name suggests, you configure the runner yourself and execute GitHub workflows on it.

You can set this up by following GitHub's guide: [Add self-hosted runners](https://docs.github.com/ko/actions/how-tos/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners).

However, this approach requires keeping the CI machine always online, which is inefficient when CI/CD jobs are infrequent.

This is where the Actions Runner Controller (ARC) shines as a great alternative.

[Actions Runner Controller](https://github.com/actions/actions-runner-controller) is an open-source controller that runs GitHub Actions runners in a Kubernetes environment.

With it, your Kubernetes resources are consumed for CI only while a GitHub Actions workflow is running.

### Installing Actions Runner Controller

ARC installation has two major steps:
1. Create a GitHub Personal Access Token for authenticating with GitHub
2. Install ARC via Helm and authenticate using the token

#### 1. Create a GitHub Personal Access Token

ARC needs to authenticate against the GitHub API to register and manage runners, so create a Personal Access Token (PAT).

- Path: Settings > Developer settings > Personal access tokens > Tokens (classic) > Generate new token

When creating the token, select the [appropriate permissions](https://github.com/actions/actions-runner-controller/blob/master/docs/authenticating-to-the-github-api.md#deploying-using-pat-authentication). (For convenience here, grant full permissions.)

> For security, follow least privilege and set an expiration.

In general, authenticating via a GitHub App is recommended over a PAT.

Keep the PAT safe; you'll need it when installing ARC.

#### 2. Install ARC with Helm

Before installing ARC, you need cert-manager. If it's not already set up in the cluster, install it:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml
```

Now install ARC into your Kubernetes cluster using Helm.

Using the Personal Access Token you created earlier, install ARC. Replace YOUR_GITHUB_TOKEN below with your PAT.

```bash
# Add the ARC chart repository and fetch the chart locally
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm repo update
helm pull actions-runner-controller/actions-runner-controller
tar -zxvf actions-runner-controller-*.tgz

# Install the chart; authSecret.create stores the PAT in a Kubernetes secret
export GITHUB_TOKEN=YOUR_GITHUB_TOKEN
helm upgrade --install actions-runner-controller ./actions-runner-controller \
  --namespace actions-runner-system \
  --create-namespace \
  --set authSecret.create=true \
  --set authSecret.github_token="${GITHUB_TOKEN}"
```

After installation, verify that the ARC controller is running:

```bash
kubectl get pods -n actions-runner-system
```

You should see the controller-manager pod running in the actions-runner-system namespace.

ARC is now ready to communicate with GitHub. Next, define the runners that will actually execute workflows.

### 3. Configure the Runner

The ARC controller is installed, but there are no runners yet. Now we'll create runner pods that pick up GitHub Actions jobs.

We'll use two resources:
1. RunnerDeployment: A template for runner pods. It defines which container image to use, which GitHub repo to connect to, labels, and so on.
2. HorizontalRunnerAutoscaler (HRA): Watches the RunnerDeployment and automatically adjusts the number of replicas based on the number of queued jobs on GitHub.

#### Define a RunnerDeployment

First, create a file named runner-deployment.yml as below. Change spec.template.spec.repository to your GitHub repository.

> Besides a repository, you can also target an organization if you have the permissions.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - arc-runner
```
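
As the note above mentions, runners can also be registered at the organization level rather than per repository. A minimal sketch, assuming your PAT carries the organization-level permissions (the org name is a placeholder):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-org-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      # organization replaces repository; these runners serve all repos in the org
      organization: <YOUR_ORG_NAME>
      labels:
        - self-hosted
        - arc-runner
```

Organization-level runners save you from defining one RunnerDeployment per repository.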

Apply the manifest with kubectl; once the runner registers, you can find it under your repo's Actions settings.

<img src="/assets/img/2025-07-03/1.png">

After the deployment completes, you'll see a new runner with the self-hosted and arc-runner labels under Settings > Actions > Runners in your GitHub repository.

#### Define a HorizontalRunnerAutoscaler

Next, define an HRA to autoscale the RunnerDeployment you created above. Create hra.yml:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 0
  maxReplicas: 5
```

By setting minReplicas and maxReplicas, you can scale up and down within the bounds of your resources.

You can also specify additional metrics so that pods are created whenever a workflow is triggered. Several metrics are available.

> When using HorizontalRunnerAutoscaler, runners are created only when needed. While the count is zero, no runner appears in the GitHub UI.

<img src="/assets/img/2025-07-03/2.png">

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 0
  maxReplicas: 5
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames: ["<YOUR_NAME>/<YOUR_REPO_NAME>"]
```

The above is my preferred metric: it scales up when workflow runs are queued. Choose metrics as needed to get the behavior you want.
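
Another metric ARC offers is PercentageRunnersBusy, which scales on the fraction of registered runners that are currently busy instead of on queued runs. A hedged sketch; the thresholds and factors below are illustrative, not tuned values (note ARC expects them as quoted strings):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  # Keep at least one runner alive so busyness can be measured
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'    # scale up when more than 75% of runners are busy
    scaleDownThreshold: '0.25'  # scale down when fewer than 25% are busy
    scaleUpFactor: '2'          # double the replica count on scale-up
    scaleDownFactor: '0.5'      # halve it on scale-down
```

Unlike the queued-runs metric, this one can't scale from zero, which is why minReplicas is 1 here.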

### 4. Use it in a GitHub Actions workflow

All set! Using the new ARC runner is simple: in your workflow file, put the labels from the RunnerDeployment in the runs-on key.

Add a simple test workflow (test-arc.yml) under .github/workflows/:

```yaml
name: ARC Runner Test

on:
  push:
    branches:
      - main

jobs:
  test-job:
    runs-on: [self-hosted, arc-runner]
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Test
        run: |
          echo "Hello from an ARC runner!"
          echo "This runner is running inside a Kubernetes pod."
          sleep 10
```

The key part is runs-on: [self-hosted, arc-runner]. When this workflow runs, GitHub assigns the job to a runner that has both labels. ARC detects the event and, based on the HRA settings, creates a new runner pod if needed to handle the job.

> Unlike GitHub-hosted runners, self-hosted runners may require you to install some packages within the workflow.
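
The note above matters in practice: the ARC runner image is leaner than GitHub-hosted images. A sketch of a workflow step that installs missing tools (the package names are illustrative; this assumes the runner image allows passwordless sudo, as the summerwind images do):

```yaml
    steps:
      - name: Install missing tools
        run: |
          sudo apt-get update
          sudo apt-get install -y --no-install-recommends jq zip
```

Baking frequently used dependencies into a custom runner image avoids paying this install cost on every job.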

### Troubleshooting notes

I often use Docker for CI/CD, and one recurring issue is DinD (Docker-in-Docker).

By default, ARC runs the runner container and a Docker daemon container (docker) as sidecars in the same pod.

To work around issues with this split, there's a DinD-enabled runner image.

In the YAML below, setting image and dockerdWithinRunnerContainer runs the Docker daemon inside the runner container itself; your workflow then executes on that runner.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - arc-runner
      image: "summerwind/actions-runner-dind:latest"
      dockerdWithinRunnerContainer: true
```

For Docker tests that need GPUs, if you use the DinD image above on a cluster with the NVIDIA Container Toolkit installed, the GPU can be recognized.

In the workflow you want to run, configure as below to confirm GPUs are available even under DinD. (Be sure to check the versions of the NVIDIA Container Toolkit and the NVIDIA GPU driver plugin!)

```bash
# Check GPU devices
ls -la /dev/nvidia*

# Locate the driver binary and library that the host exposed to the runner
smi_path=$(find / -name "nvidia-smi" 2>/dev/null | head -n 1)
lib_path=$(find / -name "libnvidia-ml.so" 2>/dev/null | head -n 1)
lib_dir=$(dirname "$lib_path")
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$lib_dir
export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Mount GPU devices and libraries directly, without the nvidia container runtime
docker run -it \
  --device=/dev/nvidia0:/dev/nvidia0 \
  --device=/dev/nvidiactl:/dev/nvidiactl \
  --device=/dev/nvidia-uvm:/dev/nvidia-uvm \
  --device=/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  -v "$lib_dir:$lib_dir:ro" \
  -v "$(dirname "$smi_path"):$(dirname "$smi_path"):ro" \
  -e LD_LIBRARY_PATH="$LD_LIBRARY_PATH" \
  -e NVIDIA_VISIBLE_DEVICES="$NVIDIA_VISIBLE_DEVICES" \
  -e NVIDIA_DRIVER_CAPABILITIES="$NVIDIA_DRIVER_CAPABILITIES" \
  pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime
```
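
Separately from mounting devices by hand, if your cluster runs the NVIDIA device plugin, the runner pod can request a GPU through the standard Kubernetes resources field, which the RunnerDeployment template accepts. A minimal sketch; whether this fits depends on your device-plugin setup:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-gpu-runner-deployment
  namespace: actions-runner-system
spec:
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - arc-runner
        - gpu
      image: "summerwind/actions-runner-dind:latest"
      dockerdWithinRunnerContainer: true
      # Request one GPU from the NVIDIA device plugin for this runner pod
      resources:
        limits:
          nvidia.com/gpu: 1
```

With a dedicated gpu label, workflows that need a GPU can target these runners via runs-on while everything else stays on the cheaper pool.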

### Wrapping up

We walked through how to set up Actions Runner Controller in a Kubernetes environment to build a dynamically scalable self-hosted runner setup.

ARC helps avoid both the high costs of GitHub-hosted runners and the inefficiencies of managing your own always-on VMs. It's especially powerful for MLOps CI/CD environments that need GPUs or have complex dependencies.

While the initial setup may feel a bit involved, once it's in place it can significantly cut CI/CD costs and reduce operational burden. If you're considering MLOps, it's definitely worth a look.