Commit 655c87f (parent 98bbcd0), authored by ppippi-dev and committed by github-actions[bot]: chore: add English translations for PR #3

File tree

1 file changed

+273
-0
lines changed

1 file changed

+273
-0
lines changed
Lines changed: 273 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,273 @@
1+
---
feature-img: assets/img/2025-07-03/0.png
layout: post
subtitle: Building an MLOps CI Environment
tags:
- MLOps
- Infra
title: Setting Up Actions Runner Controller
---

### Intro

As I’ve been enjoying building with AI lately, the importance of a solid test environment has become even clearer.

The most common approach is to build CI with GitHub Actions, but in MLOps you often need high-spec instances for CI.

GitHub Actions does offer [GPU instances (Linux, 4 cores)](https://docs.github.com/ko/billing/managing-billing-for-your-products/about-billing-for-github-actions), but at $0.07 per minute at the time of writing, they’re quite expensive to use.

They’re also limited to NVIDIA T4 GPUs, which can be restrictive as model sizes keep growing.

A good alternative in this situation is a self-hosted runner.

As the name suggests, you configure the runner yourself and execute GitHub Actions workflows on it.

You can set one up by following GitHub’s guide: [Add self-hosted runners](https://docs.github.com/ko/actions/how-tos/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners).

However, this approach requires keeping the CI machine always on (online), which is inefficient if CI/CD jobs are infrequent.

This is where the Actions Runner Controller (ARC) shines as a great alternative.

[Actions Runner Controller](https://github.com/actions/actions-runner-controller) is an open-source controller that lets GitHub Actions runners run in a Kubernetes environment.

With it, your Kubernetes resources are consumed for CI only while a GitHub Actions workflow is running.

### Installing Actions Runner Controller

ARC installation has two major steps:

1. Create a GitHub Personal Access Token for authenticating with GitHub
2. Install ARC via Helm and configure it with that token

#### 1. Create a GitHub Personal Access Token

ARC needs to authenticate against the GitHub API to register and manage runners, so create a Personal Access Token (PAT).

- Path: Settings > Developer settings > Personal access tokens > Tokens (classic) > Generate new token

When creating the token, select the [appropriate permissions](https://github.com/actions/actions-runner-controller/blob/master/docs/authenticating-to-the-github-api.md#deploying-using-pat-authentication). (For convenience here, we grant full permissions.)

> For security, use least privilege and set an expiration.

In general, authenticating via a GitHub App is recommended over a PAT.

Keep the PAT somewhere safe; you’ll need it when installing ARC.

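If you do go the GitHub App route, the ARC Helm chart accepts app credentials through the same `authSecret` values used for PAT auth. A hedged sketch of a values file (the IDs and key are placeholders; verify the exact value names against the chart version you install):

```yaml
# values.yaml (sketch): GitHub App authentication for the ARC chart
authSecret:
  create: true
  github_app_id: "123456"                  # placeholder: your App ID
  github_app_installation_id: "12345678"   # placeholder: your installation ID
  github_app_private_key: |
    -----BEGIN RSA PRIVATE KEY-----
    ...your App's private key...
    -----END RSA PRIVATE KEY-----
```

You would then pass this file with `helm upgrade --install ... -f values.yaml` instead of the `--set authSecret.github_token` flag shown below.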
#### 2. Install ARC with Helm

Before installing ARC, you need cert-manager. If it isn’t already set up in the cluster, install it:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml
```

Now install ARC into your Kubernetes cluster using Helm, authenticating with the Personal Access Token you created earlier. Replace `YOUR_GITHUB_TOKEN` below with your PAT.

```bash
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm repo update
helm pull actions-runner-controller/actions-runner-controller
tar -zxvf actions-runner-controller-*.tgz

export GITHUB_TOKEN=YOUR_GITHUB_TOKEN

helm upgrade --install actions-runner-controller ./actions-runner-controller \
  --namespace actions-runner-system \
  --create-namespace \
  --set authSecret.create=true \
  --set authSecret.github_token="${GITHUB_TOKEN}"
```

After installation, verify that the ARC controller is running:

```bash
kubectl get pods -n actions-runner-system
```

You should see the controller-manager pod running in the `actions-runner-system` namespace.

ARC is now ready to communicate with GitHub. Next, define the runners that will actually execute workflows.

### 3. Configure the Runner

The ARC controller is installed, but there are no runners yet. Next we’ll create runner pods in response to GitHub Actions jobs.

We’ll use two resources:

1. RunnerDeployment: acts as a template for runner pods. It defines which container image to use, which GitHub repo to connect to, which labels to apply, and so on.
2. HorizontalRunnerAutoscaler (HRA): watches the RunnerDeployment and automatically adjusts the number of replicas based on the number of queued jobs on GitHub.

#### Define a RunnerDeployment

First, create a file named runner-deployment.yml as below, changing spec.template.spec.repository to your GitHub repository.

> Besides a repository, you can also target an organization if you have the permissions.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - arc-runner
```

With this in place (apply the manifest with `kubectl apply -f runner-deployment.yml`), you can find your self-hosted runner in your repo’s Actions settings.

<img src="/assets/img/2025-07-03/1.png">

After the deployment completes, you’ll see a new runner with the self-hosted and arc-runner labels under Settings > Actions > Runners in your GitHub repository.

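For MLOps jobs, you will typically also want the runner pod to request a GPU. A hedged sketch, assuming the Runner spec’s `resources` field and a cluster with the NVIDIA device plugin installed (the name and label here are illustrative):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: gpu-runner-deployment    # hypothetical name
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - gpu-runner             # hypothetical label for GPU workflows
      resources:
        limits:
          nvidia.com/gpu: 1      # requires the NVIDIA device plugin on the node
```

Jobs that set `runs-on` to these labels would then land only on GPU-backed runner pods.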
#### Define a HorizontalRunnerAutoscaler

Next, define an HRA to autoscale the RunnerDeployment you created above. Create hra.yml:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 0
  maxReplicas: 5
```

Setting minReplicas and maxReplicas lets the runner count scale up and down within your resource limits.

You can also specify additional metrics so that pods are created whenever a workflow is triggered; several metrics are available.

> When using a HorizontalRunnerAutoscaler, runners are created only when needed. While the count is zero, you won’t see any runners in the GitHub UI.

<img src="/assets/img/2025-07-03/2.png">

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 0
  maxReplicas: 5
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames: ["<YOUR_NAME>/<YOUR_REPO_NAME>"]
```
The above is my preferred metric: it scales up when workflow runs are queued. Choose metrics as needed to get the behavior you want.

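For reference, ARC also documents a utilization-based metric, PercentageRunnersBusy, which keeps a warm pool of runners and scales on how many are busy. A hedged sketch (the thresholds are illustrative; verify field names against your ARC version):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra-busy          # hypothetical name
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 1                  # utilization-based scaling needs at least one runner
  maxReplicas: 5
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'      # scale up when >75% of runners are busy
    scaleDownThreshold: '0.25'    # scale down when <25% are busy
```

Note that this approach cannot scale from zero, so it trades some idle cost for lower job start-up latency.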
### 4. Use it in a GitHub Actions workflow

All set! Using the new ARC runner is simple: in your workflow file, put the labels from the RunnerDeployment in the runs-on key.

Add a simple test workflow (test-arc.yml) under .github/workflows/:

```yaml
name: ARC Runner Test

on:
  push:
    branches:
      - main

jobs:
  test-job:
    runs-on: [self-hosted, arc-runner]
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Test
        run: |
          echo "Hello from an ARC runner!"
          echo "This runner is running inside a Kubernetes pod."
          sleep 10
```

The key part is `runs-on: [self-hosted, arc-runner]`. When this workflow runs, GitHub assigns the job to a runner that has both labels. ARC detects the event and, based on the HRA settings, creates a new runner pod if needed to handle the job.

> Unlike GitHub-hosted runners, self-hosted runners may require you to install some packages within the workflow.

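For example, a setup step might look like the sketch below, assuming a Debian/Ubuntu-based runner image (the package names are illustrative only):

```yaml
    steps:
      - name: Install build tools    # hypothetical step; pick packages your jobs need
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential jq
```

Baking frequently used tools into a custom runner image instead can shorten job start-up times.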
### Troubleshooting notes

I often use Docker for CI/CD, and one recurring issue is DinD (Docker-in-Docker).

By default, ARC runs the runner container and a Docker daemon container (docker) as sidecars in the same pod.

To handle this, there’s a DinD-enabled runner image.

In the YAML below, set image and dockerdWithinRunnerContainer so the Docker daemon runs inside the runner container itself; your workflow then executes on that runner.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: <YOUR_NAME>/<YOUR_REPO_NAME>
      labels:
        - self-hosted
        - arc-runner
      image: "summerwind/actions-runner-dind:latest"
      dockerdWithinRunnerContainer: true
```

For Docker tests that need GPUs, using the DinD image above on a cluster with the NVIDIA Container Toolkit installed allows the GPU to be recognized.

In the workflow you want to run, configure things as below to confirm GPUs are available even under DinD. (Be sure to check the versions of the NVIDIA Container Toolkit and the NVIDIA GPU driver plugin!)

```bash
# Check GPU devices
ls -la /dev/nvidia*

# Locate the NVIDIA tools and libraries on the host
smi_path=$(find / -name "nvidia-smi" 2>/dev/null | head -n 1)
lib_path=$(find / -name "libnvidia-ml.so" 2>/dev/null | head -n 1)
lib_dir=$(dirname "$lib_path")
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$lib_dir
export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Mount GPU devices and libraries directly without the nvidia runtime
docker run -it \
  --device=/dev/nvidia0:/dev/nvidia0 \
  --device=/dev/nvidiactl:/dev/nvidiactl \
  --device=/dev/nvidia-uvm:/dev/nvidia-uvm \
  --device=/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  -v "$lib_dir:$lib_dir:ro" \
  -v "$(dirname "$smi_path"):$(dirname "$smi_path"):ro" \
  -e LD_LIBRARY_PATH="$LD_LIBRARY_PATH" \
  -e NVIDIA_VISIBLE_DEVICES="$NVIDIA_VISIBLE_DEVICES" \
  -e NVIDIA_DRIVER_CAPABILITIES="$NVIDIA_DRIVER_CAPABILITIES" \
  pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime
```

### Wrapping up

We walked through how to set up Actions Runner Controller in a Kubernetes environment to build a dynamically scalable self-hosted runner setup.

ARC helps you avoid both the high cost of GitHub-hosted runners and the inefficiency of managing your own VMs. It’s especially powerful for MLOps CI/CD environments that need GPUs or have complex dependencies.

While the initial setup may feel a bit involved, once it’s in place it can significantly cut CI/CD costs and reduce operational burden. If you’re doing MLOps, it’s definitely worth a look.

0 commit comments

Comments
 (0)