Self-Hosted GitHub Actions Runner on Kubernetes with Actions Runner Controller

DevOps tutorial - IT technology blog

2 AM. The Build Queue Is 47 Minutes Deep.

I was staring at our GitHub Actions dashboard, watching the number climb. Forty-seven minutes before any developer got CI feedback. The on-call Slack channel was on fire. One of our engineers had pushed a hotfix — and it was just sitting there, waiting for a GitHub-hosted runner to free up.

We were burning roughly $800/month on Actions minutes for a 12-person team. Every release day, every big PR merge, the queue would back up. GitHub-hosted runners have no priority lane. You wait. Everyone waits.

That night I stopped procrastinating and did what I should have done three months earlier: moved our CI/CD runners into our own Kubernetes cluster using Actions Runner Controller (ARC).

Here’s what I learned — including the parts that don’t show up in the official docs.

Why GitHub-Hosted Runners Break Under Load

GitHub-hosted runners are great until your team actually grows. Here’s where they fall apart:

  • Concurrency limits: Free and Team plans cap concurrent jobs. You hit the ceiling on release day, when you can least afford it.
  • Cold starts every time: Each runner spins up fresh. Docker layer cache, pip packages, npm modules — gone. You pay to rebuild them on every run unless you bolt on external caching.
  • Secrets exposure: Runners execute on GitHub’s infrastructure. In healthcare or fintech, that can be a hard compliance blocker.
  • No internal network access: Want integration tests against your staging database? You’re looking at VPN tunnel workarounds, none of which are clean.
  • Cost scales badly: A team running 200+ workflow minutes per day can hit $200–$1,000/month without much effort.
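
The wide range in that last bullet comes mostly from OS multipliers. A rough sketch of the arithmetic (the per-minute rates are GitHub's published pay-as-you-go prices for standard hosted runners at the time of writing; verify against current pricing):

```python
# Rough GitHub Actions cost sketch. Per-minute rates are the published
# pay-as-you-go prices for standard hosted runners (Linux $0.008,
# Windows $0.016, macOS $0.08); treat them as assumptions and check
# current pricing before budgeting.
RATES = {"linux": 0.008, "windows": 0.016, "macos": 0.08}

def monthly_cost(minutes_per_day: float, os: str = "linux", days: int = 30) -> float:
    """Estimate monthly Actions spend for billable minutes on one OS."""
    return minutes_per_day * days * RATES[os]

# 200 Linux minutes/day is cheap; the same load on macOS is not.
print(round(monthly_cost(200, "linux"), 2))   # 48.0
print(round(monthly_cost(200, "macos"), 2))   # 480.0
```

The multipliers (Windows at 2x, macOS at 10x the Linux rate) are what push mixed-OS teams toward the high end of that range.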

Self-hosted runners fix all of this. The problem is that managing them manually (spinning up VMs, rotating registration tokens, cleaning up stale runners) doesn't scale either. That's exactly the gap ARC fills.

What Actions Runner Controller Actually Does

Actions Runner Controller (ARC) is a Kubernetes operator that manages runner lifecycle end-to-end. It watches your GitHub repository or organization for queued jobs, spins up runner pods automatically, and tears them down when the job finishes.

No zombie runners. No manual token rotation. Every job gets a clean, isolated pod. That last part matters more than it sounds — shared, long-lived runners accumulate state, and state causes flaky tests.

ARC supports two scaling modes:

  • RunnerDeployment with a fixed replica count: always-on runners, predictable capacity
  • RunnerDeployment paired with a HorizontalRunnerAutoscaler: scale from 0 to N based on queue depth, pay only for what you use

In production, you almost always want the autoscaler.

Three Approaches Compared

Before committing to ARC, I looked at three options:

Option 1: Self-Hosted VMs (Manual)

Register a runner on an EC2 or GCP VM. Dead simple to set up, full control. But you own OS patching, token rotation, and scaling. One VM handles one concurrent job, so ten concurrent jobs means ten VMs. Cost scales linearly with concurrency, and manual management stops being practical around the fifth runner.

Option 2: GitHub Actions Larger Runners

GitHub now sells 4–64 core hosted runners. Convenient, but pricey — a 16-core runner costs roughly 8x the standard per-minute rate. You’re still cold-starting, still on GitHub’s infrastructure, and still subject to queue limits. For occasional heavy workloads it makes sense; for daily builds, it’s expensive.

Option 3: ARC on Kubernetes

Runners run as pods inside your own cluster. Multiple runners pack onto a single node. Caches persist across jobs. Builds can reach internal services directly. Scaling ties to GitHub’s actual job queue. The upfront setup takes a few hours — but once it’s running, you stop thinking about it.

If you’re already on Kubernetes, this is the natural fit. The operational overhead after initial setup is close to zero.

Setting Up ARC: Step by Step

Prerequisites

  • A running Kubernetes cluster (EKS, GKE, k3s — all work)
  • kubectl configured and pointing at your cluster
  • Helm 3 installed
  • A GitHub Personal Access Token with repo and admin:org scopes — or a GitHub App (strongly preferred for production, more on this below)

Step 1: Install ARC via Helm

# Add the ARC Helm repo
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm repo update

# Install the controller into its own namespace
helm install arc \
  --namespace actions-runner-system \
  --create-namespace \
  actions-runner-controller/actions-runner-controller \
  --set authSecret.create=true \
  --set authSecret.github_token="ghp_YOUR_PAT_HERE"

Wait for the controller pod to come up:

kubectl get pods -n actions-runner-system
# NAME                                        READY   STATUS    RESTARTS
# arc-actions-runner-controller-xxxx-yyyy     1/1     Running   0

Step 2: Create a RunnerDeployment

Start with a fixed-scale deployment. Two replicas, verify everything registers, then layer on autoscaling:

# runner-deployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runners
  namespace: actions-runner-system
spec:
  replicas: 2
  template:
    spec:
      repository: your-org/your-repo   # or use 'organization: your-org' for org-wide
      image: summerwind/actions-runner:latest
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"

kubectl apply -f runner-deployment.yaml

# Verify runners registered with GitHub
kubectl get runners -n actions-runner-system
# NAME               REPOSITORY            STATUS
# my-runners-xxxx    your-org/your-repo    Running

Open your GitHub repo at Settings → Actions → Runners. Two idle runners should appear within about 30 seconds.
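
You can also verify from the API side. A sketch that filters the response shape of GitHub's list-self-hosted-runners endpoint (sample payload inline; in practice you would fetch it, for example with `gh api repos/OWNER/REPO/actions/runners`):

```python
import json

# Sample shaped like GitHub's "List self-hosted runners for a repository"
# response (GET /repos/{owner}/{repo}/actions/runners); fields trimmed,
# runner names are hypothetical.
payload = json.loads("""
{
  "total_count": 2,
  "runners": [
    {"name": "my-runners-abc12", "status": "online", "busy": false,
     "labels": [{"name": "self-hosted"}, {"name": "linux"}]},
    {"name": "my-runners-def34", "status": "offline", "busy": false,
     "labels": [{"name": "self-hosted"}]}
  ]
}
""")

# Keep only runners GitHub currently considers online.
online = [r["name"] for r in payload["runners"] if r["status"] == "online"]
print(online)   # ['my-runners-abc12']
```

An offline runner here usually means the pod registered and then died; check its logs before scaling up.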

Step 3: Add Autoscaling

Fixed replicas waste money at 2 AM. HorizontalRunnerAutoscaler watches queue depth and adjusts runner count in real time:

# runner-autoscaler.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: my-runners-autoscaler
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: my-runners
  minReplicas: 0     # scale to zero when idle (scaling up from zero needs webhook-driven triggers)
  maxReplicas: 10
  metrics:
    - type: PercentageRunnersBusy
      scaleUpThreshold: '0.75'
      scaleDownThreshold: '0.25'
      scaleUpFactor: '1.5'
      scaleDownFactor: '0.5'

kubectl apply -f runner-autoscaler.yaml
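
One caveat worth knowing: PercentageRunnersBusy measures the existing pool, so on its own it cannot scale up from zero. For true scale-to-zero, ARC also supports webhook-driven scaling. A sketch, assuming the chart's webhook server is enabled (githubWebhookServer.enabled=true) and a repository webhook delivers workflow_job events to it:

```yaml
# runner-autoscaler-webhook.yaml (sketch): assumes ARC's webhook server
# is enabled and receiving workflow_job events from GitHub.
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: my-runners-autoscaler
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: my-runners
  minReplicas: 0
  maxReplicas: 10
  scaleUpTriggers:
    - githubEvent:
        workflowJob: {}
      duration: "5m"   # each queued job adds a runner for up to 5 minutes
```

With this in place, a queued job adds capacity immediately instead of waiting for the next polling interval.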

Step 4: Point Your Workflows at Self-Hosted Runners

One line change in your workflow file:

# .github/workflows/ci.yml
jobs:
  build:
    runs-on: self-hosted   # or use custom labels
    steps:
      - uses: actions/checkout@v4
      - name: Build and test
        run: make test

Need to route jobs to different runner pools — different resource sizes, different teams? Use labels:

# In RunnerDeployment spec:
spec:
  template:
    spec:
      labels:
        - large
        - gpu

# In workflow:
runs-on: [self-hosted, large]

Production Hardening: What the Docs Leave Out

GitHub Apps Beat PATs Every Time

PATs expire. They’re also tied to a specific user account — if that engineer leaves the company, every workflow using their token breaks at 3 AM on a Friday. Don’t do this to yourself.

Create a GitHub App instead. ARC has first-class support, and token rotation is handled automatically:

# Create a secret from your GitHub App credentials
kubectl create secret generic arc-github-app \
  --namespace actions-runner-system \
  --from-literal=github_app_id="YOUR_APP_ID" \
  --from-literal=github_app_installation_id="YOUR_INSTALL_ID" \
  --from-literal=github_app_private_key="$(cat private-key.pem)"
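
With the secret created, point the controller at it instead of letting the chart create one from a PAT. A sketch of the Helm values, assuming the chart's authSecret.create and authSecret.name keys (verify against your chart version):

```yaml
# values.yaml (sketch): reference the pre-created GitHub App secret
authSecret:
  create: false
  name: arc-github-app
```

Apply it with helm upgrade arc -n actions-runner-system -f values.yaml actions-runner-controller/actions-runner-controller, and the controller will mint short-lived installation tokens on its own.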

Docker Builds: Skip Privileged Containers

DinD (Docker-in-Docker) works, but it requires privileged containers. That’s a real attack surface inside a cluster. Use Kaniko or BuildKit as a separate pod instead — rootless, no privilege escalation required:

# Build Docker images without privileged mode
- name: Build image
  uses: int128/kaniko-action@v1
  with:
    push: true
    tags: ghcr.io/your-org/your-app:latest

Persistent Cache: The Biggest Single Win

This is where self-hosted runners genuinely outclass GitHub-hosted. Mount a PVC into your runner pods and keep caches between runs:

spec:
  template:
    spec:
      volumeMounts:
        - mountPath: /home/runner/.cache
          name: runner-cache
      volumes:
        - name: runner-cache
          persistentVolumeClaim:
            claimName: runner-cache-pvc

Docker layers, npm modules, pip packages — all warm on the next run. Our builds dropped from 14 minutes to 4 minutes after this one change. Nothing else came close to that impact.
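
For this to work, the runner-cache-pvc claim has to exist before the pods mount it, and if several runner pods share one claim, the storage class must support ReadWriteMany (NFS- or EFS-backed classes typically do; most block-storage classes do not). A minimal sketch; the storage class name is a placeholder:

```yaml
# runner-cache-pvc.yaml (sketch)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: runner-cache-pvc
  namespace: actions-runner-system
spec:
  accessModes:
    - ReadWriteMany   # required if multiple runner pods share this claim
  resources:
    requests:
      storage: 50Gi
  storageClassName: efs-sc   # hypothetical RWX-capable class; use your own
```

If your cluster only offers ReadWriteOnce storage, give each runner its own claim rather than sharing one.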

Results After 3 Months in Production

We moved 80% of our workflows to self-hosted runners. After three months of real traffic:

  • Cost: GitHub Actions spend fell from ~$800/month to under $120/month — the remainder covers a handful of workflows that still need GitHub-hosted runners for external network access
  • Queue time: P95 wait dropped from 12 minutes on busy days to under 90 seconds
  • Security: Runners now reach our internal staging environment directly — no VPN tunnel hacks, no public exposure
  • Reliability: Two runner pods failed over three months. ARC restarted both automatically. No developer noticed either time.

The 2 AM queue crisis that kicked off this whole project? It hasn’t happened since.

If you’re already running Kubernetes and paying real GitHub Actions bills, ARC earns back the setup time within weeks. A few hours to configure. Immediate payoff on the next release day.
