Progressive Delivery on Kubernetes with Argo Rollouts: Canary and Blue/Green Deployment Guide

DevOps tutorial - IT technology blog

Quick Start — Get Argo Rollouts Running in 5 Minutes

It’s 2 AM. Your team just pushed a hotfix and the standard kubectl rollout is giving you cold sweats — one bad image and the rolling update marches it across every replica with nothing gating it. I’ve been there. That’s exactly the night I started taking Argo Rollouts seriously.

Argo Rollouts is a Kubernetes controller that lets you control exactly how traffic shifts during a release: canary splitting, blue/green switching, automated metric analysis, and rollback on failure. Standard Kubernetes Deployment objects can’t do any of this natively.

Install the controller and the kubectl plugin first:

# Install the Argo Rollouts controller
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts \
  -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install the kubectl plugin (Linux amd64 shown; on macOS, download the
# darwin-amd64 or darwin-arm64 binary instead)
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

# Verify
kubectl argo rollouts version

Next, swap your Deployment manifest for a Rollout object. The pod spec stays identical — you’re mostly changing the kind and adding a strategy block:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-registry/my-app:v1.0.0
        ports:
        - containerPort: 8080
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 2m}
      - setWeight: 50
      - pause: {duration: 5m}
      - setWeight: 100

Apply it:

kubectl apply -f rollout.yaml
kubectl argo rollouts get rollout my-app --watch

You now have a working canary rollout. 20% of traffic hits the new version, waits 2 minutes, scales to 50%, waits 5 minutes, then goes full. All without writing a single line of custom logic.

Deep Dive — How Argo Rollouts Actually Works

The Canary Strategy

Under the hood, Argo Rollouts manages two ReplicaSets: the stable set (current version) and the canary set (new version). Traffic weighting works by adjusting replica counts — at 20% weight with 5 total replicas, you get 1 canary pod and 4 stable pods.
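To make the replica arithmetic concrete, here's a quick local sketch of that split — assuming the canary replica count rounds up from the weight percentage (the numbers match the example above, not anything read from a cluster):

```shell
# Replica-based weighting: split a total replica count by canary weight
# (sketch; assumes the canary count is ceil(weight% of total))
total=5
weight=20
canary=$(( (total * weight + 99) / 100 ))   # integer ceil(total * weight / 100)
stable=$(( total - canary ))
echo "${canary} canary / ${stable} stable"  # -> 1 canary / 4 stable
```

At 50% the same formula gives 3 canary and 2 stable pods, which is why replica-based weighting is only ever approximate for small replica counts.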

For actual HTTP-level traffic splitting (not just replica-based), you need an ingress controller or service mesh integration. With NGINX Ingress:

strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      nginx:
        stableIngress: my-app-ingress
    steps:
    - setWeight: 10
    - pause: {duration: 10m}
    - setWeight: 30
    - pause: {duration: 10m}
    - setWeight: 100

Create the two services Argo Rollouts will manage:

---
apiVersion: v1
kind: Service
metadata:
  name: my-app-stable
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-canary
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
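The trafficRouting block also references an existing ingress (my-app-ingress). A minimal sketch of what that stable ingress might look like, assuming a standard NGINX setup and a hypothetical hostname:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: my-app.example.com    # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-stable   # routes to the stable service
            port:
              number: 80
```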

Argo Rollouts injects NGINX annotations automatically to split traffic at the proxy level — not at the pod count level. This is the difference between “roughly 10%” and “exactly 10%”.
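Concretely, the controller generates a second, canary-flagged ingress and bumps its weight annotation at each step. Roughly what that looks like — the exact generated name and labels vary by version, so treat this as an illustrative sketch:

```yaml
# Generated and managed by Argo Rollouts -- illustrative sketch, do not hand-edit
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-my-app-ingress-canary   # derived from rollout + stable ingress names
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # tracks the current setWeight
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-canary
            port:
              number: 80
```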

The Blue/Green Strategy

Blue/green is conceptually simpler: run the new version (green) alongside the old (blue), then flip the switch. No gradual traffic ramp — it’s all-or-nothing, but you get to validate green before cutting over.

strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false   # Require manual promotion
    scaleDownDelaySeconds: 30     # Keep blue running 30s after promotion
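As with canary traffic routing, the two services referenced here need to exist. Both start out selecting the same app label; the controller then injects a pod-template-hash into each selector so active and preview point at the right ReplicaSet. A minimal sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```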

After deploying a new image, the green pods spin up behind my-app-preview. Your QA team can hit the preview endpoint directly to validate. When you’re confident:

# Promote green to active (flip the switch)
kubectl argo rollouts promote my-app

# Or abort and roll back to blue
kubectl argo rollouts abort my-app

Pay attention to scaleDownDelaySeconds at 2 AM. It keeps the old pods alive briefly after promotion — long enough for in-flight requests to drain. That buffer also gives you time to spot an immediate problem and abort before the blue pods disappear.

Advanced Usage — Automated Analysis and Rollback

Manual promotion works fine for small teams. When you’re shipping multiple times a day, though, you need the process to check metrics for you. Argo Rollouts has an AnalysisTemplate that can query Prometheus, Datadog, New Relic, or any HTTP endpoint and automatically decide whether to proceed or roll back.

Define an analysis that checks your error rate:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  args:
  - name: service-name
  metrics:
  - name: error-rate
    interval: 1m
    count: 5
    successCondition: result[0] < 0.05   # Less than 5% errors
    failureLimit: 2
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc:9090
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",status=~"5.."}[2m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))
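To see what the successCondition is actually evaluating, here's the same arithmetic sketched locally — the request counts are made up for illustration, not real metrics:

```shell
# Error rate = 5xx requests / total requests, compared to the 5% threshold
errors=3
total=120
rate=$(awk "BEGIN { printf \"%.4f\", $errors / $total }")
echo "error rate: $rate"

# Equivalent of successCondition: result[0] < 0.05
if awk "BEGIN { exit !($rate < 0.05) }"; then
  echo "measurement: Successful"
else
  echo "measurement: Failed"
fi
```

A 2.5% error rate passes; anything at or above 5% would count as a failed measurement against the failureLimit.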

Wire it into your canary strategy:

strategy:
  canary:
    steps:
    - setWeight: 20
    - analysis:
        templates:
        - templateName: error-rate-check
        args:
        - name: service-name
          value: my-app-canary
    - setWeight: 50
    - pause: {duration: 5m}
    - setWeight: 100

Now the rollout pauses at 20%, runs 5 Prometheus queries over 5 minutes, and automatically aborts if the error rate exceeds 5% in more than 2 of those checks (failureLimit: 2 tolerates two failed measurements). No human needed at 2 AM.
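Analysis doesn't have to be a blocking step, either. For canaries, it can run continuously in the background alongside the steps — a sketch, reusing the same template, with startingStep delaying the checks until after the first weight shift:

```yaml
strategy:
  canary:
    analysis:
      templates:
      - templateName: error-rate-check
      args:
      - name: service-name
        value: my-app-canary
      startingStep: 1   # begin checking once the first setWeight has applied
    steps:
    - setWeight: 20
    - pause: {duration: 2m}
    - setWeight: 50
    - pause: {duration: 5m}
    - setWeight: 100
```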

Pre- and Post-Promotion Analysis for Blue/Green

With blue/green, you can run analysis against the preview environment before production traffic ever sees the new version:

strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
    prePromotionAnalysis:
      templates:
      - templateName: error-rate-check
      args:
      - name: service-name
        value: my-app-preview
    postPromotionAnalysis:
      templates:
      - templateName: error-rate-check
      args:
      - name: service-name
        value: my-app-active

Pre-promotion analysis runs against preview. Post-promotion analysis runs against active for a confirmation window. If post-promotion fails, it triggers an automatic rollback.

Practical Tips From Production

1. Start With Manual Promotion

Don’t add automated analysis on day one. Get comfortable with the mechanics first — manual promote and abort commands. Those operations need to be muscle memory before you hand the wheel to automation. Operators who skip this step tend to misread automated failures and override rollbacks that should have stuck.

2. Always Set scaleDownDelaySeconds

The default is 30 seconds. At 1,000+ req/s, bump it to 60–120 seconds. This setting saved me once when a canary silently introduced a memory leak. The spike wasn’t immediate. But those extra two minutes of stable pods being warm let us abort cleanly — zero dropped requests.

strategy:
  blueGreen:
    scaleDownDelaySeconds: 120

3. Use the Dashboard

Argo Rollouts ships a local web dashboard that earns its keep during an incident — you can even promote and abort from it:

# Start the dashboard (port-forward to localhost:3100)
kubectl argo rollouts dashboard

You get a visual timeline of steps, analysis results, and replica counts. When you’re sleep-deprived and trying to explain deployment status on a Slack call, a colour-coded chart beats raw kubectl output every time.

4. Migrate Gradually — Don’t Rewrite Everything at Once

Pick one service. Run it as a Rollout for a sprint, learn where it can fail, then expand. Argo Rollouts is fully compatible with ArgoCD — if you’re already using GitOps, the Rollout manifest just lives in the same repo alongside everything else.

5. Watch the Pause Behavior

A pause: {} with no duration pauses indefinitely until you manually promote. Useful for canaries that need a human sign-off. The trap: if your CI/CD pipeline calls kubectl argo rollouts promote without checking rollout status first, it will promote immediately regardless of metrics.
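In the steps list, an indefinite pause looks like this (sketch):

```yaml
strategy:
  canary:
    steps:
    - setWeight: 20
    - pause: {}   # no duration: waits until someone runs `kubectl argo rollouts promote`
```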

Always guard the promotion step:

# Block until the rollout completes or degrades; a non-zero exit fails the pipeline
kubectl argo rollouts status my-app --timeout 10m

# For an indefinite pause, confirm the rollout actually reports "Paused"
# (and that metrics look sane) before promoting
kubectl argo rollouts get rollout my-app
kubectl argo rollouts promote my-app

Argo Rollouts turns Kubernetes deployments from a binary flip into a controlled, observable process. The first time you watch a bad canary automatically abort and roll back at 3 AM — while you’re still reading the Slack alert — you’ll understand why this matters.
