Progressive Delivery (Argo Rollouts)

What You'll Learn

Compare deployment strategies: recreate, rolling, blue-green, canary
Install Argo Rollouts and convert a Deployment into a Rollout
Run a canary release with traffic steps and pauses
Promote, abort, and undo a rollout
Add automated analysis that auto-rolls-back on a bad metric
See how blue-green works — and how this all stays GitOps-driven

Prerequisites: A1, A2, A3. A running k3d cluster with the notes app.

Why Progressive Delivery

A normal Kubernetes Deployment does a rolling update: it swaps old pods for new ones until 100% of traffic hits the new version. If v2 has a bug that the readiness probe doesn't catch, every user is on it within minutes, and you find out from your users (or your pager).

All-or-nothing is risky

Rolling updates have no concept of "try it on 5% first." There's no automatic health check on real traffic, and rollback is a manual scramble once it's already affecting everyone.

Progressive delivery rolls a new version out gradually: send it a small slice of traffic, watch the metrics, and only proceed if it's healthy — otherwise roll back automatically, before most users ever notice. Argo Rollouts adds this to Kubernetes with a drop-in Rollout resource.

The payoff

Bad releases are caught at 5–25% blast radius and reverted automatically. You ship more often and more safely — the two goals that usually fight each other.

Deployment Strategies

Strategy	How it works	Trade-off
Recreate	Kill all old, start all new	Downtime
Rolling	Replace pods gradually (K8s default)	No traffic-based safety; both versions serve during the roll
Blue-Green	Run v2 alongside v1, flip 100% at once after testing	Instant switch + easy rollback, but 2× resources
Canary	Shift a small % of traffic to v2, increase in steps	Safest; needs metrics to judge each step

The Rollout resource

Argo Rollouts replaces your Deployment with a Rollout (same pod spec) plus a strategy: block. It manages two ReplicaSets — stable and canary — and steps through setWeight / pause / analysis stages you define.

Traffic on k3d

For precise traffic percentages you add a traffic provider (Istio, NGINX, or SMI). Without one, Argo Rollouts approximates the weight by replica count — e.g. 25% ≈ 1 of 4 pods. That's perfect for learning the workflow on k3d; the manifests gain a few lines when you add a real mesh (A9).
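The replica-count approximation is easy to work out by hand. The sketch below assumes the controller rounds the canary count up to a whole pod; that rounding is an assumption, and the lab's 4 replicas divide evenly, so it makes no difference there. canary_pods is a hypothetical helper, not part of any CLI.

```shell
#!/bin/sh
# canary_pods REPLICAS WEIGHT
# Prints how many canary pods approximate WEIGHT percent of REPLICAS,
# rounding up to a whole pod (assumed rounding behaviour).
canary_pods() {
  replicas=$1
  weight=$2
  echo $(( (replicas * weight + 99) / 100 ))
}

# The lab's Rollout has 4 replicas and steps through 25, 50 and 75 percent.
for weight in 25 50 75; do
  echo "setWeight $weight -> $(canary_pods 4 "$weight") of 4 pods run the canary"
done
```

Under the same assumption, 10 replicas at a 5% weight still need 1 whole canary pod, so the real share is closer to 10%. Small weights only become accurate with more replicas or a traffic provider.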

Hands-on Lab: Canary the Notes App

We'll convert the web app to a canary Rollout, then watch a v2 release roll out step by step. Use your k3d cluster.

Install Argo Rollouts + the kubectl plugin

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
# kubectl plugin (macOS arm64 shown; pick your OS/arch)
curl -sLO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-darwin-arm64
chmod +x kubectl-argo-rollouts-darwin-arm64
sudo mv kubectl-argo-rollouts-darwin-arm64 /usr/local/bin/kubectl-argo-rollouts
kubectl argo rollouts version
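If you are not on macOS arm64, the plugin file name changes with your platform. This sketch builds the URL from uname-style values, assuming the other release assets follow the same kubectl-argo-rollouts-OS-ARCH pattern as the darwin-arm64 one. plugin_url is a hypothetical helper name.

```shell
#!/bin/sh
# plugin_url OS ARCH
# Builds the plugin download URL from `uname -s` / `uname -m` style values.
# Asset naming is assumed to follow the darwin-arm64 example in this lab.
plugin_url() {
  os=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # Darwin -> darwin
  case "$2" in
    x86_64)        arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *)             arch=$2 ;;
  esac
  echo "https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-${os}-${arch}"
}

# Show the URL for the machine running this script.
plugin_url "$(uname -s)" "$(uname -m)"
```

Pass the printed URL to curl -sLO in place of the darwin-arm64 one, and adjust the file name in the chmod and mv commands to match.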

Convert the web Deployment into a Rollout

A Rollout is a Deployment with a strategy. Save as k8s-prod/web-rollout.yaml — the canary advances 25% → 50% → 75% → 100% with pauses:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web
  namespace: notes
spec:
  replicas: 4
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
        - name: web
          image: notes-app:1.0
          imagePullPolicy: IfNotPresent
          env:
            - name: DATABASE_URL
              value: postgresql://notes:secret@db:5432/notesdb
          ports:
            - containerPort: 5000
          readinessProbe:
            httpGet: { path: /health, port: 5000 }
  strategy:
    canary:
      steps:
        - setWeight: 25
        - pause: { duration: 30s }
        - setWeight: 50
        - pause: {}            # pause forever — wait for manual promote
        - setWeight: 75
        - pause: { duration: 30s }

Replace, don't duplicate

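If the A1 web Deployment is still applied, delete it first (kubectl delete deploy web -n notes) so the Rollout is the only controller creating web pods. Keep the web Service: it continues routing to the Rollout's pods as long as its selector matches the app: web label.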

kubectl apply -f k8s-prod/web-rollout.yaml
kubectl argo rollouts get rollout web -n notes   # initial rollout (all stable)

Build a v2 image to roll out

Make any visible change to the notes app (e.g. add an /about route), rebuild, and import it as 2.0:

docker build -t notes-app:2.0 .
k3d image import notes-app:2.0 -c notes

Trigger the canary & watch it

Point the rollout at v2. In one terminal, watch the live progress dashboard:

# terminal 1: live, colorized rollout view
kubectl argo rollouts get rollout web -n notes --watch
# terminal 2: start the canary
kubectl argo rollouts set image web web=notes-app:2.0 -n notes

Watch it move to 25% (1 canary pod), wait 30s, advance to 50%, then pause indefinitely at step 4 — exactly where you told it to stop for a human decision.

The "aha!" moment

v2 is live for a fraction of traffic while v1 still serves the rest. If v2 were broken, only a quarter of requests would hit it during the first step and half at the manual pause, and you haven't committed to it yet.

Promote — or abort

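v2 looks good? Promote it through the remaining steps. Looks bad? Abort, and the controller scales the canary down and returns to 100% stable (v1). An aborted Rollout stays Degraded until its spec points at the stable image again, which is what undo does: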

kubectl argo rollouts promote web -n notes        # continue past the pause
# or, if something's wrong:
kubectl argo rollouts abort   web -n notes        # instant rollback to stable
kubectl argo rollouts undo    web -n notes        # roll back to the previous revision

Open the dashboard UI

kubectl argo rollouts dashboard serves a local web UI at localhost:3100 — a visual view of canary weight, steps, and revisions.

Automate the decision — analysis

A human gate is fine, but the real win is automated analysis: query a metric at each step and auto-abort if it's bad. This AnalysisTemplate checks the success rate from Prometheus (from beginner Module 11 / advanced A8):

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: notes
spec:
  metrics:
    - name: success-rate
      interval: 30s
      failureLimit: 2
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus.notes:9090
          query: |
            sum(rate(flask_http_request_total{status=~"2.."}[1m]))
            / sum(rate(flask_http_request_total[1m]))

Wire it into the canary so it runs in the background alongside the steps. With failureLimit: 2 the metric tolerates two failed measurements; if the success rate comes in below 95% a third time, the rollout aborts itself:

  strategy:
    canary:
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 1        # begin analysis after the first setWeight
      steps:
        - setWeight: 25
        - pause: { duration: 1m }
        - setWeight: 50
        - pause: { duration: 1m }
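To make the interaction between successCondition and failureLimit concrete, here is a toy model of a single metric. It is not the controller's code: it assumes a measurement fails when the success rate is below 0.95 and that the run fails once the number of failed measurements exceeds failureLimit. analysis_verdict and the sample rates are made up for illustration.

```shell
#!/bin/sh
# analysis_verdict FAILURE_LIMIT RATE...
# Models one metric: a measurement fails when the success rate is below 0.95,
# and the run fails once failures exceed FAILURE_LIMIT (assumed semantics).
analysis_verdict() {
  limit=$1
  shift
  failed=0
  for rate in "$@"; do
    # awk handles the floating-point comparison that sh arithmetic cannot.
    if awk -v r="$rate" 'BEGIN { exit !(r < 0.95) }'; then
      failed=$((failed + 1))
    fi
    if [ "$failed" -gt "$limit" ]; then
      echo "Failed after $failed bad measurements"
      return 0
    fi
  done
  echo "Still passing with $failed bad measurements"
}

analysis_verdict 2 0.99 0.91 0.97 0.90        # two dips are tolerated
analysis_verdict 2 0.99 0.91 0.90 0.88 0.99   # the third dip fails the run
```

With the 30s interval from the template, the second sequence would abort the rollout roughly a minute and a half after the first bad measurement.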

Hands-off safety

Now a bad deploy is detected by metrics, not a human watching a dashboard, and rolled back before it spreads. This is the heart of progressive delivery.

Alternative: blue-green

Prefer an instant switch with a preview environment? Swap the strategy block. Blue-green keeps v2 fully running on a preview Service; you test it, then flip the active Service to it:

  strategy:
    blueGreen:
      activeService: web            # live traffic
      previewService: web-preview    # test v2 here first
      autoPromotionEnabled: false    # require a manual promote

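You'd add a second Service named web-preview. Test v2 via the preview Service, then run kubectl argo rollouts promote web -n notes to flip 100% of live traffic over at once. Rollback is just as fast while the old ReplicaSet is still running; Argo Rollouts keeps it for scaleDownDelaySeconds (30 seconds by default) after the switch.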
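A minimal sketch of that second Service, written out as a manifest file. The file name and Service port 80 are assumptions (this lab does not show the A1 Service), so match them to your existing web Service. The selector and targetPort come from the Rollout above.

```shell
#!/bin/sh
# Write a preview Service next to the Rollout manifest.
# The file name and port 80 are assumptions; match them to your web Service.
mkdir -p k8s-prod
cat > k8s-prod/web-preview-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: web-preview
  namespace: notes
spec:
  selector:
    app: web            # same pod label as the Rollout template
  ports:
    - port: 80          # assumed Service port
      targetPort: 5000  # containerPort from the Rollout
EOF
echo "wrote k8s-prod/web-preview-svc.yaml"
```

Apply it with kubectl apply -f k8s-prod/web-preview-svc.yaml before switching the Rollout to blueGreen. Argo Rollouts then adds its own pod-template-hash selector to the active and preview Services, so each one reaches only its own ReplicaSet.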

Keep it GitOps

This stays fully declarative. Put the Rollout in your Helm chart (A2) and let Argo CD (A3) manage it. A release then becomes: bump the image tag in Git → Argo CD syncs the new Rollout spec → Argo Rollouts runs the canary → analysis promotes or aborts. No imperative commands in steady state.
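The "bump the image tag in Git" step can be as small as a one-line text edit followed by a commit. The sketch below uses a hypothetical bump_image_tag helper and the lab's k8s-prod/web-rollout.yaml path as an example; in a Helm chart the tag would more likely live in values.yaml.

```shell
#!/bin/sh
# bump_image_tag FILE NEW_TAG
# Rewrites every "notes-app:<tag>" reference in FILE to use NEW_TAG.
# Uses a temp file instead of `sed -i`, whose syntax differs on macOS and Linux.
bump_image_tag() {
  file=$1
  tag=$2
  sed "s|notes-app:[A-Za-z0-9._-]*|notes-app:${tag}|" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Typical use inside the Git repository that Argo CD watches:
#   bump_image_tag k8s-prod/web-rollout.yaml 2.0
#   git commit -am "web: release notes-app 2.0"
#   git push        # Argo CD syncs, Argo Rollouts starts the canary
```

Nothing in this flow talks to the cluster directly: the push is the release trigger.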

The pieces click together

A2 packaged it, A3 syncs it from Git, A4 rolls it out safely. That's a real continuous-delivery pipeline — built entirely from declarative YAML.

Clean up

kubectl delete rollout web -n notes
kubectl delete analysistemplate success-rate -n notes 2>/dev/null

Argo Rollouts Cheat Sheet

Command	What it does
`kubectl argo rollouts get rollout N --watch`	Live, colorized rollout status
`kubectl argo rollouts set image N c=img`	Start a rollout to a new image
`kubectl argo rollouts promote N`	Advance past a pause
`kubectl argo rollouts promote N --full`	Skip all remaining steps
`kubectl argo rollouts abort N`	Stop & revert to stable
`kubectl argo rollouts undo N`	Roll back to the previous revision
`kubectl argo rollouts retry rollout N`	Retry an aborted rollout
`kubectl argo rollouts dashboard`	Open the local web UI
`kubectl argo rollouts status N`	One-line health/phase

Canary vs. blue-green — pick by need

Canary: gradual, metric-driven, lowest risk — great for user-facing web traffic. Blue-green: instant cutover with a tested preview — great when you can't run two versions live at once or need an atomic switch.

Troubleshooting

Symptom	Likely cause & fix
Rollout stuck at a step	A `pause: {}` waits forever by design — `promote` it, or add a `duration`.
Traffic % looks off	No traffic provider on k3d — weight is approximated by replica count. Expected.
Analysis always fails	Prometheus address/query wrong, or no traffic to measure — generate load; verify the query in Prometheus.
`web` pods not managed	Old Deployment still owns them — delete it so the Rollout takes over.
New image won't appear	Forgot `k3d image import notes-app:2.0 -c notes`.
Can't run `kubectl argo rollouts`	Plugin not on PATH — reinstall to `/usr/local/bin`.

Your Challenge

Ship a deliberately broken v3 (e.g. crash on start) and confirm the canary aborts automatically.
Run an analysis before any traffic shifts: an inline analysis step ahead of the first setWeight for canary, or prePromotionAnalysis for blue-green.
Move the Rollout into your Helm chart and deploy it via Argo CD (A2 + A3).
Add a setCanaryScale step to control how many canary pods run independent of weight (it only takes effect once a traffic router is configured).
Bonus: wire an NGINX/Istio traffic router for real percentage-based splitting (preview of A9).

# point the rollout at an image that fails its readiness probe:
kubectl argo rollouts set image web web=notes-app:broken -n notes
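# the canary pod never becomes Ready, so the rollout stalls at the first step
# while the stable pods keep serving, and it turns Degraded once
# progressDeadlineSeconds (10 minutes by default) runs out.
# Set progressDeadlineAbort: true on the Rollout to have it abort on its own.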
kubectl argo rollouts get rollout web -n notes --watch   # watch it revert
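For the crash-on-start case, one way to get an automatic abort is to shorten the progress deadline and turn on progressDeadlineAbort. The helper below is a hypothetical wrapper around kubectl patch, and the 120-second value is only an example. In a GitOps setup you would set the same two fields in the Rollout manifest, since Argo CD may revert a live patch.

```shell
#!/bin/sh
# enable_deadline_abort NAME NAMESPACE SECONDS
# Patches the Rollout so a stalled update is aborted once the progress
# deadline passes. Hypothetical helper; needs kubectl access to the cluster.
enable_deadline_abort() {
  kubectl patch rollout "$1" -n "$2" --type merge \
    -p "{\"spec\":{\"progressDeadlineSeconds\":$3,\"progressDeadlineAbort\":true}}"
}

# Example: give the canary two minutes before aborting back to stable.
#   enable_deadline_abort web notes 120
```

Run it before setting the broken image, then watch the rollout: after the deadline it should report an aborted update with all traffic back on the stable ReplicaSet.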

Recap & What's Next

You can now

Convert a Deployment to a Rollout, run canary and blue-green releases, promote/abort/undo, and add automated metric analysis that auto-rolls-back bad versions — all GitOps-friendly. Releases are now low-risk.

Next up: A5 — Advanced Terraform, where you'll level up your IaC with reusable modules, remote state, and workspaces — and provision a real cluster instead of a local one.
