Run a real multi-node cluster the way production does — limits, probes, RBAC, autoscaling, durable storage.
Advanced A1 · The notes app on a 3-node k3d cluster. Free · Local.
Advanced Kubernetes · ~75 min
Prerequisites: the beginner Kubernetes module and Docker. You should already know Pods, Deployments, and Services.
In the beginner course you ran the app on a single-node Minikube with bare Deployments. That proves the concepts, but a real cluster does much more — and assumes you've configured the things Minikube let you skip.
| Beginner (Minikube) | Production (this module) |
|---|---|
| One node | Multiple nodes — pods are scheduled across them |
| No resource limits | Requests & limits so one pod can't starve the node |
| One probe (readiness) | Liveness + readiness + startup for self-healing |
| DB as a throwaway Deployment | StatefulSet + PVC so data actually survives |
| Fixed replica count | Autoscaling on real CPU load |
| Full cluster-admin | Namespaces + RBAC, least privilege |
k3d runs lightweight k3s clusters inside Docker, so you can create a genuine multi-node cluster on your laptop in seconds, for free. The manifests here are standard Kubernetes and carry over to EKS/GKE/AKS largely unchanged; only provider-specific pieces such as storage classes and load balancers may differ. Bonus: k3s ships with metrics-server built in, so autoscaling works out of the box.
We'll deploy the same notes app onto a 3-node cluster and harden it one production concern at a time. Put these manifests in a k8s-prod/ folder.
Install k3d (brew install k3d, or the install script below), then create 1 server + 2 agents:
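A minimal sketch of the cluster creation step. The cluster name `notes` and the `8080:80` port mapping are assumptions chosen to match the namespace and the localhost URL used later; adjust them to taste.

```shell
# Hypothetical cluster name, reused by later k3d commands.
CLUSTER=notes

if command -v k3d >/dev/null 2>&1; then
  # 1 server (control plane) + 2 agents (workers).
  # -p maps host port 8080 to port 80 on k3d's built-in load balancer.
  k3d cluster create "$CLUSTER" --servers 1 --agents 2 -p "8080:80@loadbalancer"
  # Should list three nodes once they are Ready.
  kubectl get nodes
else
  echo "k3d not found: install it first (for example, brew install k3d)" >&2
fi
```

`k3d node list` shows the same nodes as Docker containers, plus the load balancer container.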
You now have a real multi-node Kubernetes cluster. k3d wired kubectl to it automatically.
Namespaces isolate workloads. Create one, then import the notes-app:1.0 image you built in the beginner course into the cluster:
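These two steps might look like the following sketch. It assumes the cluster is named `notes` (a hypothetical name; use whatever you passed to `k3d cluster create`) and that `notes-app:1.0` exists in your local Docker image store.

```shell
CLUSTER=notes   # hypothetical cluster name

if command -v k3d >/dev/null 2>&1 && command -v kubectl >/dev/null 2>&1; then
  # Create the namespace that will hold every object in this module.
  kubectl create namespace notes
  # Copy the locally built image into each node, so no registry is needed.
  k3d image import notes-app:1.0 -c "$CLUSTER"
else
  echo "k3d and kubectl are both required for this step" >&2
fi
```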
Save typing -n notes on every command: kubectl config set-context --current --namespace=notes.
The beginner DB was a Deployment with no storage: restart it and the data vanished. Production databases use a StatefulSet with volumeClaimTemplates, giving each replica a stable identity and its own persistent disk. Save as k8s-prod/db.yaml:
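One possible shape for this manifest, written to disk from the shell. The database engine is not named here, so this sketch assumes PostgreSQL (`postgres:16`, port 5432, a placeholder password); swap in the image, port, and data path your notes app actually uses.

```shell
mkdir -p k8s-prod
cat > k8s-prod/db.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: db
  namespace: notes
spec:
  clusterIP: None            # headless Service: gives db-0 a stable DNS name
  selector:
    app: db
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
  namespace: notes
spec:
  serviceName: db
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16               # assumption: PostgreSQL
          env:
            - name: POSTGRES_PASSWORD
              value: example               # placeholder; use a Secret in real use
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:                    # one PVC per replica
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
EOF
```

Apply it with `kubectl apply -f k8s-prod/db.yaml`; `kubectl get pvc -n notes` should then list a claim named `data-db-0`.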
StatefulSets give pods stable names (db-0, db-1…) and bind each to its own volume that survives rescheduling. Deployments treat pods as interchangeable: without a PVC a replaced pod starts with an empty disk, and with one every replica shares the same claim. That is fine for stateless web apps, fatal for databases.
This is the production-grade web Deployment. Save as k8s-prod/web.yaml — note the resources block and the three probes:
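A hedged sketch of such a Deployment and its Service. The resource figures, probe timings, and the probe path `/` are illustrative assumptions; port 5000 and the `notes-app:1.0` image come from the rest of this module. Any environment variables the app needs to reach the database are omitted.

```shell
mkdir -p k8s-prod
cat > k8s-prod/web.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: notes
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: notes-app:1.0
          imagePullPolicy: IfNotPresent   # use the image imported by k3d
          ports:
            - containerPort: 5000
          resources:
            requests:                     # guaranteed; used for scheduling and HPA
              cpu: 100m
              memory: 128Mi
            limits:                       # hard ceiling
              cpu: 500m
              memory: 256Mi
          startupProbe:                   # up to 30 x 2s = 60s to boot
            httpGet: { path: /, port: 5000 }
            periodSeconds: 2
            failureThreshold: 30
          readinessProbe:                 # gates Service traffic
            httpGet: { path: /, port: 5000 }
            periodSeconds: 5
          livenessProbe:                  # restarts a hung container
            httpGet: { path: /, port: 5000 }
            periodSeconds: 10
            failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: notes
spec:
  selector:
    app: web
  ports:
    - port: 5000
      targetPort: 5000
EOF
```

Apply it with `kubectl apply -f k8s-prod/web.yaml`. If your app exposes a dedicated health endpoint, point the probes at that instead of `/`.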
Run kubectl get pods -o wide: your web pods have most likely landed on different nodes. The scheduler placed them for you, picking nodes with enough free capacity for the CPU/memory requests you declared. That's real Kubernetes.
| Setting | What it does |
|---|---|
| requests | What the pod is guaranteed. The scheduler uses it to choose a node. |
| limits | The hard ceiling. Exceed CPU → throttled; exceed memory → the pod is OOM-killed. |
| startupProbe | Protects slow-booting apps — liveness/readiness wait until it passes once. |
| readinessProbe | Gates traffic. Failing = removed from the Service, but not restarted. |
| livenessProbe | Detects a hung process and restarts the container. |
The HorizontalPodAutoscaler measures usage against the CPU request. Without a CPU request set (Step 4), HPA has no baseline to compute a percentage from and won't scale. Always set requests.
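To make the dependence on requests concrete, here is a small sketch of the core scaling formula from the Kubernetes documentation, desired = ceil(current replicas × current utilization ÷ target utilization), where utilization is CPU usage as a percentage of the request. The function name `hpa_desired` is made up for illustration, and the sketch ignores the controller's tolerance band and the min/max replica clamps.

```shell
# Integer ceiling division: ceil(a / b) == (a + b - 1) / b
hpa_desired() {
  current=$1   # current replica count
  util=$2      # average CPU usage as % of the CPU request
  target=$3    # target utilization in %
  echo $(( (current * util + target - 1) / target ))
}

hpa_desired 2 120 50   # 2 * 120 / 50 = 4.8, rounded up: prints 5
hpa_desired 4 25 50    # 4 * 25 / 50 = 2: prints 2
```

With no CPU request, the utilization percentage in the second argument cannot be computed at all, which is why the HPA reports `<unknown>`.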
Add a HorizontalPodAutoscaler that keeps CPU near 50%, scaling between 2 and 10 pods. Save as k8s-prod/hpa.yaml:
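A sketch of that autoscaler using the `autoscaling/v2` API. It assumes the Deployment is named `web` and lives in the `notes` namespace.

```shell
mkdir -p k8s-prod
cat > k8s-prod/hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: notes
spec:
  scaleTargetRef:              # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization    # percent of the CPU request
          averageUtilization: 50
EOF
```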
Apply it, then generate load from inside the cluster and watch it scale:
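One way to do this, as a sketch: a throwaway busybox pod named `load` requests the web Service in a tight loop. The Service address `http://web:5000/` is an assumption based on the port used elsewhere in this module.

```shell
NS=notes

if command -v kubectl >/dev/null 2>&1; then
  kubectl apply -f k8s-prod/hpa.yaml
  # Throwaway pod that hammers the web Service from inside the cluster.
  kubectl run load -n "$NS" --image=busybox --restart=Never -- \
    /bin/sh -c 'while true; do wget -q -O /dev/null http://web:5000/; done'
  # Watch TARGETS and REPLICAS update; press Ctrl+C to stop watching.
  kubectl get hpa -n "$NS" -w
else
  echo "kubectl not found" >&2
fi
```

A single loop may not push CPU past the target on a fast machine; starting a second load pod under a different name is a simple way to add pressure.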
Within a minute or two the REPLICAS column climbs past 2 as CPU rises. Stop the load (kubectl delete pod load -n notes) and it scales back down after the cool-down (by default the HPA waits 5 minutes before scaling down).
You didn't change replica counts by hand — the cluster grew the app to meet demand and shrank it when idle. That's production autoscaling, running free on your laptop.
In production, not everything runs as cluster-admin. RBAC grants the minimum permissions needed. Here's a ServiceAccount that can only read pods in the notes namespace. Save as k8s-prod/rbac.yaml:
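A sketch of the three objects involved. The name `pod-reader` is hypothetical; any name works as long as the RoleBinding references the same ServiceAccount and Role.

```shell
mkdir -p k8s-prod
cat > k8s-prod/rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-reader             # hypothetical name
  namespace: notes
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: notes
rules:
  - apiGroups: [""]            # "" is the core API group, where pods live
    resources: ["pods"]
    verbs: ["get", "list", "watch"]   # read-only
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader
  namespace: notes
subjects:
  - kind: ServiceAccount
    name: pod-reader
    namespace: notes
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
EOF
```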
Apply it, then verify the boundaries with kubectl auth can-i:
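For example, assuming the ServiceAccount is called `pod-reader` (a hypothetical name; substitute the one in your rbac.yaml), the checks could look like this. ServiceAccounts are impersonated with the `system:serviceaccount:NAMESPACE:NAME` form.

```shell
SA=system:serviceaccount:notes:pod-reader   # hypothetical account name

if command -v kubectl >/dev/null 2>&1; then
  kubectl apply -f k8s-prod/rbac.yaml
  # can-i prints yes/no and exits non-zero on "no"; "|| true" keeps a script going.
  kubectl auth can-i list pods   -n notes   --as="$SA" || true   # expect: yes
  kubectl auth can-i delete pods -n notes   --as="$SA" || true   # expect: no
  kubectl auth can-i list pods   -n default --as="$SA" || true   # expect: no
else
  echo "kubectl not found" >&2
fi
```

The last check is the important one: the Role is namespaced, so the same verb that works in `notes` is denied everywhere else.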
A Role + RoleBinding grant permissions within one namespace. Their cluster-wide cousins are ClusterRole + ClusterRoleBinding. Prefer namespaced Roles — narrower blast radius.
Open the app at http://localhost:8080 via the k3d load balancer. This requires an Ingress that routes to the web Service; as a temporary alternative, run kubectl port-forward -n notes svc/web 8080:5000.
| Command | What it does |
|---|---|
| k3d cluster create N --agents 2 | Create a multi-node cluster |
| k3d cluster list / delete N | List / delete clusters |
| k3d image import IMG -c N | Load a local image into the cluster |
| k3d node list | List the cluster's nodes |
| Command | What it does |
|---|---|
| kubectl get pods -o wide | See which node each pod runs on |
| kubectl get hpa -w | Watch the autoscaler live |
| kubectl top pods / nodes | Live CPU/memory usage (metrics-server) |
| kubectl get pvc | List persistent volume claims |
| kubectl auth can-i VERB RES --as=SA | Test RBAC permissions |
| kubectl rollout restart deploy/web | Restart pods (e.g. after a new image) |
| kubectl describe pod P | Events, limits, probe status |
| kubectl scale deploy/web --replicas=N | Manual scale (HPA overrides this) |
| Symptom | Likely cause & fix |
|---|---|
| HPA shows <unknown> for CPU | No CPU request set, or metrics-server not ready yet — wait, and confirm requests exist. |
| Pod ImagePullBackOff | Forgot k3d image import, or imagePullPolicy not IfNotPresent. |
| Pod Pending | No node has enough free CPU/memory for the requests — lower them or add an agent. |
| Pod OOMKilled | Hit the memory limit — raise it or fix the leak. |
| DB data lost on restart | Using a Deployment, or the volume isn't mounted at the data path — use the StatefulSet + PVC. |
| Liveness keeps restarting the pod | Probe too aggressive for a slow start — add/extend the startupProbe. |
[…] kubectl drain) and watch pods reschedule to the others.
[…] ConfigMap-driven setting and roll it out with kubectl rollout.
Run a multi-node cluster, set resource requests/limits, configure all three probes, persist data with a StatefulSet + PVC, autoscale under load, and enforce least-privilege RBAC. This is the baseline every production workload needs.
Next up: A2 — Helm, where you'll package all these manifests into a single reusable chart with per-environment values, instead of juggling raw YAML files.