See what your app is actually doing — before your users tell you it's broken.
Module 11 · Prometheus + Grafana on the notes app. Free · Local.
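Beginner+ · Observability · ~55 min
By the end of this module you will be able to instrument an app with a /metrics endpoint, scrape it with Prometheus, query it with PromQL, visualize it in Grafana, and find its logs.
Prerequisites: Module 7 (Compose). We extend the same stack.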
Your app is deployed and running. But is it healthy? How many requests is it serving? How fast? Are any failing? Without monitoring, you're flying blind — and the first you'll hear of an outage is an angry user.
"It's slow" — how slow? Since when? For everyone or just some? Without data you're guessing. Monitoring turns guesses into facts and surfaces problems before they become outages.
Observability is the goal: being able to understand what's happening inside your system from the outside. It rests on three pillars:
| Pillar | What it answers | Tool here |
|---|---|---|
| Metrics | "How much / how fast / how many?" — numbers over time | Prometheus + Grafana |
| Logs | "What exactly happened at 14:32?" — event records | docker logs / Loki |
| Traces | "Where did this request spend its time?" — request paths | Jaeger / Tempo (advanced) |
A simple framework for what to watch on any service: Latency (how slow), Traffic (how much demand), Errors (how many fail), and Saturation (how full your resources are). Master these four and you cover most real incidents.
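Here is a small sketch that computes the four signals from a batch of request records. Several things are assumptions for illustration:

- the Request record;
- the 60-second window;
- measuring saturation as in-flight requests divided by a hypothetical capacity.

In this module Prometheus does this arithmetic for you.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    status: int        # HTTP status code it returned


def golden_signals(batch, window_s, in_flight, capacity):
    """Compute the four golden signals for requests seen during window_s seconds."""
    durations = sorted(r.duration_s for r in batch)
    # Latency: nearest-rank 95th percentile of the raw durations.
    p95 = durations[math.ceil(0.95 * len(durations)) - 1] if durations else 0.0
    errors = sum(1 for r in batch if r.status >= 500)
    return {
        "latency_p95_s": p95,                                  # Latency
        "traffic_rps": len(batch) / window_s,                  # Traffic
        "error_ratio": errors / len(batch) if batch else 0.0,  # Errors
        "saturation": in_flight / capacity,                    # Saturation (0..1)
    }


if __name__ == "__main__":
    # Hypothetical batch: 20 requests taking 0.01 s to 0.20 s, one of them a 500.
    batch = [Request(i / 100, 500 if i == 7 else 200) for i in range(1, 21)]
    print(golden_signals(batch, window_s=60, in_flight=3, capacity=10))
```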
| Piece | Its job |
|---|---|
| Instrumentation | Code in your app that exposes metrics at a /metrics URL. |
| Prometheus | A time-series database that pulls (scrapes) /metrics on a schedule and stores the numbers. |
| PromQL | Prometheus's query language for slicing those metrics (rates, averages, totals). |
| Grafana | Dashboards — turns Prometheus queries into graphs anyone can read. |
| Alertmanager | Sends notifications (email, Slack) when a metric crosses a threshold. |
Your app exposes /metrics → Prometheus scrapes & stores it → Grafana queries Prometheus & draws graphs → Alertmanager pages you when something's wrong.
Unlike many tools, Prometheus pulls metrics from your app on a timer. Your app's only job is to expose /metrics — it doesn't need to know Prometheus exists. That simplicity is why Prometheus became the standard.
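The pull model asks very little of an app. This stdlib-only sketch is a server that counts its own hits and serves them at /metrics in Prometheus's plain-text format.

- The metric name demo_requests_total and port 8000 are hypothetical.
- This is not how the notes app does it; the exporter library handles that for you.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

_lock = threading.Lock()
_hits = {}  # path -> requests served; one counter series per path label


def _escape(value):
    # Label values must escape backslashes, double quotes and newlines.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(hits):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests served, by path.",
        "# TYPE demo_requests_total counter",
    ]
    for path, count in sorted(hits.items()):
        lines.append(f'demo_requests_total{{path="{_escape(path)}"}} {count}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            with _lock:
                body = render_metrics(_hits).encode()
            content_type = "text/plain; version=0.0.4"
        else:
            with _lock:
                _hits[self.path] = _hits.get(self.path, 0) + 1
            body, content_type = b"hello\n", "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Nothing here talks to Prometheus: the server simply answers when scraped.
    ThreadingHTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

Make a few requests, then open /metrics. You get the same kind of text Prometheus scrapes, and the server never contacts Prometheus.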
We'll instrument the notes app, then extend the Compose stack from Module 7 with Prometheus and Grafana. Work in your notes-app folder.
Expose /metrics. Add two lines to app.py. The prometheus-flask-exporter library automatically tracks request count, latency, and errors, and serves them at /metrics.
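Conceptually, the exporter wraps every request in three steps:

- count it by method and status;
- time it;
- drop the duration into histogram buckets.

The plain-Python sketch below imitates that idea without Flask. The instrument decorator, the bucket bounds, and the (body, status) handler shape are hypothetical, not prometheus-flask-exporter's internals.

```python
import time
from collections import Counter
from functools import wraps


class Histogram:
    """Cumulative buckets plus a running sum and count, like a Prometheus histogram."""

    def __init__(self, bounds):
        self.bounds = tuple(bounds) + (float("inf"),)  # upper bounds ("le"), ending in +Inf
        self.buckets = {le: 0 for le in self.bounds}
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        for le in self.bounds:
            if value <= le:  # cumulative: counted in every bucket it fits under
                self.buckets[le] += 1


request_total = Counter()                              # (method, status) -> requests
request_duration = Histogram([0.005, 0.05, 0.5, 5.0])  # seconds; hypothetical bounds


def instrument(handler):
    """Record a count and a latency for every call of a (body, status)-returning handler."""
    @wraps(handler)
    def wrapper(method, *args, **kwargs):
        start = time.perf_counter()
        body, status = handler(method, *args, **kwargs)
        request_duration.observe(time.perf_counter() - start)
        request_total[(method, status)] += 1
        return body, status
    return wrapper


@instrument
def index(method):
    return "my notes", 200  # hypothetical handler standing in for a Flask view
```

An average latency falls out as request_duration.sum / request_duration.count; percentiles need the buckets.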
Add the library to requirements.txt:
Configure prometheus.yml. This config says "every 5 seconds, scrape the web service on port 5000." The target web:5000 works because Compose's network resolves service names (Module 7).
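Each scrape is just an HTTP GET followed by line-by-line parsing. Here is a simplified sketch of one scrape, assuming the text exposition format. It cuts corners, so treat it as an illustration rather than a real parser:

- it skips comments;
- it ignores optional timestamps;
- it leaves escaped label values as they are.

```python
import re
import time
import urllib.request

# metric_name{labels} value [timestamp]  (the timestamp, if any, is ignored)
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Turn exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE comments
            continue
        m = SAMPLE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            labels = dict(LABEL.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def scrape_once(url):
    """Fetch one target and stamp every sample with the scrape time."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        text = resp.read().decode()
    now = time.time()
    return [(name, labels, now, value) for name, labels, value in parse_exposition(text)]
```

Prometheus also attaches target labels to every sample it stores. These include job (the job_name from prometheus.yml) and instance (here web:5000).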
Extend docker-compose.yml. Add Prometheus and Grafana as two new services under services:, alongside web and db from Module 7.
Four containers now run together: the app, its database, Prometheus, and Grafana. First, confirm your app is emitting metrics — open http://localhost:8080/metrics and you'll see raw counters like flask_http_request_total.
Metrics are boring with no activity. Hit the app a few dozen times (refresh the browser, or run this):
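If you prefer a script, this stdlib-only Python sketch does the same job. It assumes the app answers at http://localhost:8080/ (the Module 7 port mapping). It tallies status codes so any errors show up immediately.

```python
import urllib.request
from urllib.error import HTTPError


def generate_traffic(url, count=50):
    """Send `count` GET requests to url and tally the responses by status code."""
    tally = {}
    for _ in range(count):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                status = resp.status
        except HTTPError as err:  # 4xx/5xx still count as traffic (and as errors)
            status = err.code
            err.close()
        tally[status] = tally.get(status, 0) + 1
    return tally


if __name__ == "__main__":
    print(generate_traffic("http://localhost:8080/"))
```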
Open http://localhost:9090. Under Status → Targets confirm notes-app is UP. Then in the query box, try:
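The query box is a front end for Prometheus's HTTP API, so the same PromQL can also run from code. Here is a minimal sketch against the /api/v1/query endpoint. It assumes Prometheus is published on localhost:9090 and that the expression returns an instant vector. A range query would use /api/v1/query_range instead.

```python
import json
import urllib.parse
import urllib.request


def query_url(base, promql):
    """Build an instant-query URL for Prometheus's HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body):
    """Extract (labels, value) pairs from an instant-vector JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result's "value" is [unix_timestamp, "value as a string"].
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def instant_query(promql, base="http://localhost:9090"):
    with urllib.request.urlopen(query_url(base, promql), timeout=5) as resp:
        return parse_vector(resp.read().decode())


if __name__ == "__main__":
    for labels, value in instant_query("rate(flask_http_request_total[1m])"):
        print(labels, value)
```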
You're seeing live, queryable data about your running app — request rates, response times, error counts — all without changing your app logic. That's instrumentation paying off.
Open http://localhost:3000 and log in with admin / admin (it'll ask you to set a new password). Then:
Add a Prometheus data source with the URL http://prometheus:9090 (the service name, since Grafana runs on the same Compose network). Then create Dashboards → New → New dashboard → Add visualization, pick your Prometheus source, and enter this query:
Generate more traffic (Step 5) and watch the graph climb in real time. Add a second panel for latency:
App → metrics → Prometheus → Grafana, all wired together with Compose. This is genuinely how production systems are monitored.
Instead of building panels by hand, go to Dashboards → New → Import and paste a dashboard ID from grafana.com/dashboards. The community has thousands ready to go.
Metrics tell you something's wrong; logs tell you what. You already have them:
For many containers, you graduate from docker logs to a log aggregator — Loki (pairs with Grafana) or the ELK stack (Elasticsearch + Logstash + Kibana) — so all logs are searchable in one place.
| Type | What it is | Example |
|---|---|---|
| Counter | Only goes up; resets to zero on restart. Use rate() to get a per-second value. | total requests served |
| Gauge | Goes up and down — a current value. | memory in use, active connections |
| Histogram | Buckets observations to compute averages & percentiles. | request duration |
| Summary | Like a histogram, with client-side quantiles. | response size |
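Percentiles from a histogram are estimates. Prometheus finds the bucket that contains the requested rank and interpolates linearly inside it. The sketch below reproduces that calculation for a single series of cumulative (upper bound, count) buckets. It assumes non-negative observations such as latencies.

In PromQL, a latency panel could look like histogram_quantile(0.95, sum by (le) (rate(flask_http_request_duration_seconds_bucket[1m]))). That assumes the exporter's histogram is named flask_http_request_duration_seconds.

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with float("inf"). Like Prometheus,
    find the bucket holding rank q * total and interpolate linearly inside it.
    The first bucket's lower edge is taken as 0, i.e. observations are non-negative.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != float("inf"):
        raise ValueError("last bucket must have an upper bound of +Inf")
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return float("nan")
    rank = q * total
    lower, below = 0.0, 0  # upper edge and cumulative count of the previous bucket
    for upper, count in buckets:
        if count >= rank:
            if upper == float("inf"):
                return lower  # cannot interpolate into +Inf: report the last finite bound
            return lower + (upper - lower) * (rank - below) / (count - below)
        lower, below = upper, count
```

Take buckets of 50 requests within 0.1 s, 90 within 0.5 s and 100 within 1.0 s. The 95th percentile comes out as 0.75 s purely by interpolation between 0.5 s and 1.0 s, so the estimate is only as precise as the bucket boundaries.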
Why rate()? A raw counter just climbs forever, which is not useful on a graph. rate(metric[1m]) turns it into "per second, averaged over the last minute," which is what you actually want to see.
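Here is the arithmetic behind rate() as a simplified sketch:

1. Sum the increases between consecutive samples, treating any drop as a restart.
2. Divide by the time the samples cover.

Real Prometheus also extrapolates to the edges of the window, so its numbers differ slightly. The samples used below are hypothetical, one every 15 seconds across a one-minute window.

```python
def simple_rate(samples):
    """Per-second increase of a counter over one window of (timestamp_s, value) samples.

    A drop between consecutive samples is treated as a counter reset (e.g. the app
    restarted), so the post-reset value counts as fresh increase. Unlike Prometheus,
    this sketch does not extrapolate to the edges of the window.
    """
    if len(samples) < 2:
        return None  # need at least two points to measure a slope
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    print(simple_rate([(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]))
```

Here the counter rises by 120 requests over 60 seconds, which is 2 requests per second.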
Ports, queries, and commands you'll reach for. Bookmark this.
| Service | URL | Login |
|---|---|---|
| App metrics | localhost:8080/metrics | — |
| Prometheus | localhost:9090 | — |
| Grafana | localhost:3000 | admin / admin |
| Query | Shows |
|---|---|
| up | Which targets are reachable (1 = up) |
| flask_http_request_total | Total requests (a counter) |
| rate(flask_http_request_total[1m]) | Requests per second (traffic) |
| sum by (status) (rate(...[1m])) | Rate split by HTTP status (find errors) |
| histogram_quantile(0.95, ...) | 95th-percentile latency |
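To see what the by clause does, this sketch applies the same grouping to a few hypothetical per-series rates. The label sets and numbers are made up for illustration. It then derives an error ratio. In PromQL the ratio would be sum(rate(flask_http_request_total{status=~"5.."}[1m])) / sum(rate(flask_http_request_total[1m])).

```python
from collections import defaultdict


def sum_by(series, label):
    """Mimic PromQL `sum by (label) (...)`: add values of series that share a label value."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


def error_ratio(series):
    """Fraction of the request rate coming from 5xx responses."""
    by_status = sum_by(series, "status")
    total = sum(by_status.values())
    errors = sum(v for status, v in by_status.items() if status.startswith("5"))
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical requests-per-second values, one per (method, status) series.
    rates = [({"method": "GET", "status": "200"}, 4.0),
             ({"method": "POST", "status": "200"}, 1.0),
             ({"method": "GET", "status": "500"}, 0.5)]
    print(sum_by(rates, "status"), error_ratio(rates))
```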
| Symptom | Likely cause & fix |
|---|---|
| Prometheus target is DOWN | Wrong target. Use the service name and container port: web:5000, not localhost:8080. |
| /metrics is 404 | PrometheusMetrics(app) not added, or the library is missing from requirements.txt. |
| Grafana "data source not working" | URL must be http://prometheus:9090 (service name), not localhost. |
| Empty graphs | No traffic yet — generate some (Step 5); counters need rate(). |
| Prometheus won't start | Bad prometheus.yml indentation (YAML needs consistent spaces, never tabs) or a wrong volume path. |
| Port 3000/9090 in use | Change the host port, e.g. "3001:3000". |
Deepen it before the Capstone:
…500 per second. …histogram_quantile.
You can now instrument an app, scrape it with Prometheus, query metrics with PromQL, visualize them in Grafana, and find logs: the full observability loop. You can finally see what your deployments are doing.
Next up: Module 12 — the Capstone, where everything from Modules 6–11 comes together: code → container → cluster → CI/CD → monitored, end to end.