Monitoring & Logging (Prometheus & Grafana)

What You'll Learn

By the end of this module you will be able to:

Explain why monitoring is non-negotiable, and what "observability" means
Know the three pillars — metrics, logs, traces — and the four golden signals
Instrument the notes app to expose a /metrics endpoint
Run Prometheus to scrape those metrics and query them with PromQL
Build a live Grafana dashboard from your metrics
Read logs and understand where alerting fits

Prerequisites: Module 7 (Compose) — we extend the same stack.

Why Monitoring Exists

Your app is deployed and running. But is it healthy? How many requests is it serving? How fast? Are any failing? Without monitoring, you're flying blind — and the first you'll hear of an outage is an angry user.

You can't fix what you can't see

"It's slow" — how slow? Since when? For everyone or just some? Without data you're guessing. Monitoring turns guesses into facts and surfaces problems before they become outages.

Observability is the goal: being able to understand what's happening inside your system from the outside. It rests on three pillars:

Pillar	What it answers	Tool here
Metrics	"How much / how fast / how many?" — numbers over time	Prometheus + Grafana
Logs	"What exactly happened at 14:32?" — event records	`docker logs` / Loki
Traces	"Where did this request spend its time?" — request paths	Jaeger / Tempo (advanced)

The four golden signals

A simple framework for what to watch on any service: Latency (how slow), Traffic (how much demand), Errors (how many fail), and Saturation (how full your resources are). Master these four and you cover most real incidents.

How Prometheus & Grafana Fit Together

Piece	Its job
Instrumentation	Code in your app that exposes metrics at a `/metrics` URL.
Prometheus	A time-series database that pulls (scrapes) `/metrics` on a schedule and stores the numbers.
PromQL	Prometheus's query language for slicing those metrics (rates, averages, totals).
Grafana	Dashboards — turns Prometheus queries into graphs anyone can read.
Alertmanager	Routes notifications (email, Slack) for alerts that Prometheus fires when a metric crosses a threshold.

The flow in one line

Your app exposes /metrics → Prometheus scrapes & stores it → Grafana queries Prometheus & draws graphs → Alertmanager pages you when something's wrong.

Pull, not push

Unlike push-based monitoring tools, where the app sends its data to a collector, Prometheus pulls metrics from your app on a timer. Your app's only job is to expose /metrics; it doesn't need to know Prometheus exists. That simplicity is a big reason Prometheus became a de facto standard for container monitoring.
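To make the pull model concrete, here is a minimal sketch of an app that exposes /metrics by hand, using only the Python standard library. It is not how prometheus-flask-exporter works internally; the metric name demo_requests_total, the state dictionary, and port 8000 are assumptions made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; an exporter library tracks this for you.
state = {"requests": 0}


def render_metrics(request_count: int) -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Requests handled by the demo server.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {request_count}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/metrics":
            # The scraper asks; the app only answers with its current numbers.
            body = render_metrics(state["requests"]).encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            # Any other path counts as one application request.
            state["requests"] += 1
            body = b"hello\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve until interrupted. Not called here; run it yourself to try it."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Notice that nothing in this code mentions Prometheus's address. The app publishes plain text, and whoever wants the numbers comes and reads them.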

Hands-on Lab: Monitor the Notes App

We'll instrument the notes app, then extend the Compose stack from Module 7 with Prometheus and Grafana. Work in your notes-app folder.

1

Instrument the app — expose `/metrics`

Add two lines to app.py. The prometheus-flask-exporter library automatically tracks request count, latency, and errors, and serves them at /metrics.

from flask import Flask                                    # already in app.py
from prometheus_flask_exporter import PrometheusMetrics   # new line 1

app = Flask(__name__)               # already in app.py
metrics = PrometheusMetrics(app)    # new line 2: registers /metrics automatically

Add the library to requirements.txt:

flask==3.0.3
psycopg2-binary==2.9.9
prometheus-flask-exporter==0.23.1

2

Tell Prometheus what to scrape — `prometheus.yml`

This config says "every 5 seconds, scrape the web service on port 5000." The target web:5000 works because Compose's network resolves service names (Module 7).

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: notes-app
    static_configs:
      - targets: ["web:5000"]
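A scrape is just an HTTP GET followed by some text parsing. The sketch below shows roughly what happens every scrape interval, under simplifying assumptions: it ignores optional timestamps and label values that end in unusual characters, and the SAMPLE text is an illustrative payload rather than captured output from the notes app. The function names are hypothetical.

```python
from urllib.request import urlopen

# Illustrative payload in the text exposition format (not captured output).
SAMPLE = """\
# HELP flask_http_request_total Total number of HTTP requests
# TYPE flask_http_request_total counter
flask_http_request_total{method="GET",status="200"} 50.0
flask_http_request_total{method="GET",status="500"} 2.0
"""


def parse_metrics(text: str) -> dict[str, float]:
    """Map each series (name plus labels) to its current value."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes no trailing timestamp: the value is the last field.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


def scrape(url: str, timeout: float = 5.0) -> dict[str, float]:
    """Fetch a /metrics URL and parse it, like one scrape of one target."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

With the stack running, scrape("http://localhost:8080/metrics") from your host should return the same series Prometheus collects from web:5000 inside the Compose network.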

3

Extend the stack — add to `docker-compose.yml`

Add Prometheus and Grafana as two new services under services:, alongside web and db from Module 7.

  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

4

Launch the whole stack

docker compose up --build

Four containers now run together: the app, its database, Prometheus, and Grafana. First, confirm your app is emitting metrics — open http://localhost:8080/metrics and you'll see raw counters like flask_http_request_total.

5

Generate some traffic

Metrics are boring with no activity. Hit the app a few dozen times (refresh the browser, or run this):

for i in $(seq 1 50); do curl -s http://localhost:8080/ > /dev/null; done

6

Query in Prometheus

Open http://localhost:9090. Under Status → Targets confirm notes-app is UP. Then in the query box, try:

# total requests served
flask_http_request_total

# requests per second over the last minute (traffic)
rate(flask_http_request_total[1m])
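If rate() feels like magic, the core idea fits in a few lines. This is a simplified sketch, not Prometheus's implementation: the real function also extrapolates to the edges of the time window. The name simple_rate and the sample data are made up for illustration.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter.

    samples holds (timestamp_seconds, counter_value) pairs, oldest first.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter dropped, so the app restarted and began again at zero.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Three scrapes, 5 seconds apart: the counter grew by 20 over 10 seconds.
steady = [(0.0, 100.0), (5.0, 110.0), (10.0, 120.0)]
# Same window, but the app restarted before the last scrape.
restarted = [(0.0, 100.0), (5.0, 110.0), (10.0, 4.0)]
```

Here simple_rate(steady) works out to 2.0 requests per second. The restart case shows why you query counters through rate() instead of subtracting values yourself: the reset is handled for you.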

The "aha!" moment

You're seeing live, queryable data about your running app — request rates, response times, error counts — all without changing your app logic. That's instrumentation paying off.

7

Connect Grafana to Prometheus

Open http://localhost:3000 and log in with admin / admin (it'll ask you to set a new password). Then:

Go to Connections → Data sources → Add data source → Prometheus
Set the URL to http://prometheus:9090 (the service name — same network!)
Click Save & test — you should see "Successfully queried".

8

Build your first dashboard panel

Go to Dashboards → New → New dashboard → Add visualization, pick your Prometheus data source, and enter this query:

rate(flask_http_request_total[1m])

Generate more traffic (Step 5) and watch the graph climb in real time. Add a second panel for latency:

rate(flask_http_request_duration_seconds_sum[1m]) / rate(flask_http_request_duration_seconds_count[1m])
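The latency query divides two rates: seconds spent per second, over requests per second. The time window cancels out, leaving seconds per request. A small worked sketch of that arithmetic, with made-up numbers and a hypothetical function name:

```python
from typing import Optional


def average_latency(
    sum_start: float, sum_end: float, count_start: float, count_end: float
) -> Optional[float]:
    """Mean seconds per request between two scrapes of _sum and _count."""
    requests = count_end - count_start
    if requests == 0:
        return None  # no traffic in the window, so no average exists
    return (sum_end - sum_start) / requests


# Over one minute: 60 new requests took 3.0 seconds in total.
result = average_latency(sum_start=12.0, sum_end=15.0,
                         count_start=100, count_end=160)
```

That comes to 0.05 seconds, or 50 ms per request. Keep in mind an average hides slow outliers, which is why percentile panels are usually added next to it.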

You built an observability stack

App → metrics → Prometheus → Grafana, all wired together with Compose. This is genuinely how production systems are monitored.

Shortcut: import a pre-built dashboard

Instead of building panels by hand, go to Dashboards → New → Import and paste a dashboard ID from grafana.com/dashboards. The community has thousands ready to go.

9

Don't forget logs

Metrics tell you something's wrong; logs tell you what. You already have them:

docker compose logs -f web # follow the app's logs

Scaling logs up

For many containers, you graduate from docker logs to a log aggregator — Loki (pairs with Grafana) or the ELK stack (Elasticsearch + Logstash + Kibana) — so all logs are searchable in one place.
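Aggregators are easier to search when every log line has the same shape. One common approach, sketched here as an assumption rather than something the notes app already does, is to write one JSON object per line to stdout, where docker compose logs picks it up. The class and function names are hypothetical.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def configure_logging() -> logging.Logger:
    """Send JSON logs to stdout, which is what the container runtime captures."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("notes")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

After calling configure_logging() once at startup, a line such as logger.info("note %s created", note_id) shows up in the container logs as a JSON object with time, level, logger, and message fields.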

The Metric Types You'll Meet

Type	What it is	Example
Counter	Only goes up; resets to zero when the app restarts. Use `rate()` to get a per-second value.	total requests served
Gauge	Goes up and down — a current value.	memory in use, active connections
Histogram	Buckets observations to compute averages & percentiles.	request duration
Summary	Like a histogram, with client-side quantiles.	response size

Counters: always wrap in `rate()`

A raw counter just climbs forever — not useful on a graph. rate(metric[1m]) turns it into "per second over the last minute," which is what you actually want to see.
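The first three types are simple enough to sketch in plain Python. These toy classes are illustrations only, not the prometheus_client API; the class and method names are assumptions chosen to mirror the table above.

```python
from bisect import bisect_left
from itertools import accumulate


class Counter:
    """Only goes up."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount


class Gauge:
    """A current value that can move in either direction."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Counts observations per bucket and tracks their sum and count."""

    def __init__(self, upper_bounds: list[float]) -> None:
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot at the end plays the role of the +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # Index of the first bucket whose upper bound is >= value.
        self.bucket_counts[bisect_left(self.upper_bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative_counts(self) -> list[int]:
        """Buckets as exposed on /metrics: each includes all smaller ones."""
        return list(accumulate(self.bucket_counts))


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.3, 2.0):
    latency.observe(seconds)
```

The histogram's sum and count correspond to the _sum and _count series used in the average-latency query, and the cumulative buckets are what histogram_quantile reads.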

Cheat Sheet

Ports, queries, and commands you'll reach for. Bookmark this.

Default ports

Service	URL	Login
App metrics	`localhost:8080/metrics`	—
Prometheus	`localhost:9090`	—
Grafana	`localhost:3000`	admin / admin

PromQL starters

Query	Shows
`up`	Which targets are reachable (1 = up)
`flask_http_request_total`	Total requests (a counter)
`rate(flask_http_request_total[1m])`	Requests per second (traffic)
`sum by (status) (rate(...[1m]))`	Rate split by HTTP status (find errors)
`histogram_quantile(0.95, ...)`	95th-percentile latency
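The last query in the table estimates a percentile from bucket counts instead of from raw observations. Below is a simplified sketch of that estimate, assuming cumulative buckets that end with a +Inf bucket and linear interpolation inside the matching bucket. The function name and the bucket numbers are made up for illustration, and edge cases that Prometheus handles are left out.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by bound,
    with float("inf") as the final upper bound.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the answer
    previous_bound, previous_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Beyond the largest finite bucket: report that bound.
                return previous_bound
            # Assume observations are spread evenly inside this bucket.
            fraction = (rank - previous_count) / (count - previous_count)
            return previous_bound + (bound - previous_bound) * fraction
        previous_bound, previous_count = bound, count
    return math.nan


# 100 requests: 50 took <= 0.1s, 90 took <= 0.5s, all took <= 1.0s.
example = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (float("inf"), 100.0)]
p95 = estimate_quantile(0.95, example)
```

For these buckets the 95th observation falls halfway through the 0.5 to 1.0 bucket, so the estimate is 0.75 seconds. The answer can only be as precise as the bucket boundaries, so pick buckets near the latencies you care about.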

Troubleshooting

Symptom	Likely cause & fix
Prometheus target is `DOWN`	Wrong target. Inside the Prometheus container, localhost is the container itself, so use the service name and container port: `web:5000`, not `localhost:8080`.
`/metrics` is 404	`PrometheusMetrics(app)` not added, or library missing from requirements.
Grafana "data source not working"	URL must be `http://prometheus:9090` (service name), not localhost.
Empty graphs	No traffic yet — generate some (Step 5); counters need `rate()`.
Prometheus won't start	`prometheus.yml` indentation (2 spaces, no tabs) or wrong volume path.
Port 3000/9090 in use	Change the host port, e.g. `"3001:3000"`.

Your Challenge

Deepen it before the Capstone:

Add a Grafana panel for error rate — requests with status 500 per second.
Add a 95th-percentile latency panel with histogram_quantile.
Import a community Flask/Prometheus dashboard by its ID.
Bonus: add a Prometheus alert rule that fires when no requests arrive for 1 minute.

# requests per second that returned a 500 error
rate(flask_http_request_total{status="500"}[1m])

# error ratio (fraction of all requests that failed)
sum(rate(flask_http_request_total{status="500"}[1m]))
  / sum(rate(flask_http_request_total[1m]))
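The error-ratio query is a plain fraction: failed requests per second over all requests per second. A small sketch of the same arithmetic, with made-up per-status rates and a hypothetical function name:

```python
def error_ratio(rate_by_status: dict[str, float]) -> float:
    """Fraction of requests per second that returned HTTP 500."""
    total = sum(rate_by_status.values())
    if total == 0:
        # Assumption: report 0.0 when idle. PromQL would yield NaN here.
        return 0.0
    return rate_by_status.get("500", 0.0) / total


# Requests per second, split by status label.
rates = {"200": 9.0, "404": 0.5, "500": 0.5}
ratio = error_ratio(rates)
```

With 0.5 failing requests out of 10 per second, the ratio is 0.05, or 5%. A ratio is usually a better alert condition than a raw error rate because it stays meaningful as traffic grows.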

Recap & What's Next

You can now

Instrument an app, scrape it with Prometheus, query metrics with PromQL, visualize them in Grafana, and find logs — the full observability loop. You can finally see what your deployments are doing.

Next up: Module 12 — the Capstone, where everything from Modules 6–11 comes together: code → container → cluster → CI/CD → monitored, end to end.
