Alerts and rules¶
You add alerts and recording rules to Prometheus by creating
`PrometheusRule` Custom Resources. The Operator picks them up, regenerates
the Prometheus configuration secret, and Prometheus hot-reloads. You never
edit `prometheus.yml` directly.
Anatomy of a PrometheusRule¶
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-app
  labels:
    # The Operator picks up rules whose labels match
    # prometheus.prometheusSpec.ruleSelector. By default the chart's selector
    # uses the chart release labels, so set them explicitly OR widen the
    # selector (see Configuration > Selecting custom rules).
    app.kubernetes.io/instance: prometheus-stack
    app.kubernetes.io/part-of: prometheus-stack
spec:
  groups:
    - name: my-app.rules
      interval: 30s
      rules:
        # Recording rule. Grouping by namespace keeps that label around for
        # the alert template below; a bare sum() would strip it.
        - record: my_app:request_error_ratio:rate5m
          expr: |
            sum by (namespace) (rate(http_requests_total{job="my-app",code=~"5.."}[5m]))
            /
            sum by (namespace) (rate(http_requests_total{job="my-app"}[5m]))
        # Alerting rule
        - alert: MyAppHighErrorRate
          expr: my_app:request_error_ratio:rate5m > 0.05
          for: 10m
          labels:
            severity: warning
            team: my-team
          annotations:
            summary: "my-app error ratio above 5% for 10m"
            description: |
              Error ratio is {{ $value | humanizePercentage }} on namespace
              {{ $labels.namespace }}.
            runbook_url: https://runbooks.example.com/my-app-high-error-rate
```
Apply with `kubectl apply -f`. Within ~30s the Operator reconciles the
config and Prometheus reloads.
Verify a rule was actually loaded:

- Open the Prometheus UI » Alerts tab and find the alert by name.
- Or query the rules API; any pod in the namespace that can reach the
  Prometheus Service works:

```shell
kubectl -n monitoring exec deploy/prometheus-stack-grafana -- \
  wget -qO- http://prometheus-stack-kube-prom-prometheus:9090/api/v1/rules
```
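The rules API returns one JSON document covering every loaded group, so a
quick way to confirm a specific rule made it in is to filter the response
for its name. A sketch, run here against a saved stub of the response (in
practice, redirect the `wget` output above into `rules.json`):

```shell
# Stub of a /api/v1/rules response; the live output has the same shape.
cat > rules.json <<'EOF'
{"status":"success","data":{"groups":[{"name":"my-app.rules","rules":[{"name":"MyAppHighErrorRate","type":"alerting"}]}]}}
EOF

# Exit 0 and print a confirmation when the rule name is present.
grep -q '"name":"MyAppHighErrorRate"' rules.json && echo "rule loaded"
```

If the rule is present this prints `rule loaded`; no output means the rule
was never picked up (see the selector pitfalls below).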
Inline rules (small / cluster-wide)¶
For a small number of rules that you want to ship with the chart values, use
`additionalPrometheusRulesMap`:
```yaml
additionalPrometheusRulesMap:
  platform:
    groups:
      - name: platform.rules
        rules:
          - alert: PVCAlmostFull
            expr: |
              kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.9
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "PVC {{ $labels.persistentvolumeclaim }} >90% full"
```
Each top-level key under `additionalPrometheusRulesMap` becomes a separate
`PrometheusRule` resource.

For per-team or per-app rules, prefer creating a `PrometheusRule` in the
team's namespace, so that teams own their alerts.
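By default the chart only discovers rules carrying its own release labels. To
let the Operator pick up `PrometheusRule` resources from team namespaces
regardless of their labels, the selectors can be widened in the chart values.
A sketch (value names per the kube-prometheus-stack chart; verify them
against your chart version):

```yaml
prometheus:
  prometheusSpec:
    # Don't synthesize a ruleSelector from the chart's release labels;
    # a nil selector then matches every PrometheusRule.
    ruleSelectorNilUsesHelmValues: false
    # Watch PrometheusRule resources in all namespaces.
    ruleNamespaceSelector: {}
```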
Disabling default rules¶
The chart ships a curated bundle of recording and alerting rules. Toggle groups individually:
```yaml
defaultRules:
  create: true                   # set false to disable ALL bundled rules
  rules:
    alertmanager: true
    etcd: false                  # disable on managed clusters
    configReloaders: true
    general: true
    k8sContainerCpuUsageSecondsTotal: true
    kubeApiserverAvailability: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubeControllerManager: false # disable on managed clusters
    kubelet: true
    kubeProxy: false             # disable when using a kube-proxy replacement
    kubeScheduler: false         # disable on managed clusters
    kubeStateMetrics: true
    network: true
    node: true
    nodeExporterAlerting: true
    nodeExporterRecording: true
    prometheus: true
    prometheusOperator: true
```
Disabling rule groups is mainly useful to silence the perma-firing *Down
alerts when the corresponding scrape target is intentionally absent (managed
control plane, kube-proxy replacement, etc.).
Adding labels to every alert from this Prometheus¶
Use `externalLabels` to tag every alert (and every metric leaving Prometheus,
e.g. for federation):
```yaml
prometheus:
  prometheusSpec:
    externalLabels:
      cluster: prod-eu-west
      environment: production
```
Alertmanager can then route on the `cluster` or `environment` label.
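For example, a route matching on the `cluster` external label. A minimal
sketch of the Alertmanager config fragment (receiver names are placeholders;
with this chart, the config sits under the `alertmanager.config` value):

```yaml
route:
  receiver: default
  routes:
    # Send everything from the production cluster to its own receiver.
    - matchers:
        - cluster="prod-eu-west"
      receiver: prod-oncall
receivers:
  - name: default
  - name: prod-oncall
```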
Testing rules locally¶
Use the upstream `promtool` against a YAML file containing only the `spec`
portion (i.e. a top-level `groups:` key):

```shell
# Extract the spec, which leaves a top-level `groups:` key
yq '.spec' my-rules.yaml > /tmp/rules.yaml
promtool check rules /tmp/rules.yaml
promtool test rules tests.yaml
```
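The `tests.yaml` passed to `promtool test rules` follows promtool's
unit-test format. A minimal sketch exercising the recording rule from the
anatomy example (series values are synthetic; `alert_rule_test` entries can
be added alongside to assert on the alert itself):

```yaml
rule_files:
  - /tmp/rules.yaml        # the extracted spec from the step above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Half of all requests fail, so the error ratio should be 0.5,
      # well above the 5% alert threshold.
      - series: 'http_requests_total{job="my-app",code="500"}'
        values: '0+60x30'
      - series: 'http_requests_total{job="my-app",code="200"}'
        values: '0+60x30'
    promql_expr_test:
      - expr: my_app:request_error_ratio:rate5m
        eval_time: 10m
        exp_samples:
          - labels: 'my_app:request_error_ratio:rate5m'
            value: 0.5
```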
The chart's admission webhook (`prometheusOperator.admissionWebhooks.enabled`)
also validates rules at `kubectl apply` time.
Common pitfalls¶
| Symptom | Likely cause |
|---|---|
| Rule applies cleanly but never fires | Selector mismatch: the Prometheus CR is not picking up the rule. Check `prometheus.prometheusSpec.ruleSelector` and the labels on the `PrometheusRule`. |
| Alert fires but no notification | Alertmanager routing does not match the alert's labels. Check Alertmanager UI » Status for the loaded route tree. |
| Alert fires but immediately resolves | `for:` is too short. Set it to at least 1–5 minutes for noisy signals. |
| `kubectl apply` rejected with webhook error | The admission webhook caught a syntax error. Run `promtool check rules` locally. |