Alerts and rules

You add alerts and recording rules to Prometheus by creating PrometheusRule Custom Resources. The Operator picks them up, regenerates the rule files it mounts into the Prometheus pods, and triggers a hot reload. You never edit prometheus.yml directly.

Anatomy of a PrometheusRule

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-app
  labels:
    # The operator picks up rules whose labels match
    # prometheus.prometheusSpec.ruleSelector. By default the chart's selector
    # uses the chart release labels, so set them explicitly OR widen the
    # selector (see Configuration > Selecting custom rules).
    app.kubernetes.io/instance: prometheus-stack
    app.kubernetes.io/part-of: prometheus-stack
spec:
  groups:
    - name: my-app.rules
      interval: 30s
      rules:
        # Recording rule
        - record: my_app:request_error_ratio:rate5m
          expr: |
            sum(rate(http_requests_total{job="my-app",code=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{job="my-app"}[5m]))

        # Alerting rule
        - alert: MyAppHighErrorRate
          expr: my_app:request_error_ratio:rate5m > 0.05
          for: 10m
          labels:
            severity: warning
            team: my-team
          annotations:
            summary: "my-app error ratio above 5% for 10m"
            description: |
              Error ratio is {{ $value | humanizePercentage }} on namespace
              {{ $labels.namespace }}.
            runbook_url: https://runbooks.example.com/my-app-high-error-rate

Apply with kubectl apply -f. Within ~30s the Operator reconciles the config and Prometheus reloads.
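
Concretely, for the manifest above (the Operator Deployment name assumes the chart release "prometheus-stack" in the "monitoring" namespace, matching the examples below):

kubectl apply -f my-app-alerts.yaml
# Optionally watch the Operator pick the rule up:
kubectl -n monitoring logs deploy/prometheus-stack-kube-prom-operator --since=2m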

Verify a rule was actually loaded

  1. Open the Prometheus UI » Alerts tab and find the alert by name.
  2. Or query the rules API from inside the cluster: kubectl -n monitoring exec deploy/prometheus-stack-grafana -- wget -qO- http://prometheus-stack-kube-prom-prometheus:9090/api/v1/rules (any pod in the namespace that can reach the Prometheus Service works; a port-forward variant is sketched below).
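
If you have jq locally, a port-forward avoids exec'ing into pods entirely (Service name as in the command above):

kubectl -n monitoring port-forward svc/prometheus-stack-kube-prom-prometheus 9090 &
curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name'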

Inline rules (small / cluster-wide)

For a small number of rules that you want to ship with the chart values, use additionalPrometheusRulesMap:

additionalPrometheusRulesMap:
  platform:
    groups:
      - name: platform.rules
        rules:
          - alert: PVCAlmostFull
            expr: |
              kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.9
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "PVC {{ $labels.persistentvolumeclaim }} >90% full"

Each top-level key under additionalPrometheusRulesMap becomes a separate PrometheusRule resource.
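
To confirm, list the PrometheusRule resources in the release namespace; one appears per top-level key:

kubectl -n monitoring get prometheusrules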

For per-team or per-app rules, prefer creating a PrometheusRule in the team's namespace — that way teams own their alerts.
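
For that to work, the Prometheus CR must be allowed to discover rules beyond its own namespace and label set. A sketch of the relevant chart values (details under Configuration > Selecting custom rules):

prometheus:
  prometheusSpec:
    # {} selects PrometheusRules in every namespace
    ruleNamespaceSelector: {}
    # do not require the Helm release labels on each rule
    ruleSelectorNilUsesHelmValues: false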

Disabling default rules

The chart ships a curated bundle of recording and alerting rules. Toggle groups individually:

defaultRules:
  create: true       # set false to disable ALL bundled rules
  rules:
    alertmanager: true
    etcd: false              # disable on managed clusters
    configReloaders: true
    general: true
    k8sContainerCpuUsageSecondsTotal: true
    kubeApiserverAvailability: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubeControllerManager: false   # disable on managed clusters
    kubelet: true
    kubeProxy: false               # disable when using kube-proxy replacement
    kubeScheduler: false           # disable on managed clusters
    kubeStateMetrics: true
    network: true
    node: true
    nodeExporterAlerting: true
    nodeExporterRecording: true
    prometheus: true
    prometheusOperator: true

Disabling rule groups is mainly useful to silence the perma-firing *Down alerts when the corresponding scrape target is intentionally absent (managed control plane, kube-proxy replacement, etc.).
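
These are chart values, so changes land on the next upgrade (release name and repository alias are illustrative):

helm upgrade prometheus-stack prometheus-community/kube-prometheus-stack \
  -n monitoring -f values.yaml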

Adding labels to every alert from this Prometheus

Use externalLabels to tag every alert (and every metric leaving Prometheus, e.g. for federation):

prometheus:
  prometheusSpec:
    externalLabels:
      cluster: prod-eu-west
      environment: production

Alertmanager can then route based on cluster= or environment=.
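
For instance, a route in the chart's Alertmanager config keyed on that label (the receiver name is illustrative and must be defined under receivers):

alertmanager:
  config:
    route:
      routes:
        - matchers:
            - cluster="prod-eu-west"
          receiver: eu-pager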

Testing rules locally

Use the upstream promtool against a YAML file containing just the spec portion of the PrometheusRule (promtool expects a top-level groups: key):

# Extract the spec, leaving groups: at the top level
yq '.spec' my-rules.yaml > /tmp/rules.yaml
promtool check rules /tmp/rules.yaml
promtool test rules tests.yaml
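
For the test step, a minimal tests.yaml sketch exercising the MyAppHighErrorRate alert defined earlier (the input series, its namespace label, and the timings are illustrative):

# tests.yaml
rule_files:
  - /tmp/rules.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # hold the recorded error ratio at 10% from 0m through 20m
      - series: 'my_app:request_error_ratio:rate5m{namespace="my-app"}'
        values: '0.1x20'
    alert_rule_test:
      - eval_time: 15m
        alertname: MyAppHighErrorRate
        exp_alerts:
          - exp_labels:
              namespace: my-app
              severity: warning
              team: my-team
            exp_annotations:
              summary: "my-app error ratio above 5% for 10m"
              description: |
                Error ratio is 10% on namespace
                my-app.
              runbook_url: https://runbooks.example.com/my-app-high-error-rate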

The chart's admission webhook (prometheusOperator.admissionWebhooks.enabled) also validates rules at kubectl apply time.

Common pitfalls

Symptom: Rule applies cleanly but never fires.
Likely cause: Selector mismatch — the Prometheus CR is not picking up the rule. Check prometheus.prometheusSpec.ruleSelector and the labels on the PrometheusRule.

Symptom: Alert fires but no notification arrives.
Likely cause: Alertmanager routing does not match the alert's labels. Check Alertmanager UI » Status for the loaded route tree.

Symptom: Alert fires but immediately resolves.
Likely cause: for: is too short. Set it to at least 1–5 minutes for noisy signals.

Symptom: kubectl apply is rejected with a webhook error.
Likely cause: The admission webhook caught a syntax error. Run promtool check rules locally.
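
For the first pitfall, compare the two sides directly (namespace and resource names are the ones used in the examples above):

# what the Prometheus CR selects on
kubectl -n monitoring get prometheus -o jsonpath='{.items[0].spec.ruleSelector}'
# what labels the rule actually carries
kubectl -n my-app get prometheusrule my-app-alerts --show-labels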