Alerts and rules

You add alerts and recording rules to Prometheus by creating PrometheusRule Custom Resources. The Operator picks them up, regenerates the rule files it mounts into the Prometheus pods, and triggers a hot reload. You never edit prometheus.yml directly.

Anatomy of a PrometheusRule

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-app
  labels:
    # The operator picks up rules whose labels match
    # prometheus.prometheusSpec.ruleSelector. By default the chart's selector
    # uses the chart release labels, so set them explicitly OR widen the
    # selector (see Configuration > Selecting custom rules).
    app.kubernetes.io/instance: prometheus-stack
    app.kubernetes.io/part-of: prometheus-stack
spec:
  groups:
    - name: my-app.rules
      interval: 30s
      rules:
        # Recording rule
        - record: my_app:request_error_ratio:rate5m
          expr: |
            sum(rate(http_requests_total{job="my-app",code=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{job="my-app"}[5m]))

        # Alerting rule
        - alert: MyAppHighErrorRate
          expr: my_app:request_error_ratio:rate5m > 0.05
          for: 10m
          labels:
            severity: warning
            team: my-team
          annotations:
            summary: "my-app error ratio above 5% for 10m"
            description: |
              Error ratio is {{ $value | humanizePercentage }} on namespace
              {{ $labels.namespace }}.
            runbook_url: https://runbooks.example.com/my-app-high-error-rate

Apply with kubectl apply -f. Within ~30s the Operator reconciles the config and Prometheus reloads.
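
Concretely, for the manifest above (the Operator Deployment name assumes the chart release "prometheus-stack" in the "monitoring" namespace, matching the examples below):

kubectl apply -f my-app-alerts.yaml
# Optionally watch the Operator pick the rule up:
kubectl -n monitoring logs deploy/prometheus-stack-kube-prom-operator --since=2m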

Verify a rule was actually loaded

  1. Open the Prometheus UI » Alerts tab and find the alert by name.
  2. Or query the rules API from inside the cluster: kubectl -n monitoring exec deploy/prometheus-stack-grafana -- wget -qO- http://prometheus-stack-kube-prom-prometheus:9090/api/v1/rules (any pod in the namespace that can reach the Prometheus Service works; a port-forward variant is sketched below).
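
If you have jq locally, a port-forward avoids exec'ing into pods entirely (Service name as in the command above):

kubectl -n monitoring port-forward svc/prometheus-stack-kube-prom-prometheus 9090 &
curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name'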

Inline rules (small / cluster-wide)

For a small number of rules that you want to ship with the chart values, use additionalPrometheusRulesMap:

additionalPrometheusRulesMap:
  platform:
    groups:
      - name: platform.rules
        rules:
          - alert: PVCAlmostFull
            expr: |
              kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.9
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "PVC {{ $labels.persistentvolumeclaim }} >90% full"

Each top-level key under additionalPrometheusRulesMap becomes a separate PrometheusRule resource.
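
To confirm, list the PrometheusRule resources in the release namespace; one appears per top-level key:

kubectl -n monitoring get prometheusrules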

For per-team or per-app rules, prefer creating a PrometheusRule in the team's namespace — that way teams own their alerts.
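
For that to work, the Prometheus CR must be allowed to discover rules beyond its own namespace and label set. A sketch of the relevant chart values (details under Configuration > Selecting custom rules):

prometheus:
  prometheusSpec:
    # {} selects PrometheusRules in every namespace
    ruleNamespaceSelector: {}
    # do not require the Helm release labels on each rule
    ruleSelectorNilUsesHelmValues: false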

Disabling default rules

The chart ships a curated bundle of recording and alerting rules. Toggle groups individually:

defaultRules:
  create: true       # set false to disable ALL bundled rules
  rules:
    alertmanager: true
    etcd: false              # disable on managed clusters
    configReloaders: true
    general: true
    k8sContainerCpuUsageSecondsTotal: true
    kubeApiserverAvailability: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubeControllerManager: false   # disable on managed clusters
    kubelet: true
    kubeProxy: false               # disable when using kube-proxy replacement
    kubeScheduler: false           # disable on managed clusters
    kubeStateMetrics: true
    network: true
    node: true
    nodeExporterAlerting: true
    nodeExporterRecording: true
    prometheus: true
    prometheusOperator: true

Disabling rule groups is mainly useful to silence the perma-firing *Down alerts when the corresponding scrape target is intentionally absent (managed control plane, kube-proxy replacement, etc.).
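
These are chart values, so changes land on the next upgrade (release name and repository alias are illustrative):

helm upgrade prometheus-stack prometheus-community/kube-prometheus-stack \
  -n monitoring -f values.yaml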

Adding labels to every alert from this Prometheus

Use externalLabels to tag every alert (and every metric leaving Prometheus, e.g. for federation):

prometheus:
  prometheusSpec:
    externalLabels:
      cluster: prod-eu-west
      environment: production

Alertmanager can then route based on cluster= or environment=.
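
For instance, a route in the chart's Alertmanager config keyed on that label (the receiver name is illustrative and must be defined under receivers):

alertmanager:
  config:
    route:
      routes:
        - matchers:
            - cluster="prod-eu-west"
          receiver: eu-pager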

Testing rules locally

Use the upstream promtool against a YAML file containing just the spec portion of the PrometheusRule (promtool expects a top-level groups: key):

# Extract the spec, leaving groups: at the top level
yq '.spec' my-rules.yaml > /tmp/rules.yaml
promtool check rules /tmp/rules.yaml
promtool test rules tests.yaml
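
For the test step, a minimal tests.yaml sketch exercising the MyAppHighErrorRate alert defined earlier (the input series, its namespace label, and the timings are illustrative):

# tests.yaml
rule_files:
  - /tmp/rules.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # hold the recorded error ratio at 10% from 0m through 20m
      - series: 'my_app:request_error_ratio:rate5m{namespace="my-app"}'
        values: '0.1x20'
    alert_rule_test:
      - eval_time: 15m
        alertname: MyAppHighErrorRate
        exp_alerts:
          - exp_labels:
              namespace: my-app
              severity: warning
              team: my-team
            exp_annotations:
              summary: "my-app error ratio above 5% for 10m"
              description: |
                Error ratio is 10% on namespace
                my-app.
              runbook_url: https://runbooks.example.com/my-app-high-error-rate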

The chart's admission webhook (prometheusOperator.admissionWebhooks.enabled) also validates rules at kubectl apply time.

Common pitfalls

Symptom: Rule applies cleanly but never fires.
Likely cause: Selector mismatch — the Prometheus CR is not picking up the rule. Check prometheus.prometheusSpec.ruleSelector and the labels on the PrometheusRule.

Symptom: Alert fires but no notification arrives.
Likely cause: Alertmanager routing does not match the alert's labels. Check Alertmanager UI » Status for the loaded route tree.

Symptom: Alert fires but immediately resolves.
Likely cause: for: is too short. Set it to at least 1–5 minutes for noisy signals.

Symptom: kubectl apply is rejected with a webhook error.
Likely cause: The admission webhook caught a syntax error. Run promtool check rules locally.
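
For the first pitfall, compare the two sides directly (namespace and resource names are the ones used in the examples above):

# what the Prometheus CR selects on
kubectl -n monitoring get prometheus -o jsonpath='{.items[0].spec.ruleSelector}'
# what labels the rule actually carries
kubectl -n my-app get prometheusrule my-app-alerts --show-labels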