Configuration¶
The chart's values.yaml is large (~5000 lines, ~169 KB) because it composes
five subcharts plus its own templates. This page walks you through the keys
you almost always touch, and points you at sections you might touch
occasionally.
For an exhaustive reference, see the Values Reference page.
Values file map¶
| Top-level key | What it configures |
|---|---|
| nameOverride, namespaceOverride, fullnameOverride | Naming for chart-rendered resources |
| commonLabels | Labels added to every resource the chart renders |
| crds | Whether to install CRDs (crds.enabled) |
| defaultRules | Whether to ship the default PrometheusRules, and which groups |
| additionalPrometheusRulesMap | Inline custom PrometheusRules |
| global | Image registry, image pull secrets, RBAC settings shared by subcharts |
| windowsMonitoring | Toggle Windows node monitoring |
| prometheus-windows-exporter | Pass-through to the Windows exporter subchart |
| alertmanager | Alertmanager CR + ingress + Service + config |
| grafana | Pass-through to the Grafana subchart |
| kubernetesServiceMonitors | Toggle the bundle of kube-* ServiceMonitors |
| kubeApiServer, kubelet, kubeControllerManager, kubeScheduler, kubeProxy, kubeEtcd, coreDns, kubeDns | Per-component ServiceMonitors |
| kubeStateMetrics / kube-state-metrics | Toggle and pass-through to subchart |
| nodeExporter / prometheus-node-exporter | Toggle and pass-through to subchart |
| prometheusOperator | The operator Deployment, RBAC, webhooks, TLS |
| prometheus | The Prometheus CR + ingress + Service + scrape config selectors |
| thanosRuler | Optional ThanosRuler CR |
| cleanPrometheusOperatorObjectNames | Renaming knob for legacy installs |
| extraManifests | Free-form list of extra Kubernetes manifests |
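In practice a custom values file only touches a handful of these keys at once. A minimal sketch of how they compose (the hostname, password and retention below are placeholders, not recommendations):

nameOverride: "monitoring"
defaultRules:
  create: true
grafana:
  adminPassword: "<rotate-me>"
  ingress:
    enabled: true
    hosts:
      - grafana.example.com
prometheus:
  prometheusSpec:
    retention: 30d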
Naming and namespace¶
nameOverride: "monitoring"
namespaceOverride: "monitoring"
The OVES copy of the chart sets nameOverride: "monitoring", which yields
rendered resource names like monitoring-kube-prometheus-prometheus rather
than prometheus-stack-kube-prometheus-prometheus. Be careful when changing
this on an existing install: resource names will change.
Storage and retention¶
Prometheus retention and storage are the two settings most often misconfigured on a first install.
prometheus:
prometheusSpec:
retention: 30d # how long to keep samples
retentionSize: "" # set together with PVC size, e.g. "80GiB"
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: gp3
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 100Gi
Rules of thumb:
- Pick retention based on how far back you need to query during incidents (often 14–30 days).
- Set retentionSize to ~80–90% of the PVC size so the WAL cannot fill the disk.
- Use a StorageClass with volumeBindingMode: WaitForFirstConsumer and SSD-class performance (see the sketch after this list).
- Do not leave storageSpec unset in production: without it, Prometheus uses an emptyDir and you lose data on every pod restart.
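If your cluster does not already have a suitable StorageClass, here is a sketch of one that matches these rules of thumb, assuming the AWS EBS CSI driver (adjust the provisioner and parameters for your platform):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com   # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true     # lets you grow the Prometheus PVC later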
Replicas and HA¶
prometheus:
prometheusSpec:
replicas: 2
shards: 1 # only increase if you have >1M active series
podAntiAffinity: "soft"
alertmanager:
alertmanagerSpec:
replicas: 3 # 3 is the recommended HA size for gossip
podAntiAffinity: "soft"
The two Prometheus replicas run independently: each scrapes the same targets and stores its own copy of the data. For deduplicated long-term storage, integrate with Thanos, Cortex, or Mimir.
Grafana¶
grafana:
enabled: true
adminPassword: "<rotate-me>" # or grafana.admin.existingSecret
defaultDashboardsEnabled: true
persistence:
enabled: true
storageClassName: gp3
size: 10Gi
ingress:
enabled: true
ingressClassName: nginx
hosts:
- grafana.example.com
tls:
- secretName: grafana-tls
hosts:
- grafana.example.com
additionalDataSources:
- name: Loki
type: loki
url: http://loki-gateway.logging.svc:3100
access: proxy
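Instead of a plaintext adminPassword you can point the Grafana subchart at an existing Secret. A sketch, assuming a Secret named grafana-admin with the keys shown already exists:

grafana:
  admin:
    existingSecret: grafana-admin   # assumption: Secret created out of band
    userKey: admin-user
    passwordKey: admin-password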
Ingress¶
Each ingress (Prometheus, Alertmanager, Grafana) has its own ingress.*
block. Typical pattern:
prometheus:
ingress:
enabled: true
ingressClassName: nginx
hosts: [ "prometheus.example.com" ]
tls:
- secretName: prometheus-tls
hosts: [ "prometheus.example.com" ]
annotations:
nginx.ingress.kubernetes.io/auth-type: basic
nginx.ingress.kubernetes.io/auth-secret: prometheus-basic-auth
Prometheus and Alertmanager have no auth
The Prometheus and Alertmanager UIs ship with no authentication. Put
them behind an authenticating ingress (basic-auth Secret, OAuth2-proxy,
Cloudflare Access, or similar) or leave them unexposed and reach them via
kubectl port-forward.
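The basic-auth annotations shown above expect a Secret whose auth key holds htpasswd entries. A minimal sketch (namespace and hash are placeholders; generate the hash with htpasswd):

apiVersion: v1
kind: Secret
metadata:
  name: prometheus-basic-auth
  namespace: monitoring        # placeholder: use the chart's namespace
type: Opaque
stringData:
  auth: "admin:$apr1$..."      # placeholder htpasswd entry, not a real hash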
Alertmanager routing¶
alertmanager:
enabled: true
config:
global:
resolve_timeout: 5m
slack_api_url: "<your-slack-webhook>"
route:
receiver: "slack-default"
group_by: [ "alertname", "namespace" ]
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
routes:
- matchers:
- severity = "critical"
receiver: "pagerduty"
continue: true
receivers:
- name: "slack-default"
slack_configs:
- channel: "#alerts"
send_resolved: true
title: '{{ template "slack.default.title" . }}'
text: '{{ template "slack.default.text" . }}'
- name: "pagerduty"
pagerduty_configs:
- service_key: "<pd-routing-key>"
You can also leave the static config minimal and let teams manage their own
routes via namespaced AlertmanagerConfig CRs.
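A sketch of such a namespaced AlertmanagerConfig, assuming a Secret named team-a-slack with the webhook URL under the key url exists in the same namespace (the Operator automatically scopes the routes to alerts from that namespace):

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: team-a-routing
  namespace: team-a            # placeholder team namespace
spec:
  route:
    receiver: team-a-slack
    groupBy: [ "alertname" ]
  receivers:
    - name: team-a-slack
      slackConfigs:
        - apiURL:
            name: team-a-slack # assumption: Secret holding the webhook URL
            key: url
          channel: "#team-a-alerts"
          sendResolved: true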
Selecting custom rules and ServiceMonitors¶
By default the Operator only picks up ServiceMonitors and PrometheusRules
that carry the chart's release labels. Two common requirements:
1. Pick up resources from any namespace, with any labels:
prometheus:
prometheusSpec:
serviceMonitorSelectorNilUsesHelmValues: false
serviceMonitorSelector: {}
serviceMonitorNamespaceSelector: {}
podMonitorSelectorNilUsesHelmValues: false
podMonitorSelector: {}
podMonitorNamespaceSelector: {}
ruleSelectorNilUsesHelmValues: false
ruleSelector: {}
ruleNamespaceSelector: {}
2. Only pick up resources tagged team=platform:
prometheus:
prometheusSpec:
serviceMonitorSelectorNilUsesHelmValues: false
serviceMonitorSelector:
matchLabels:
team: platform
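For reference, a ServiceMonitor that the second selector would pick up; the name, namespace, label selector and port are placeholders:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-app
  labels:
    team: platform             # must match the matchLabels above
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app
  endpoints:
    - port: http-metrics       # named port on the target Service
      interval: 30s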
Resources¶
Always set Prometheus and Alertmanager resources for production. A starting point for a small cluster (~50 pods, ~10 nodes):
prometheus:
prometheusSpec:
resources:
requests: { cpu: 500m, memory: 2Gi }
limits: { cpu: "2", memory: 4Gi }
alertmanager:
alertmanagerSpec:
resources:
requests: { cpu: 50m, memory: 128Mi }
limits: { cpu: 200m, memory: 256Mi }
grafana:
resources:
requests: { cpu: 100m, memory: 256Mi }
limits: { cpu: 500m, memory: 512Mi }
Tune from there based on actual usage seen in
process_resident_memory_bytes / container_cpu_usage_seconds_total.
Disabling components¶
Common reasons to disable parts of the stack:
# Managed control plane (EKS / GKE / AKS): scheduler/controller-manager/etcd
# are not user-reachable
kubeControllerManager: { enabled: false }
kubeScheduler: { enabled: false }
kubeEtcd: { enabled: false }
# CNI without kube-proxy (e.g. Cilium kube-proxy replacement)
kubeProxy: { enabled: false }
# Already running an external Grafana
grafana: { enabled: false }
# Already running CRDs from another release / operator
crds: { enabled: false }
When you disable a component, also disable the matching default rule group:
defaultRules:
rules:
etcd: false
kubeControllerManager: false
kubeScheduler: false
kubeProxy: false
Otherwise the matching *Down alerts (e.g. KubeSchedulerDown, KubeControllerManagerDown) will fire forever.
Image overrides for air-gapped clusters¶
global:
imageRegistry: "my-internal-registry.example.com"
imagePullSecrets:
- name: registry-creds
prometheusOperator:
image:
registry: my-internal-registry.example.com
repository: quay.io/prometheus-operator/prometheus-operator
tag: v0.78.2
Each component (Prometheus, Alertmanager, Grafana, exporters) has its own
image.* override block.
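For example, the equivalent override for the Prometheus image itself; the tag placeholder should match the version pinned by your chart release:

prometheus:
  prometheusSpec:
    image:
      registry: my-internal-registry.example.com
      repository: quay.io/prometheus/prometheus
      tag: "<pinned-prometheus-tag>"   # placeholder: match the chart's default tag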