Run Grafana Mimir in production using the Helm chart
In addition to the guide Get started with Grafana Mimir using the Helm chart, which covers setting up Grafana Mimir on a local Kubernetes cluster or within a low-risk development environment, the following information helps you prepare Grafana Mimir for production.
Although the information that follows assumes that you are using Grafana Mimir in a production environment that is customer-facing, you might need the high-availability and horizontal-scalability features of Grafana Mimir even in an internal, development environment.
Before you begin
Meet all the following prerequisites:
You are familiar with Helm 3.x.
Add the grafana Helm repository to your local environment or to your CI/CD tooling:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
You have external object storage that is different from the MinIO object storage that mimir-distributed deploys, because the MinIO deployment in the Helm chart is only intended for getting started and is not intended for production use. To use Grafana Mimir in production, you must replace the default object storage with an Amazon S3 compatible service, Google Cloud Storage, Microsoft Azure Blob Storage, or OpenStack Swift. Alternatively, to deploy MinIO yourself, see MinIO High Performance Object Storage.
Note
Like Amazon S3, the chosen object storage implementation must not create directories. Grafana Mimir doesn’t have any notion of object storage directories, and so will leave empty directories behind when removing blocks. For example, if you use Azure Blob Storage, you must disable hierarchical namespace.
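For example, on Azure you can create a compatible storage account with hierarchical namespace disabled. The following is a minimal sketch using the Azure CLI; the account, resource group, and location names are placeholders:
az storage account create \
  --name mimirblocks \
  --resource-group my-resource-group \
  --location eastus \
  --kind StorageV2 \
  --enable-hierarchical-namespace false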
You have an external Apache Kafka or Kafka-compatible backend for production use. The Helm chart deploys a single-node Kafka cluster that is intended for demo purposes only and is not suitable for production.
Ingest storage is the next-generation architecture of Grafana Mimir. With this architecture, the Mimir read and write paths are decoupled through an Apache Kafka or Kafka-compatible backend. To run Grafana Mimir in production, you must configure Mimir with the credentials of a production-grade Kafka cluster.
Note
For backwards compatibility with existing Mimir installations, the mimir-distributed Helm chart includes a classic-architecture preset that deploys Grafana Mimir with ingest storage disabled.
Plan capacity
The mimir-distributed Helm chart comes with two sizing plans:
- For 1M series: small.yaml
- For 10M series: large.yaml
These sizing plans are estimated based on experience from operating Grafana
Mimir at Grafana Labs. The ideal size for your cluster depends on your
usage patterns. Therefore, use the sizing plans as a starting
point for sizing your Grafana Mimir cluster, rather than as strict guidelines.
To get a better idea of how to plan capacity, refer to the YAML comments at the beginning of the small.yaml and large.yaml files, which relate to read and write workloads.
See also Planning Grafana Mimir capacity.
To use a sizing plan, copy it from the mimir GitHub repository and pass it as a values file to the helm command. Note that sizing plans can change with new versions of the mimir-distributed chart. Make sure to use a sizing plan from a version close to the version of the Helm chart that you are installing.
For example:
helm install mimir-prod grafana/mimir-distributed -f ./small.yaml
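If you don't already have the plan locally, you can download it first. The following is a sketch that assumes the sizing plans still live at this path in the grafana/mimir repository; replace main with a tag or branch that matches your chart version:
curl -fLO https://raw.githubusercontent.com/grafana/mimir/main/operations/helm/charts/mimir-distributed/small.yaml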
Conform to fault-tolerance requirements
As part of Pod scheduling, the small.yaml and large.yaml files add Pod anti-affinity rules so that no two ingester Pods, nor two store-gateway Pods, are scheduled on any given Kubernetes Node. This increases the fault tolerance of the Mimir cluster.
You must create and add Nodes, such that the number of Nodes is equal to or larger than either the number of ingester Pods or the number of store-gateway Pods, whichever one is larger. Expressed as a formula, it reads as follows:
number_of_nodes >= max(number_of_ingesters_pods, number_of_store_gateway_pods)
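For example, a cluster that runs 5 ingester Pods and 3 store-gateway Pods needs at least max(5, 3) = 5 Nodes.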
For more information about the failure modes of either the ingester or store-gateway component, refer to Ingesters failure and data loss or Store-gateway: Blocks sharding and replication.
Decide whether you need geographical redundancy, fast rolling updates, or both.
You can use a rolling update strategy to apply configuration changes to Grafana Mimir, and to upgrade Grafana Mimir to a newer version. A rolling update results in no downtime to Grafana Mimir.
The Helm chart performs a rolling update for you. To make rolling updates faster, configure the Helm chart to deploy Grafana Mimir with zone-aware replication.
New installations
Grafana Mimir supports replication across availability zones within your Kubernetes cluster. This further increases the fault tolerance of the Mimir cluster. Even if your Kubernetes cluster does not currently span multiple zones, enabling zone-awareness now avoids an unnecessary migration later when you start using multiple zones.
For mimir-distributed Helm chart v4.0 or higher, zone-awareness is enabled by default for new installations.
To benefit from zone-awareness, choose the node selectors for your different zones. For convenience, you can use the following YAML configuration snippet as a starting point:
ingester:
zoneAwareReplication:
enabled: true
topologyKey: kubernetes.io/hostname
zones:
- name: zone-a
nodeSelector:
topology.kubernetes.io/zone: us-central1-a
- name: zone-b
nodeSelector:
topology.kubernetes.io/zone: us-central1-b
- name: zone-c
nodeSelector:
topology.kubernetes.io/zone: us-central1-c
store_gateway:
zoneAwareReplication:
enabled: true
topologyKey: kubernetes.io/hostname
zones:
- name: zone-a
nodeSelector:
topology.kubernetes.io/zone: us-central1-a
- name: zone-b
nodeSelector:
topology.kubernetes.io/zone: us-central1-b
- name: zone-c
nodeSelector:
topology.kubernetes.io/zone: us-central1-c
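To find the zone labels that your Nodes actually carry before filling in the nodeSelector values, you can list them with kubectl:
kubectl get nodes --label-columns topology.kubernetes.io/zone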
Existing installations
If you are upgrading from a previous mimir-distributed Helm chart version to v4.0, then refer to the migration guide to configure zone-aware replication.
Configure Mimir to use object storage
For the different object storage types that Mimir supports, and examples, see Configure Grafana Mimir object storage backend.
If you are not using the sizing plans that are mentioned in Plan capacity, add the following YAML to your values file:
minio:
  enabled: false
Prepare the credentials and bucket names for the object storage.
Add the object storage configuration to the Helm chart values, nesting it under mimir.structuredConfig. This example uses Amazon S3:
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.us-east-2.amazonaws.com
          region: us-east-2
          secret_access_key: "${AWS_SECRET_ACCESS_KEY}" # This is a secret injected via an environment variable
          access_key_id: "${AWS_ACCESS_KEY_ID}" # This is a secret injected via an environment variable
    blocks_storage:
      s3:
        bucket_name: mimir-blocks
    alertmanager_storage:
      s3:
        bucket_name: mimir-alertmanager
    ruler_storage:
      s3:
        bucket_name: mimir-ruler
    # The following admin_client configuration only applies to Grafana Enterprise Metrics deployments:
    # admin_client:
    #   storage:
    #     s3:
    #       bucket_name: gem-admin
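The "${AWS_ACCESS_KEY_ID}" and "${AWS_SECRET_ACCESS_KEY}" placeholders expect environment variables injected from a Kubernetes Secret. The following is a minimal sketch, assuming a Secret named mimir-bucket-secret in the release namespace:
kubectl create secret generic mimir-bucket-secret \
  --from-literal=AWS_ACCESS_KEY_ID=<your-access-key-id> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
Then inject the Secret's keys as environment variables through the chart's global.extraEnvFrom value:
global:
  extraEnvFrom:
    - secretRef:
        name: mimir-bucket-secret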
Configure Mimir to use a Kafka-compatible backend
If you are not using the classic-architecture preset that is noted above, add the following YAML to your values file:
kafka:
  enabled: false
This configuration disables deployment of the Apache Kafka cluster that the Helm chart embeds. The embedded single-node cluster is intended for demo purposes only and isn't recommended for production use cases.
Add the credentials and configuration for your production Apache Kafka or Kafka-compatible cluster to the Helm chart values. Nest the configuration under mimir.structuredConfig:
mimir:
  structuredConfig:
    ingest_storage:
      kafka:
        # Address of the Kafka broker used to bootstrap the connection
        address: kafka:9092
        # (Optional) SASL credentials provisioned for communications with clients within the cluster
        sasl_username: "${KAFKA_SASL_USERNAME}" # This is a secret injected via an environment variable
        sasl_password: "${KAFKA_SASL_PASSWORD}" # This is a secret injected via an environment variable
        # Mimir auto-creates the topic on start-up.
        # The topic MUST be provisioned with no fewer partitions than the maximum
        # number of ingester replicas. The value of 1000 here is arbitrarily large
        # to guarantee that.
        topic: mimir-ingest
        auto_create_topic_enabled: true
        auto_create_topic_default_partitions: 1000
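As with the object storage credentials, the SASL placeholders expect environment variables injected from a Kubernetes Secret. A sketch, assuming a Secret named mimir-kafka-secret that you reference from global.extraEnvFrom alongside any other secrets:
kubectl create secret generic mimir-kafka-secret \
  --from-literal=KAFKA_SASL_USERNAME=<username> \
  --from-literal=KAFKA_SASL_PASSWORD=<password>

global:
  extraEnvFrom:
    - secretRef:
        name: mimir-kafka-secret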
Meet security compliance regulations
Grafana Mimir does not require any special permissions on the hosts that it runs on. Because of this, you can deploy it in environments that enforce the Kubernetes Restricted security policy.
In Kubernetes v1.23 and higher, the Restricted policy can be enforced via a namespace label on the Namespace resource where Mimir is deployed. For example:
pod-security.kubernetes.io/enforce: restricted
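For example, assuming Mimir is deployed in a namespace named mimir, you can apply the label with kubectl:
kubectl label namespace mimir pod-security.kubernetes.io/enforce=restricted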
In Kubernetes versions prior to 1.23, the mimir-distributed Helm chart provides a PodSecurityPolicy resource that enforces many of the recommendations from the Restricted policy that the namespace label enforces. To enable the PodSecurityPolicy admission controller for your Kubernetes cluster, refer to How do I turn on an admission controller?.
For OpenShift-specific instructions see Deploy on OpenShift.
The mimir-distributed Helm chart also deploys most of the containers with a read-only root filesystem (readOnlyRootFilesystem: true). The exceptions are the optional MinIO and Grafana Agent (deprecated) containers. The PodSecurityPolicy resource enforces this setting.
Monitor the health of your Grafana Mimir cluster
To monitor the health of your Grafana Mimir cluster, which is also known as meta-monitoring, you can use ready-made Grafana dashboards, and Prometheus alerting and recording rules. For more information, see Installing Grafana Mimir dashboards and alerts.
Note
The Grafana Mimir Helm chart contains built-in configurations for meta-monitoring that use the Grafana Agent, which is now deprecated. We no longer recommend this approach. Instead, we recommend an approach based on the Kubernetes Monitoring Helm chart.
Kubernetes Monitoring comes with a built-in Mimir integration that collects metrics, logs, and traces from Grafana Mimir. It configures Grafana Alloy to handle all scraping and log collection automatically.
Use the meta-monitoring example in the Kubernetes Monitoring Helm chart for guidance. Update the destinations section to specify where to send the collected metrics, logs, and traces. Send metrics to Prometheus or a Prometheus-compatible backend, such as another Mimir instance or Grafana Cloud Metrics. You can configure multiple metrics, logs, and traces destinations to forward data to several backends simultaneously.
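The following is a minimal sketch of a destinations block, assuming the Kubernetes Monitoring Helm chart v2 values schema; the service URLs are placeholders that you must replace with your own backends:
destinations:
  - name: metrics
    type: prometheus
    url: http://mimir-nginx.mimir.svc/api/v1/push
  - name: logs
    type: loki
    url: http://loki-gateway.loki.svc/loki/api/v1/push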
For more meta-monitoring topics, refer to Monitor Grafana Mimir.
Deprecated meta-monitoring approach
The mimir-distributed Helm chart makes it easy for you to collect metrics and logs from Mimir. The chart uses the Grafana Agent to ship metrics to a Prometheus-compatible server and logs to a Loki or GEL (Grafana Enterprise Logs) server.
Caution
Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.
Download the Grafana Agent Operator Custom Resource Definitions (CRDs) from https://github.com/grafana/agent/tree/main/operations/agent-static-operator/crds
Install the CRDs on your cluster:
kubectl apply -f operations/agent-static-operator/crds/
Add the following YAML snippet to your values file to send meta-monitoring telemetry from Mimir. Change the URLs and credentials to match your desired destination.
metaMonitoring:
  serviceMonitor:
    enabled: true
  grafanaAgent:
    enabled: true
    installOperator: true
    logs:
      remote:
        url: "https://example.com/loki/api/v1/push"
        auth:
          username: 12345
    metrics:
      remote:
        url: "https://prometheus.prometheus.svc.cluster.local./api/v1/push"
        headers:
          X-Scope-OrgID: metamonitoring
Your Grafana Mimir cluster can now ingest metrics in production.
Configure clients to write metrics to Mimir
To configure each client to remote-write metrics to Mimir, refer to Configure Prometheus to write to Grafana Mimir and Configure Grafana Alloy to write to Grafana Mimir.
Set up redundant Prometheus or Grafana Alloy instances for high availability
If you need redundancy on the write path before it reaches Mimir, then you can set up redundant instances of Prometheus or Grafana Alloy to write metrics to Mimir.
For more information, see Configure high-availability deduplication with Consul.
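As a starting point, the following values-file sketch enables Mimir's HA tracker backed by Consul. The Consul host is a placeholder, and accept_ha_samples can also be set per tenant; refer to the linked guide for the authoritative configuration:
mimir:
  structuredConfig:
    limits:
      # Accept samples from HA pairs and deduplicate them.
      accept_ha_samples: true
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: consul
          consul:
            host: consul.consul.svc.cluster.local:8500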
Deploy on OpenShift
To deploy the mimir-distributed Helm chart on OpenShift, you need to change some of the default values.
Add the following YAML snippet to your values file. This creates a dedicated SecurityContextConstraints (SCC) resource for the mimir-distributed chart.
rbac:
create: true
type: scc
podSecurityContext:
fsGroup: null
runAsGroup: null
runAsUser: null
rollout_operator:
podSecurityContext:
fsGroup: null
runAsGroup: null
runAsUser: null
Alternatively, to deploy using the default SCC in your OpenShift cluster, add the following YAML snippet to your values file:
rbac:
create: false
type: scc
podSecurityContext:
fsGroup: null
runAsGroup: null
runAsUser: null
rollout_operator:
podSecurityContext:
fsGroup: null
runAsGroup: null
runAsUser: null
Caution
In Helm versions 3.13 and earlier, you might experience a known issue overriding default values when using the mimir-distributed Helm chart as a dependency. To view examples and possible workarounds, refer to this issue on GitHub. If your specific situation isn't addressed, open an issue in the Mimir repository.