Run Grafana Mimir in production using the Helm chart
In addition to the guide Get started with Grafana Mimir using the Helm chart, which covers setting up Grafana Mimir on a local Kubernetes cluster or within a low-risk development environment, the following information helps you prepare Grafana Mimir for production.
Although the information that follows assumes that you are using Grafana Mimir in a production environment that is customer-facing, you might need the high-availability and horizontal-scalability features of Grafana Mimir even in an internal, development environment.
Before you begin
Meet all the following prerequisites:
You are familiar with Helm 3.x.
Add the grafana Helm repository to your local environment or to your CI/CD tooling:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
You have external object storage that is different from the MinIO object storage that mimir-distributed deploys, because the MinIO deployment in the Helm chart is only intended for getting started and is not intended for production use. To use Grafana Mimir in production, you must replace the default object storage with an Amazon S3 compatible service, Google Cloud Storage, Microsoft Azure Blob Storage, or OpenStack Swift. Alternatively, to deploy MinIO yourself, see MinIO High Performance Object Storage.
Note
Like Amazon S3, the chosen object storage implementation must not create directories. Grafana Mimir doesn’t have any notion of object storage directories, and so will leave empty directories behind when removing blocks. For example, if you use Azure Blob Storage, you must disable hierarchical namespace.
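For example, on Azure you can create a compatible storage account with hierarchical namespace disabled. The following is a minimal sketch using the Azure CLI; the account, resource group, and location names are placeholders:
az storage account create \
  --name mimirblocks \
  --resource-group my-resource-group \
  --location eastus \
  --kind StorageV2 \
  --enable-hierarchical-namespace false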
You have an external Apache Kafka or Kafka-compatible backend for production use. The Helm chart deploys a single-node Kafka cluster that is intended for demo purposes only and is not suitable for production.
Ingest storage is the next-generation architecture of Grafana Mimir. With this architecture, the Mimir read and write paths are decoupled through an Apache Kafka or Kafka-compatible backend. To run Grafana Mimir in production, you must configure Mimir with the credentials of a production-grade Kafka cluster.
Note
For backwards compatibility with existing Mimir installations, the mimir-distributed Helm chart includes a classic-architecture preset that deploys Grafana Mimir with ingest storage disabled.
Plan capacity
The mimir-distributed Helm chart comes with two sizing plans:
- For 1M series: small.yaml
- For 10M series: large.yaml
These sizing plans are estimated based on experience from operating Grafana
Mimir at Grafana Labs. The ideal size for your cluster depends on your
usage patterns. Therefore, use the sizing plans as a starting
point for sizing your Grafana Mimir cluster, rather than as strict guidelines.
To get a better idea of how to plan capacity, refer to the YAML comments at the beginning of the small.yaml and large.yaml files, which relate to read and write workloads.
See also Planning Grafana Mimir capacity.
To use a sizing plan, copy it from the mimir GitHub repository and pass it as a values file to the helm command. Note that sizing plans can change with new versions of the mimir-distributed chart. Make sure to use a sizing plan from a version close to the version of the Helm chart that you are installing.
For example:
helm install mimir-prod grafana/mimir-distributed -f ./small.yaml
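If you don't already have the plan locally, you can download it first. The following is a sketch that assumes the sizing plans still live at this path in the grafana/mimir repository; replace main with a tag or branch that matches your chart version:
curl -fLO https://raw.githubusercontent.com/grafana/mimir/main/operations/helm/charts/mimir-distributed/small.yaml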
Conform to fault-tolerance requirements
As part of Pod scheduling, the small.yaml and large.yaml files add Pod anti-affinity rules so that no two ingester Pods, nor two store-gateway Pods, are scheduled on any given Kubernetes Node. This increases the fault tolerance of the Mimir cluster.
You must create and add Nodes, such that the number of Nodes is equal to or larger than either the number of ingester Pods or the number of store-gateway Pods, whichever one is larger. Expressed as a formula, it reads as follows:
number_of_nodes >= max(number_of_ingesters_pods, number_of_store_gateway_pods)
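For example, a cluster that runs 5 ingester Pods and 3 store-gateway Pods needs at least max(5, 3) = 5 Nodes.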
For more information about the failure modes of either the ingester or store-gateway component, refer to Ingesters failure and data loss or Store-gateway: Blocks sharding and replication.
Decide whether you need geographical redundancy, fast rolling updates, or both.
You can use a rolling update strategy to apply configuration changes to Grafana Mimir, and to upgrade Grafana Mimir to a newer version. A rolling update results in no downtime to Grafana Mimir.
The Helm chart performs a rolling update for you. To make rolling updates faster, configure the Helm chart to deploy Grafana Mimir with zone-aware replication.
New installations
Grafana Mimir supports replication across availability zones within your Kubernetes cluster. This further increases the fault tolerance of the Mimir cluster. Even if your Kubernetes cluster does not currently span multiple zones, enabling zone-awareness now avoids an unnecessary migration later when you start using multiple zones.
For mimir-distributed Helm chart v4.0 or higher, zone-awareness is enabled by default for new installations.
To benefit from zone-awareness, choose the node selectors for your different zones. For convenience, you can use the following YAML configuration snippet as a starting point:
ingester:
zoneAwareReplication:
enabled: true
topologyKey: kubernetes.io/hostname
zones:
- name: zone-a
nodeSelector:
topology.kubernetes.io/zone: us-central1-a
- name: zone-b
nodeSelector:
topology.kubernetes.io/zone: us-central1-b
- name: zone-c
nodeSelector:
topology.kubernetes.io/zone: us-central1-c
store_gateway:
zoneAwareReplication:
enabled: true
topologyKey: kubernetes.io/hostname
zones:
- name: zone-a
nodeSelector:
topology.kubernetes.io/zone: us-central1-a
- name: zone-b
nodeSelector:
topology.kubernetes.io/zone: us-central1-b
- name: zone-c
nodeSelector:
topology.kubernetes.io/zone: us-central1-c
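To find the zone labels that your Nodes actually carry before filling in the nodeSelector values, you can list them with kubectl:
kubectl get nodes --label-columns topology.kubernetes.io/zone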
Existing installations
If you are upgrading from a previous mimir-distributed Helm chart version to v4.0, then refer to the migration guide to configure zone-aware replication.
Configure Mimir to use object storage
For the different object storage types that Mimir supports, and examples, see Configure Grafana Mimir object storage backend.
If you are not using the sizing plans that are mentioned in Plan capacity, add the following YAML to your values file:
minio:
  enabled: false
Prepare the credentials and bucket names for the object storage.
Add the object storage configuration to the Helm chart values, nesting it under mimir.structuredConfig. This example uses Amazon S3:
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.us-east-2.amazonaws.com
          region: us-east-2
          secret_access_key: "${AWS_SECRET_ACCESS_KEY}" # This is a secret injected via an environment variable
          access_key_id: "${AWS_ACCESS_KEY_ID}" # This is a secret injected via an environment variable
    blocks_storage:
      s3:
        bucket_name: mimir-blocks
    alertmanager_storage:
      s3:
        bucket_name: mimir-alertmanager
    ruler_storage:
      s3:
        bucket_name: mimir-ruler
    # The following admin_client configuration only applies to Grafana Enterprise Metrics deployments:
    # admin_client:
    #   storage:
    #     s3:
    #       bucket_name: gem-admin
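The "${AWS_ACCESS_KEY_ID}" and "${AWS_SECRET_ACCESS_KEY}" placeholders expect environment variables injected from a Kubernetes Secret. The following is a minimal sketch, assuming a Secret named mimir-bucket-secret in the release namespace:
kubectl create secret generic mimir-bucket-secret \
  --from-literal=AWS_ACCESS_KEY_ID=<your-access-key-id> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
Then inject the Secret's keys as environment variables through the chart's global.extraEnvFrom value:
global:
  extraEnvFrom:
    - secretRef:
        name: mimir-bucket-secret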
Configure Mimir to use a Kafka-compatible backend
If you are not using the classic-architecture preset that is noted above, add the following YAML to your values file:
kafka:
  enabled: false
This configuration disables deployment of the Apache Kafka cluster that the Helm chart embeds. The embedded single-node cluster is intended for demo purposes only and isn't recommended for production use cases.
Add the credentials and configuration for your production Apache Kafka or Kafka-compatible cluster to the Helm chart values. Nest the configuration under mimir.structuredConfig:
mimir:
  structuredConfig:
    ingest_storage:
      kafka:
        # Address of the Kafka broker used to bootstrap the connection
        address: kafka:9092
        # (Optional) SASL credentials provisioned for communications with clients within the cluster
        sasl_username: "${KAFKA_SASL_USERNAME}" # This is a secret injected via an environment variable
        sasl_password: "${KAFKA_SASL_PASSWORD}" # This is a secret injected via an environment variable
        # Mimir auto-creates the topic on start-up.
        # The topic MUST be provisioned with no fewer partitions than the maximum
        # number of ingester replicas. The value of 1000 here is arbitrarily large
        # to guarantee that.
        topic: mimir-ingest
        auto_create_topic_enabled: true
        auto_create_topic_default_partitions: 1000
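As with the object storage credentials, the SASL placeholders expect environment variables injected from a Kubernetes Secret. A sketch, assuming a Secret named mimir-kafka-secret that you reference from global.extraEnvFrom alongside any other secrets:
kubectl create secret generic mimir-kafka-secret \
  --from-literal=KAFKA_SASL_USERNAME=<username> \
  --from-literal=KAFKA_SASL_PASSWORD=<password>

global:
  extraEnvFrom:
    - secretRef:
        name: mimir-kafka-secret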
Meet security compliance regulations
Grafana Mimir does not require any special permissions on the hosts that it runs on. Because of this, you can deploy it in environments that enforce the Kubernetes Restricted security policy.
In Kubernetes v1.23 and higher, the Restricted policy can be enforced via a namespace label on the Namespace resource where Mimir is deployed. For example:
pod-security.kubernetes.io/enforce: restricted
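For example, assuming Mimir is deployed in a namespace named mimir, you can apply the label with kubectl:
kubectl label namespace mimir pod-security.kubernetes.io/enforce=restricted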
In Kubernetes versions prior to 1.23, the mimir-distributed Helm chart provides a PodSecurityPolicy resource that enforces many of the recommendations from the Restricted policy that the namespace label enforces. To enable the PodSecurityPolicy admission controller for your Kubernetes cluster, refer to How do I turn on an admission controller?.
For OpenShift-specific instructions see Deploy on OpenShift.
The mimir-distributed Helm chart also deploys most of the containers with a read-only root filesystem (readOnlyRootFilesystem: true). The exceptions are the optional MinIO and Grafana Agent (deprecated) containers. The PodSecurityPolicy resource enforces this setting.
Monitor the health of your Grafana Mimir cluster
To monitor the health of your Grafana Mimir cluster, which is also known as meta-monitoring, you can use ready-made Grafana dashboards, and Prometheus alerting and recording rules. For more information, see Installing Grafana Mimir dashboards and alerts.
Note
The Grafana Mimir Helm chart contains built-in configurations for meta-monitoring that use the Grafana Agent, which is now deprecated. We no longer recommend this approach. Instead, we recommend an approach based on the Kubernetes Monitoring Helm chart.
Kubernetes Monitoring comes with a built-in Mimir integration that collects metrics, logs, and traces from Grafana Mimir. It configures Grafana Alloy to handle all scraping and log collection automatically.
Use the meta-monitoring example in the Kubernetes Monitoring Helm chart for guidance. Update the destinations section to specify where to send the collected metrics, logs, and traces. Send metrics to Prometheus or a Prometheus-compatible backend, such as another Mimir instance or Grafana Cloud Metrics. You can configure multiple metrics, logs, and traces destinations to forward data to several backends simultaneously.
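The following is a minimal sketch of a destinations block, assuming the Kubernetes Monitoring Helm chart v2 values schema; the service URLs are placeholders that you must replace with your own backends:
destinations:
  - name: metrics
    type: prometheus
    url: http://mimir-nginx.mimir.svc/api/v1/push
  - name: logs
    type: loki
    url: http://loki-gateway.loki.svc/loki/api/v1/push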
For more meta-monitoring topics, refer to Monitor Grafana Mimir.
Deprecated meta-monitoring approach
The mimir-distributed Helm chart makes it easy for you to collect metrics and logs from Mimir. The chart uses the Grafana Agent to ship metrics to a Prometheus-compatible server and logs to a Loki or GEL (Grafana Enterprise Logs) server.
Caution
Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.
Download the Grafana Agent Operator Custom Resource Definitions (CRDs) from https://github.com/grafana/agent/tree/main/operations/agent-static-operator/crds
Install the CRDs on your cluster:
kubectl apply -f operations/agent-static-operator/crds/
Add the following YAML snippet to your values file to send meta-monitoring telemetry from Mimir. Change the URLs and credentials to match your desired destination.
metaMonitoring:
  serviceMonitor:
    enabled: true
  grafanaAgent:
    enabled: true
    installOperator: true
    logs:
      remote:
        url: "https://example.com/loki/api/v1/push"
        auth:
          username: 12345
    metrics:
      remote:
        url: "https://prometheus.prometheus.svc.cluster.local./api/v1/push"
        headers:
          X-Scope-OrgID: metamonitoring
Your Grafana Mimir cluster can now ingest metrics in production.
Configure clients to write metrics to Mimir
To configure each client to remote-write metrics to Mimir, refer to Configure Prometheus to write to Grafana Mimir and Configure Grafana Alloy to write to Grafana Mimir.
Set up redundant Prometheus or Grafana Alloy instances for high availability
If you need redundancy on the write path before it reaches Mimir, then you can set up redundant instances of Prometheus or Grafana Alloy to write metrics to Mimir.
For more information, see Configure high-availability deduplication with Consul.
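As a starting point, the following values-file sketch enables Mimir's HA tracker backed by Consul. The Consul host is a placeholder, and accept_ha_samples can also be set per tenant; refer to the linked guide for the authoritative configuration:
mimir:
  structuredConfig:
    limits:
      # Accept samples from HA pairs and deduplicate them.
      accept_ha_samples: true
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: consul
          consul:
            host: consul.consul.svc.cluster.local:8500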
Deploy on OpenShift
To deploy the mimir-distributed Helm chart on OpenShift, you need to change some of the default values.
Add the following YAML snippet to your values file. This creates a dedicated SecurityContextConstraints (SCC) resource for the mimir-distributed chart.
rbac:
create: true
type: scc
podSecurityContext:
fsGroup: null
runAsGroup: null
runAsUser: null
rollout_operator:
podSecurityContext:
fsGroup: null
runAsGroup: null
runAsUser: null
Alternatively, to deploy using the default SCC in your OpenShift cluster, add the following YAML snippet to your values file:
rbac:
create: false
type: scc
podSecurityContext:
fsGroup: null
runAsGroup: null
runAsUser: null
rollout_operator:
podSecurityContext:
fsGroup: null
runAsGroup: null
runAsUser: null
Caution
In Helm versions 3.13 and earlier, you might experience a known issue overriding default values when using the mimir-distributed Helm chart as a dependency. To view examples and possible workarounds, refer to this issue on GitHub. If your specific situation isn't addressed, open an issue in the Mimir repository.