Exporter Review: Redis

This edition of our exporter series discusses the Redis exporter, one of the best-fit exporters for monitoring metrics used by NexClipper. We introduce the exporter’s most important metrics and recommended alert rules, as well as the related Grafana dashboard and Helm chart, so keep reading to learn all about the Redis exporter.

About Redis

Redis stands for Remote Dictionary Server and is an in-memory data structure store used as a database, cache, streaming engine, and message broker. It provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, geospatial indexes, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

A Redis exporter is required to monitor and expose Redis metrics. It queries Redis, collects the data, and exposes the metrics on a Kubernetes service endpoint that Prometheus can then scrape to ingest the time series data. For monitoring Redis, we use an external Prometheus exporter, deployed with a Helm chart that is maintained by the Prometheus Community. Once deployed, this exporter collects a sizable set of metrics from Redis and gives users crucial information that is difficult to get from Redis directly and continuously.

For this setup, we are using the Bitnami Redis Helm chart to start the Redis server/cluster.

How do you set up an exporter for Prometheus?

With the latest version of Prometheus (2.33 as of February 2022), these are the ways to set up a Prometheus exporter: 

Method 1 – Basic

This approach has been supported by Prometheus since the beginning.
To set up an exporter the native way, update the Prometheus configuration to add the exporter as a target.
A sample configuration:

# scrape_config job
scrape_configs:

  - job_name: redis
    scrape_interval: 45s
    scrape_timeout:  30s
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - <redis exporter endpoint>

Sample config for multiple Redis hosts:

 ## config for the multiple Redis targets that the exporter will scrape
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://first-redis-host:6379
        - redis://second-redis-host:6379
        - <and so on>
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: <REDIS-EXPORTER-HOSTNAME>:9121
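
To make the relabeling above easier to follow: each Redis address listed under targets is first copied into the “target” URL parameter and into the instance label, and then __address__ is overwritten so that the request actually goes to the exporter’s /scrape endpoint. A companion job that scrapes the exporter’s own /metrics endpoint is often added next to it. The fragment below is only a sketch: it reuses the <REDIS-EXPORTER-HOSTNAME> placeholder, and the job name is chosen for illustration.

```yaml
  ## sketch: scrape the exporter process itself (job name is illustrative)
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - <REDIS-EXPORTER-HOSTNAME>:9121
```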

Method 2 – Service Discovery

This method applies to Kubernetes deployments only.
A default scrape config can be added to the prometheus.yaml file, and annotations can be added to the exporter service. With this, Prometheus automatically starts scraping the services that carry the annotation, using the path given there.

prometheus.yaml

  - job_name: kubernetes-services
    scrape_interval: 15s
    scrape_timeout: 10s
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      # Keep only services that carry the
      # prometheus.io/scrape: "true" annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use the prometheus.io/path: "/scrape/path" annotation as the metrics path.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use the prometheus.io/port: "80" annotation as the scrape port.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2

Exporter service annotations:

 annotations:
    prometheus.io/path: /metrics
    prometheus.io/scrape: "true"
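
The relabel rules above also read a prometheus.io/port annotation, so it can help to set the port explicitly. The following is a sketch of a complete Service carrying all three annotations. The Service name, namespace, and selector label are hypothetical, and 9121 is the exporter port used in the multi-target example earlier.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-exporter          # hypothetical name
  namespace: monitor            # hypothetical namespace
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "9121"
spec:
  selector:
    app: redis-exporter         # hypothetical pod label
  ports:
    - name: metrics
      port: 9121
      targetPort: 9121
```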

Method 3 – Prometheus Operator

Setting up a service monitor
The Prometheus operator supports an automated way of scraping data from the exporters by setting up a service monitor Kubernetes object. For reference, a sample service monitor for Redis can be found here.
These are the necessary steps:

Step 1

Add/update Prometheus operator’s selectors. By default, the Prometheus operator comes with empty selectors, which select every service monitor available in the cluster for scraping.

To check your Prometheus configuration:

kubectl get prometheus -n <namespace> -o yaml

A sample output will look like this.

    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: kps
    scrapeInterval: 1m
    scrapeTimeout: 10s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: kps-kube-prometheus-stack-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: kps

Here you can see that this Prometheus configuration selects all service monitors with the label release = kps.

So if you modify the default Prometheus operator configuration for service monitor scraping, make sure you use the right labels in your service monitor as well.

Step 2

Add a service monitor and make sure it has a matching label and namespace for the Prometheus service monitor selectors (serviceMonitorNamespaceSelector & serviceMonitorSelector).

Sample configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: redis-exporter
    meta.helm.sh/release-namespace: monitor
  labels:
    app: prometheus-redis-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-redis-exporter-1.1.0
    heritage: Helm
    release: kps
  name: redis-exporter-prometheus-redis-exporter
  namespace: monitor
spec:
  endpoints:
  - interval: 15s
    port: redis-exporter
  selector:
    matchLabels:
      app: prometheus-redis-exporter
      release: redis-exporter

As you can see, the service monitor carries the matching label release = kps, which is the label specified in the Prometheus operator scraping configuration.

Metrics

The following metrics are handpicked and will provide insights into Redis operations.

  1. Server is up
    As the name suggests, this metric exposes the state of the Redis process: whether it is up or down.
    ➡ The key of the exporter metric is “redis_up”
    ➡ The value of the metric is a boolean: 1 if Redis is up, 0 if it is down
  2. Redis used memory
    Since Redis is an in-memory database, it is important to monitor memory usage: once memory is full, data may be lost, depending on the configured maxmemory-policy.
    ➡ The key of the exporter metric is “redis_memory_used_bytes”
    ➡ You can calculate the percentage based on “redis_total_system_memory_bytes” or “redis_memory_max_bytes”
    ➡ The exporter must be started with the “--include-system-metrics” flag or the “REDIS_EXPORTER_INCL_SYSTEM_METRICS=true” environment variable to show the “redis_total_system_memory_bytes” metric
    ➡ “redis_memory_max_bytes” is 0 by default; to get a meaningful value, limit the memory used by Redis by running the command “CONFIG SET maxmemory <value>”
  3. Too many connections
    Redis is configured with a maximum number of clients; if the number of connections exceeds this value, it rejects new connections.
    ➡ The metric “redis_connected_clients” gives the total number of connections on Redis
    ➡ This number should be compared against the maximum allowed clients from the metric “redis_config_maxclients”
    ➡ Rejected clients can be monitored with “redis_rejected_connections_total”
  4. Redis rejecting connections
    This means Redis has reached the limit at which it stops accepting new connections.
    ➡ The metric “redis_rejected_connections_total” gives the total number of connections rejected by Redis
    ➡ The value is a counter; its increase over a specified time window tells how many connections were rejected in that period
  5. Redis out of system memory
    This means Redis is running out of memory on the host system.
    ➡ The “redis_memory_used_bytes” and “redis_total_system_memory_bytes” metrics provide the memory used by Redis and the total memory of the host, respectively
    ➡ The values are returned in bytes, so they can be divided to get a usage percentage
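
The percentage calculations mentioned in this list can be precomputed with Prometheus recording rules. The fragment below is a sketch rather than a recommended setup: the group name and the recorded series names are hypothetical, and only the metric names come from the list above.

```yaml
groups:
  - name: redis-utilization        # hypothetical group name
    rules:
      # Share of the configured maxmemory in use. The "> 0" filter drops
      # instances where maxmemory is unset (reported as 0).
      - record: redis:memory_used:percent_of_max        # hypothetical series name
        expr: redis_memory_used_bytes / (redis_memory_max_bytes > 0) * 100
      # Share of the allowed client connections in use.
      - record: redis:connected_clients:percent_of_max  # hypothetical series name
        expr: redis_connected_clients / redis_config_maxclients * 100
```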

Additionally, below are some of the metrics that are important for the Redis cluster:

  1. Redis missing master
    This means the master node is missing from the Redis cluster.
    ➡ “redis_instance_info{role="master"}” is the key to get details about the master
    ➡ The number of series returned for this metric should be at least 1
  2. Redis disconnected slaves
    This means slaves are not connected to the master.
    ➡ The metric “redis_connected_slaves” provides the number of connected slaves; to get the number of disconnected slaves, subtract it from the total number of available slaves
    ➡ The value is a number
  3. Redis replication is broken
    This means that replication between master and replica is broken.
    ➡ The key “redis_connected_slaves” gives the number of connected slaves; a negative delta over one minute indicates that a replica was lost
    ➡ The value is a number

Alerting

With the valuable metrics covered, this section explains in detail how to set up critical alerts.

PromQL is a query language for the Prometheus monitoring system. It is designed for building powerful yet simple queries for graphs, alerts, or derived time series (aka recording rules). PromQL was designed from scratch and has nothing in common with other query languages used in time series databases, such as SQL in TimescaleDB, InfluxQL, or Flux. More details can be found here.

Prometheus works together with Alertmanager, a companion component that is responsible for sending alerts (by email, Slack, or any other supported channel) when any of the trigger conditions is met. Alerting rules allow users to define alerts based on Prometheus query expressions. They are defined based on the available metrics scraped by the exporter. Click here for a good source for community-defined alerts.

A general alert looks as follows:

- alert: (Alert Name)
  expr: (Metric exported from exporter) >/</==/<=/>= (Value)
  for: (wait for a certain duration between first encountering a new expression output vector element and counting an alert as firing for this element)
  labels: (allows specifying a set of additional labels to be attached to the alert)
  annotations: (specifies a set of informational labels that can be used to store longer additional information)
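
In a plain Prometheus setup, rules of this shape are placed under a named group in a rule file that the main configuration loads through rule_files. A minimal sketch of such a file, with a hypothetical file name and group name, wrapping one of the alerts from this article:

```yaml
# redis-alerts.yml (hypothetical file name), referenced from prometheus.yaml via:
#   rule_files:
#     - redis-alerts.yml
groups:
  - name: redis                # hypothetical group name
    rules:
      - alert: RedisDown
        expr: redis_up == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Redis down (instance {{ $labels.instance }})
```

A file like this can be validated before loading with “promtool check rules redis-alerts.yml”.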

Some of the recommended Redis alerts are:

Alert – Redis is down

  - alert: RedisDown
    expr: redis_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis down (instance {{ $labels.instance }})
      description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Redis out of memory

# The exporter must be started with --include-system-metrics flag or REDIS_EXPORTER_INCL_SYSTEM_METRICS=true environment variable.
  - alert: RedisOutOfSystemMemory
    expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Redis out of system memory (instance {{ $labels.instance }})
      description: "Redis is running out of system memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
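
The alert above measures usage against the total memory of the host. When a maxmemory limit is configured on Redis, a variant based on “redis_memory_max_bytes” may be more useful. The rule below is a sketch under that assumption; the alert name and the 90% threshold are illustrative.

```yaml
  # Assumes maxmemory is set on Redis; with the default of 0 the "> 0" filter yields no result.
  - alert: RedisOutOfConfiguredMaxmemory
    expr: redis_memory_used_bytes / (redis_memory_max_bytes > 0) * 100 > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
      description: "Redis is running out of configured maxmemory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```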

Alert – Too many connections

  - alert: RedisTooManyConnections
    expr: redis_connected_clients > 100
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Redis too many connections (instance {{ $labels.instance }})
      description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
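
The fixed threshold of 100 connections may not fit every deployment. As noted in the metrics section, the connection count can instead be compared against “redis_config_maxclients”. A sketch of such a relative rule, with a hypothetical alert name and an illustrative 90% threshold:

```yaml
  - alert: RedisConnectionsNearLimit     # hypothetical alert name
    expr: redis_connected_clients / redis_config_maxclients * 100 > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Redis connections near maxclients (instance {{ $labels.instance }})
      description: "Redis is using more than 90% of its allowed client connections\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```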

Alert – Redis rejecting connections

  - alert: RedisRejectedConnections
    expr: increase(redis_rejected_connections_total[1m]) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis rejected connections (instance {{ $labels.instance }})
      description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Redis missing master

  - alert: RedisMissingMaster
    expr: (count(redis_instance_info{role="master"}) or vector(0)) < 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis missing master (instance {{ $labels.instance }})
      description: "Redis cluster has no node marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Redis disconnected slaves

  - alert: RedisDisconnectedSlaves
    expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis disconnected slaves (instance {{ $labels.instance }})
      description: "Redis not replicating for all slaves. Consider reviewing the redis replication status.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Redis replication broken

  - alert: RedisReplicationBroken
    expr: delta(redis_connected_slaves[1m]) < 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis replication broken (instance {{ $labels.instance }})
      description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
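
When the Prometheus Operator is in use, alert rules are usually delivered as a PrometheusRule object rather than a rule file. The sketch below wraps one of the alerts above. The object name and group name are hypothetical, the namespace is taken from the service monitor sample, and the labels are chosen to match the ruleSelector shown in the sample Prometheus output earlier.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-alerts             # hypothetical name
  namespace: monitor             # namespace taken from the service monitor sample
  labels:
    # must match the ruleSelector of the Prometheus resource
    app: kube-prometheus-stack
    release: kps
spec:
  groups:
    - name: redis                # hypothetical group name
      rules:
        - alert: RedisDown
          expr: redis_up == 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Redis down (instance {{ $labels.instance }})
```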

Dashboard

Graphs are easier to understand and more user-friendly than a row of numbers. For this purpose, users can plot their time series data in visualized format using Grafana.

Grafana is an open-source dashboarding tool used for visualizing metrics with the help of customizable and illustrative charts and graphs. It connects very well with Prometheus and makes monitoring easy and informative. Dashboards in Grafana are made up of panels, with each panel running a PromQL query to fetch metrics from Prometheus.
Grafana supports community-driven dashboards for most of the widely used software, which can be imported directly from the Grafana community.

NexClipper uses the Redis dashboard by downager, which is widely accepted and has a lot of useful panels.

What is a Panel?

Panels are the most basic component of a dashboard and can display information in various ways, such as gauge, text, bar chart, graph, and so on. They provide information in a very interactive way. Users can view every panel separately and check the value of metrics within a specific time range. 
The values on the panel are queried using PromQL, which is Prometheus Query Language. PromQL is a simple query language used to query metrics within Prometheus. It enables users to query data, aggregate and apply arithmetic functions to the metrics, and then further visualize them on panels.

Here are some examples of panels:

Helm Chart

The exporter, alert rule, and dashboard can be deployed in Kubernetes using the Helm chart. The Helm chart used for deployment is taken from the Prometheus community, which can be found here.

Installing Redis cluster

If your Redis cluster is not up and ready, you can start it using Helm:

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/redis --set "master.extraFlags={--maxmemory 1gb}"

Note that the Bitnami chart allows you to start a Redis exporter as a sidecar for the Redis container. You can enable that by adding “--set metrics.enabled=true”.
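
The same settings can also be kept in a values file instead of --set flags. A minimal sketch, assuming a file named values-redis.yaml (a hypothetical name) and reusing the maxmemory flag and the metrics switch mentioned above:

```yaml
# values-redis.yaml (hypothetical file name)
master:
  extraFlags:
    - "--maxmemory 1gb"
metrics:
  enabled: true      # starts the exporter as a sidecar
```

It could then be passed at install time with “helm install my-release bitnami/redis -f values-redis.yaml”.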

Installing Redis Exporter

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install redis-exporter prometheus-community/prometheus-redis-exporter

Some of the common parameters that must be changed in the values file include: 

redisAddress: "redis://redis-master:6379"
auth:
  enabled: true
  redisPassword: secretpassword 

All these parameters can be tuned via the values.yaml file here.

Scrape the metrics

There are multiple ways to scrape the metrics, as discussed above. In addition to the native way of setting up Prometheus monitoring, a service monitor can be deployed (if the Prometheus Operator is being used) to scrape the data from the Redis exporter. With this approach, multiple Redis servers can be scraped without altering the Prometheus configuration. Every Redis exporter comes with its own service monitor.
In the above-mentioned chart, a service monitor can be deployed by turning it on from the values.yaml file here.

serviceMonitor:
  # When set true then use a ServiceMonitor to configure scraping
  enabled: true
  # Set the namespace the ServiceMonitor should be deployed
  # namespace: monitoring
  # Set how frequently Prometheus should scrape
  # interval: 30s
  # Set path to redis-exporter telemetry-path
  # telemetryPath: /metrics
  # Set labels for the ServiceMonitor, use this to define your scrape label for Prometheus Operator
  # labels:
  # Set timeout for scrape
  # timeout: 10s
  # Set relabel_configs as per https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
  # relabelings: []
  # Set of labels to transfer on the Kubernetes Service onto the target.
  # targetLabels: []
  # metricRelabelings: []
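
Putting the pieces together, a values file for the exporter chart might look like the sketch below. The file name is hypothetical, the password is a placeholder, and the release: kps label assumes the serviceMonitorSelector shown in the sample Prometheus output earlier.

```yaml
# values-exporter.yaml (hypothetical file name)
redisAddress: "redis://redis-master:6379"
auth:
  enabled: true
  redisPassword: secretpassword   # placeholder value
serviceMonitor:
  enabled: true
  namespace: monitor
  interval: 30s
  labels:
    release: kps    # assumed to match the serviceMonitorSelector of the Prometheus resource
```

It could then be applied with “helm install redis-exporter prometheus-community/prometheus-redis-exporter -f values-exporter.yaml”.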

Update the annotation section here if not using the Prometheus Operator.

service: 
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/scrape: "true"

This concludes our review of the Redis exporter! If you have any questions, you can reach us at support@nexclipper.io for further discussions. Stay tuned for more useful exporter reviews and other tips coming soon.