Exporter Review: PostgreSQL

In this edition of our exporter review series, we introduce the PostgreSQL exporter, one of the best-fit exporters for monitoring metrics used by NexClipper. Read on to find out the exporter’s most important metrics, recommended alert rules, as well as the related Grafana dashboard and Helm Chart.

About PostgreSQL

PostgreSQL is an open-source object-relational database system with a strong reputation for reliability, feature robustness, and performance. It uses the SQL language combined with many features that safely store and scale the most complicated data workloads. 

PostgreSQL runs on all major operating systems and is offered as a service by all major cloud providers. It comes with many features aimed at helping developers build applications as well as helping administrators protect data integrity and create fault-tolerant environments. It supports users in managing their data no matter how big or small the dataset.

Since databases are such a critical resource, downtime can cause significant financial and reputational losses, so monitoring is a must. The Postgres exporter is required to monitor and expose Postgres metrics. It queries Postgres, collects the data, and exposes the metrics at a Kubernetes service endpoint that Prometheus can then scrape to ingest the time series data. Postgres is monitored with an external Prometheus exporter, which is maintained by the Prometheus Community. Once deployed, this exporter collects a sizable set of metrics from Postgres and gives users crucial, continuous information about the database that is difficult to extract from PostgreSQL directly.

For this setup, we are using Bitnami PostgreSQL Helm charts to start the Postgres server.

How do you set up an exporter for Prometheus?

With the latest version of Prometheus (2.33 as of February 2022), there are three ways to set up a Prometheus exporter:

Method 1 – Native

Supported by Prometheus since the beginning.
To set up an exporter the native way, the Prometheus configuration needs to be updated to add the target.
A sample configuration:

# scrape_config job
scrape_configs:
  - job_name: postgres
    scrape_interval: 45s
    scrape_timeout: 30s
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - <postgres exporter endpoint>

Method 2 – Service Discovery

This method applies to Kubernetes deployments only.
With this method, a default scrape config is added to the prometheus.yaml file and annotations are added to the exporter service. Prometheus will then automatically start scraping the data from the annotated services at the specified path.

prometheus.yaml

  - job_name: kubernetes-services
    scrape_interval: 15s
    scrape_timeout: 10s
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      # Scrape only services that have the
      # prometheus.io/scrape: "true" annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Override the metrics path from the
      # prometheus.io/path: "/scrape/path" annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Override the scrape port from the
      # prometheus.io/port: "80" annotation.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2

Exporter service annotations:

annotations:
  prometheus.io/path: /metrics
  prometheus.io/scrape: "true"
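
The relabel rules above read three annotations (scrape, path, and port), while the sample annotations only set two. As a minimal sketch, a Service for the exporter carrying all three could look like the following. The Service name, namespace, selector label, and port name are assumptions for illustration; 9187 is the port the Postgres exporter listens on by default.

```yaml
# Hypothetical Service for the Postgres exporter.
# Name, namespace, selector, and port name are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: postgres-exporter
  namespace: monitor
  annotations:
    prometheus.io/scrape: "true"   # matched by the "keep" relabel rule
    prometheus.io/path: /metrics   # copied into __metrics_path__
    prometheus.io/port: "9187"     # replaces the port in __address__
spec:
  selector:
    app: prometheus-postgres-exporter
  ports:
    - name: postgres-exporter
      port: 9187
      targetPort: 9187
```

Annotation values must be strings, which is why "true" and "9187" are quoted.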

Method 3 – Prometheus Operator

Setting up a service monitor
The Prometheus operator supports an automated way of scraping data from the exporters by setting up a service monitor Kubernetes object. For reference, a sample service monitor for PostgreSQL can be found here.
These are the necessary steps:

Step 1

Add/update the Prometheus operator’s selectors. By default, the Prometheus operator comes with empty selectors, which select every service monitor available in the cluster for scraping.

To check your Prometheus configuration:

kubectl get prometheus -n <namespace> -o yaml

A sample output will look like this.

    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: kps
    scrapeInterval: 1m
    scrapeTimeout: 10s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: kps-kube-prometheus-stack-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: kps

Here you can see that this Prometheus configuration selects all service monitors with the label release = kps.

If you modify the default Prometheus operator configuration for service monitor scraping, make sure you use the matching labels in your service monitor as well.
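
For comparison, here is a sketch of the empty-selector case mentioned in Step 1. It is only a fragment of a Prometheus custom resource, shown under the assumption that you want every service monitor in every namespace to be picked up.

```yaml
# Fragment of a Prometheus custom resource (monitoring.coreos.com/v1).
spec:
  # An empty selector ({}) matches everything, so no label is required
  # on the service monitors and no namespace is excluded.
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
```

With selectors like these, the release = kps label on the service monitor is no longer needed for it to be scraped.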

Step 2

Add a service monitor and make sure it has a matching label and namespace for the Prometheus service monitor selectors (serviceMonitorNamespaceSelector & serviceMonitorSelector).

Sample configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: postgres-exporter
    meta.helm.sh/release-namespace: monitor
  labels:
    app: prometheus-postgres-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-postgres-exporter-1.1.0
    heritage: Helm
    release: kps
  name: prometheus-postgres-exporter
  namespace: monitor
spec:
  endpoints:
  - interval: 15s
    port: postgres-exporter
  selector:
    matchLabels:
      app: prometheus-postgres-exporter
      release: postgres-exporter

As you can see, the service monitor carries the label release = kps, which matches the selector specified in the Prometheus operator scraping configuration.

Metrics

The following handpicked metrics provide insight into PostgreSQL.

  1. PG is up
    This shows whether the last scrape of metrics from PostgreSQL was able to connect to the server.
    ➡ The key of the exporter metric is “pg_up”
    ➡ The value of the metric is a boolean: 1 if PostgreSQL is up, 0 if it is down
  2. Replication lag
    In scenarios with replicated PostgreSQL servers, high replication lag can lead to consistency problems if the primary (master) goes down.
    ➡ The metric key is “pg_replication_lag”
    ➡ The value is in seconds
  3. Too many connections
    By default, PostgreSQL allows 100 concurrent connections (max_connections), 3 of which are reserved for superusers (superuser_reserved_connections). You can raise the limit to support greater concurrency. If there are too many concurrent connections, PostgreSQL returns the error “FATAL: sorry, too many clients already” and rejects incoming connections.
    ➡ The metric “pg_stat_activity_count” gives the number of connections per database and state
    ➡ The number should be evaluated against “pg_settings_max_connections”, which is 100 by default
  4. Database size
    As the name suggests, this metric gives insight into the storage usage of each PostgreSQL database.
    ➡ The metric “pg_database_size_bytes” shows the storage used by each database
  5. Maximum transaction duration
    This metric provides information about latency and performance by reporting how long the longest-running active transaction has been open.
    ➡ The metric “pg_stat_activity_max_tx_duration” exposes the maximum duration, in seconds, that any active transaction has been running
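
To illustrate how the connection count can be related to the configured limit, here is a minimal sketch of a Prometheus recording rule that precomputes the share of max_connections in use. The group name and the recorded series name are hypothetical, and the expression assumes both metrics carry an instance label.

```yaml
groups:
  - name: postgres-connection-usage            # hypothetical group name
    rules:
      # Fraction of max_connections currently in use, per exporter instance.
      - record: instance:pg_connections:usage_ratio   # hypothetical series name
        expr: |
          sum by (instance) (pg_stat_activity_count)
            / on (instance)
          max by (instance) (pg_settings_max_connections)
```

A value of 0.8 for this series would mean that 80% of the allowed connections are in use.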

Alerting

After digging into all the valuable metrics, this section explains in detail how we can get critical alerts.

PromQL is the query language for the Prometheus monitoring system. It is designed for building powerful yet simple queries for graphs, alerts, or derived time series (aka recording rules). PromQL was designed from scratch and has little in common with query languages used in other time series databases, such as SQL in TimescaleDB, InfluxQL, or Flux. More details can be found here.

Prometheus evaluates alerting rules and hands firing alerts to Alertmanager, a separate component that is responsible for sending notifications (by email, Slack, or any other supported channel) when any of the trigger conditions is met. Alerting rules allow users to define alerts based on Prometheus query expressions. They are defined based on the available metrics scraped from the exporter. Click here for a good source for community-defined alerts.
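
As a rough illustration of the notification side, the sketch below shows an Alertmanager configuration that sends critical alerts to a Slack channel. The receiver names, the channel, and the webhook URL are placeholders, not values from this setup.

```yaml
# alertmanager.yml (sketch; receiver names, channel, and URL are placeholders)
route:
  receiver: default
  group_by: [alertname, instance]
  routes:
    # Alerts labelled severity: critical go to the Slack receiver.
    - matchers:
        - severity="critical"
      receiver: dba-slack

receivers:
  # Receiver without any notification config: alerts routed here are not sent anywhere.
  - name: default
  - name: dba-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME
        channel: "#db-alerts"
        send_resolved: true
```

The severity label used for routing is the one set under labels in each alert rule.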

A general alert looks as follows:

- alert: (Alert Name)
  expr: (Metric exported from exporter) > / < / == / <= / >= (Value)
  for: (wait for a certain duration between first encountering a new expression output vector element and counting an alert as firing for this element)
  labels: (allows specifying a set of additional labels to be attached to the alert)
  annotations: (specifies a set of informational labels that can be used to store longer additional information)

Some of the recommended PostgreSQL alerts are:

  1. Alert – PostgreSQL is down
  - alert: PostgresqlDown
    expr: pg_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Postgresql down (instance {{ $labels.instance }})
      description: "Postgresql instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  2. Alert – Replication lag
  - alert: PostgresqlReplicationLag
    expr: pg_replication_lag > 30 and ON(instance) pg_replication_is_replica == 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Postgresql replication lag (instance {{ $labels.instance }})
      description: "PostgreSQL replication lag is going up (> 30s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  3. Alert – Too many connections
  - alert: PostgresqlTooManyConnections
    expr: sum by (instance) (pg_stat_activity_count{datname!~"template.*|postgres"}) > on (instance) (pg_settings_max_connections * 0.8)
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Postgresql too many connections (instance {{ $labels.instance }})
      description: "PostgreSQL instance has too many connections (> 80%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  4. Alert – Database size
  - alert: PostgresqlHighDbSize
    expr: pg_database_size_bytes / (1024 * 1024 * 1024) > 100  # this value depends on available disk size
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Postgresql DB size is more than 100 GB (instance {{ $labels.instance }})
      description: "Postgresql DB size is more than 100 GB\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  5. Alert – Max transaction duration
  - alert: PostgresqlTXDuration
    expr: pg_stat_activity_max_tx_duration{state="active"} > 2
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Postgresql active transaction takes more than 2 seconds to complete (instance {{ $labels.instance }})
      description: "PostgreSQL active transaction takes more than 2 seconds\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
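
The alerts above are individual list items. To be loaded by Prometheus, they have to sit inside a rule group in a rules file. Here is a minimal sketch of that wrapper, using the first alert as the example; the file name and group name are assumptions.

```yaml
# postgres-alerts.yml (hypothetical file name)
groups:
  - name: postgresql            # hypothetical group name
    rules:
      - alert: PostgresqlDown
        expr: pg_up == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Postgresql down (instance {{ $labels.instance }})
      # The remaining alerts from the list above go here,
      # at the same indentation as the first one.
```

The file is then listed under rule_files in the Prometheus configuration, and its syntax can be checked with promtool check rules before reloading Prometheus.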

Dashboard

Graphs are easier to understand and more user-friendly than a row of numbers. For this purpose, users can plot their time series data in visualized format using Grafana.

Grafana is an open-source dashboarding tool used for visualizing metrics with the help of customizable and illustrative charts and graphs. It connects very well with Prometheus and makes monitoring easy and informative. Dashboards in Grafana are made up of panels, with each panel running a PromQL query to fetch metrics from Prometheus.
Grafana supports community-driven dashboards for most of the widely used software, which can be imported directly from the Grafana community.

NexClipper uses the PostgreSQL Database dashboard by Lucas Estienne, which is widely accepted and has a lot of useful panels.

What is a Panel?

Panels are the most basic component of a dashboard and can display information in various ways, such as gauge, text, bar chart, graph, and so on. They provide information in a very interactive way. Users can view every panel separately and check the value of metrics within a specific time range. 
The values on the panel are queried using PromQL, which lets users query data, aggregate it, apply arithmetic functions to the metrics, and then visualize the results on panels.

Here are some examples of panels:

1. Database metrics and settings

2. Database statistics

Helm Chart

The exporter, alert rule, and dashboard can be deployed in Kubernetes using the Helm chart. The Helm chart used for deployment is taken from the Prometheus community, which can be found here.

Installing PostgreSQL Server

If your Postgres server is not up and ready, you can start it using Helm:

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/postgresql

Note that Bitnami charts allow you to deploy a Postgres exporter as part of the Helm chart. You can enable it by adding “--set metrics.enabled=true”.

Installing PostgreSQL Exporter

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install postgres-exporter prometheus-community/prometheus-postgres-exporter

Some of the common parameters that must be changed in the values file include: 

config:
  datasource:
    # Specify one of both datasource or datasourceSecret
    host:
    user: postgres
    # Only one of password, passwordSecret and pgpassfile can be specified
    password:
    # Specify passwordSecret if DB password is stored in secret.
    passwordSecret: {}

All these parameters can be tuned via the values.yaml file here.
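
As a sketch of a filled-in values file, the example below points the exporter at the Bitnami release installed earlier and reads the password from a Kubernetes Secret instead of storing it in plain text. The host name, Secret name, and key are assumptions based on a release called my-release in the default namespace, and the name/key fields under passwordSecret are assumed from the chart's commented defaults, so check them against the values.yaml of your chart version.

```yaml
# values-exporter.yaml (hypothetical file name)
config:
  datasource:
    # Assumed Service DNS name of the Bitnami PostgreSQL release.
    host: my-release-postgresql.default.svc.cluster.local
    user: postgres
    # Leave "password" unset and reference a Secret instead.
    passwordSecret:
      name: my-release-postgresql   # assumed Secret name
      key: postgres-password        # assumed key inside the Secret
```

The file can be passed to the install command with -f values-exporter.yaml.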

Scrape the metrics

There are multiple ways to scrape the metrics, as discussed above. In addition to the native way of setting up Prometheus monitoring, a service monitor can be deployed (if a Prometheus operator is being used) to scrape the data from the Postgres exporter. With this approach, multiple Postgres servers can be scraped without altering the Prometheus configuration. Every Postgres exporter comes with its own service monitor.
In the above-mentioned chart, a service monitor can be deployed by turning it on from the values.yaml file here.

serviceMonitor:
  # When set true then use a ServiceMonitor to configure scraping
  enabled: false
  # Set the namespace the ServiceMonitor should be deployed
  # namespace: monitoring
  # Set how frequently Prometheus should scrape
  # interval: 30s
  # Set path to the exporter telemetry path
  # telemetryPath: /metrics
  # Set labels for the ServiceMonitor, use this to define your scrape label for Prometheus Operator
  # labels:
  # Set timeout for scrape
  # timeout: 10s
  # Set of labels to transfer from the Kubernetes Service onto the target
  # targetLabels: []
  # MetricRelabelConfigs to apply to samples before ingestion
  # metricRelabelings: []
  # Set relabel_configs as per https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
  # relabelings: []
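
For the operator setup described earlier, where Prometheus selects service monitors labelled release = kps, the block above could be filled in roughly as follows. This is a sketch under that assumption; the namespace and interval are illustrative.

```yaml
serviceMonitor:
  # Create a ServiceMonitor alongside the exporter.
  enabled: true
  # Namespace for the ServiceMonitor (illustrative value).
  namespace: monitor
  interval: 30s
  # Must match serviceMonitorSelector of the Prometheus resource.
  labels:
    release: kps
```

If the label does not match the selector, the operator ignores the service monitor and the exporter is not scraped.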

Update the annotation section here if you are not using the Prometheus Operator.

service: 
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/scrape: "true"

This concludes our discussion of the PostgreSQL exporter! If you have any questions, you can reach our team via support@nexclipper.io. Stay tuned for further exporter reviews and tips coming soon.