What is MetricOps?

MetricOps is an alert-based solution for incident prediction and resolution suggestion that uses a Bayesian Belief Network and machine learning. It consists of five major application components, all of which are part of our extended observability service. MetricOps can also execute investigation and resolution actions through Sudory, our Kubernetes/Helm/HTTP executor.

Intrigued? We’ve got you covered below!

MetricOps Components

AlertHub

Our AlertHub acts as a communicator for Prometheus alerts. It receives alerts through a webhook and saves them, together with their labels, into a database. Users can access alerts according to their permissions for clusters, nodes, and services. Alert notifications are sent to defined access groups through communication channels such as Slack, email, or webhooks.
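To make the receive, store, and filter flow concrete, here is a minimal sketch. It assumes the alerts arrive in the JSON shape that Prometheus Alertmanager posts to a webhook receiver (a trimmed sample is inlined), uses an in-memory SQLite table as the database, and for brevity checks permissions only against a `cluster` label. The label names, table layout, and the functions `receive_webhook` and `alerts_for_user` are hypothetical illustrations, not AlertHub's actual schema or API.

```python
import json
import sqlite3

# Trimmed example of an Alertmanager-style webhook body.
# The "cluster", "node" and "service" label names are assumptions for illustration.
SAMPLE_PAYLOAD = json.dumps({
    "version": "4",
    "status": "firing",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "HighCPU", "cluster": "prod", "node": "node-1", "service": "api"},
         "annotations": {"summary": "CPU above 90%"},
         "startsAt": "2023-01-01T00:00:00Z"},
        {"status": "firing",
         "labels": {"alertname": "DiskFull", "cluster": "staging", "node": "node-7", "service": "db"},
         "annotations": {"summary": "Disk 95% full"},
         "startsAt": "2023-01-01T00:05:00Z"},
    ],
})


def init_db() -> sqlite3.Connection:
    """Create an in-memory store with one row per alert; the full label set is kept as JSON."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alerts (name TEXT, status TEXT, cluster TEXT, labels TEXT, starts_at TEXT)"
    )
    return conn


def receive_webhook(conn: sqlite3.Connection, body: str) -> int:
    """Parse a webhook body and save each alert with its labels. Returns the number of rows stored."""
    payload = json.loads(body)
    rows = [
        (a["labels"].get("alertname"), a["status"], a["labels"].get("cluster"),
         json.dumps(a["labels"]), a.get("startsAt"))
        for a in payload.get("alerts", [])
    ]
    conn.executemany("INSERT INTO alerts VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


def alerts_for_user(conn: sqlite3.Connection, allowed_clusters) -> list:
    """Return (alert name, cluster) for alerts whose cluster the user may see, sorted by name."""
    clusters = list(allowed_clusters)
    if not clusters:
        return []
    placeholders = ",".join("?" for _ in clusters)
    cur = conn.execute(
        f"SELECT name, cluster FROM alerts WHERE cluster IN ({placeholders}) ORDER BY name",
        clusters,
    )
    return cur.fetchall()
```

A user limited to the `prod` cluster would see only the `HighCPU` alert. Extending the filter to nodes and services would follow the same pattern against the stored labels.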

Anomaly Detector

Our anomaly detection system combines Prometheus alert rules with machine learning based on a Bayesian Belief Network for your selected monitoring targets. The engine calculates an anomaly probability and requests the creation of a new incident ticket.
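The following sketch shows how a Bayesian Belief Network can turn fired alerts into an anomaly probability. It assumes the simplest possible network: one hidden `Anomaly` node that is the only parent of each alert node (a naive Bayes structure). The posterior is computed by exact enumeration over the hidden node. The alert names, all probabilities, and the ticket threshold `TICKET_THRESHOLD` are hypothetical; MetricOps' real network structure and learned parameters are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertNode:
    """A child of the hidden Anomaly node, with its conditional probability table."""
    name: str
    p_fire_given_anomaly: float
    p_fire_given_normal: float


def anomaly_posterior(prior: float, nodes: list, fired: set) -> float:
    """P(Anomaly = true | observed alert states), by exact enumeration over the hidden node.

    Alerts are conditionally independent given Anomaly, because each has Anomaly as its only parent.
    """
    joint_anomaly = prior          # P(Anomaly = true, evidence), built up factor by factor
    joint_normal = 1.0 - prior     # P(Anomaly = false, evidence)
    for node in nodes:
        if node.name in fired:
            joint_anomaly *= node.p_fire_given_anomaly
            joint_normal *= node.p_fire_given_normal
        else:
            joint_anomaly *= 1.0 - node.p_fire_given_anomaly
            joint_normal *= 1.0 - node.p_fire_given_normal
    return joint_anomaly / (joint_anomaly + joint_normal)


TICKET_THRESHOLD = 0.8  # hypothetical cut-off for requesting a new incident ticket


def should_open_ticket(probability: float, threshold: float = TICKET_THRESHOLD) -> bool:
    """Request a ticket when the anomaly probability reaches the threshold."""
    return probability >= threshold
```

With a prior of 0.05 and two alerts (`HighCPU`: 0.8 vs 0.1, `HighLatency`: 0.7 vs 0.05), both alerts firing gives 0.028 / (0.028 + 0.00475), roughly 0.85, which would cross the assumed threshold. A single alert firing stays well below it.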

Incident Manager

Our Incident Manager handles incidents together with their actions and metrics, whether these are system-generated or created individually. It manages each incident's severity, its status, and the resources in charge of it, and it also supports system investigation.
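An incident can be modeled as a record that carries severity, status, assignees, and attached actions and metrics. The sketch below assumes a four-step lifecycle (open, investigating, resolved, closed) and rejects status changes outside it. The severity levels, status names, allowed transitions, and field names are hypothetical and only illustrate the kind of bookkeeping described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Status(Enum):
    OPEN = "open"
    INVESTIGATING = "investigating"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Hypothetical lifecycle: the statuses each status may move to.
ALLOWED = {
    Status.OPEN: {Status.INVESTIGATING, Status.CLOSED},
    Status.INVESTIGATING: {Status.RESOLVED, Status.OPEN},
    Status.RESOLVED: {Status.CLOSED, Status.INVESTIGATING},
    Status.CLOSED: set(),
}


@dataclass
class Action:
    name: str
    system_generated: bool  # True if created by the system, False if added by a user


@dataclass
class Incident:
    title: str
    severity: Severity
    status: Status = Status.OPEN
    assignees: list = field(default_factory=list)  # resources in charge of the incident
    actions: list = field(default_factory=list)    # Action records
    metrics: list = field(default_factory=list)    # e.g. PromQL expressions to inspect

    def transition(self, new_status: Status) -> None:
        """Change status, rejecting moves that the ALLOWED table does not permit."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {new_status.value}")
        self.status = new_status
```

Keeping the allowed transitions in a table rather than in scattered `if` statements makes the lifecycle easy to inspect and change.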

Resolution Advisor

When the Anomaly Detector creates an incident ticket, the Resolution Advisor suggests pre-defined actions based on the alerts for each monitoring target. These recommended actions are registered in the ticket. You can also pre-configure actions to be executed automatically as soon as they are attached to the ticket, and for this auto-execution we have our trusted Sudory!
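The lookup-and-attach step can be sketched as a catalogue keyed by monitoring target and alert name. Every matching action is registered on the ticket, and those flagged for auto-execution are handed straight to an executor callback. The catalogue entries, the `auto_execute` flag, and the `execute` callback (a stand-in for dispatching to Sudory) are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class RecommendedAction:
    name: str
    auto_execute: bool = False  # if True, run as soon as the action is attached to a ticket


@dataclass
class Ticket:
    target: str
    alerts: list
    actions: list = field(default_factory=list)
    executed: list = field(default_factory=list)


# Hypothetical catalogue: (monitoring target, alert name) -> pre-defined actions.
ACTION_CATALOGUE = {
    ("api", "HighCPU"): [
        RecommendedAction("collect top pods by CPU"),
        RecommendedAction("scale deployment api", auto_execute=True),
    ],
    ("db", "DiskFull"): [RecommendedAction("list largest volumes")],
}


def advise(ticket: Ticket, execute: Callable[[RecommendedAction], None]) -> None:
    """Attach pre-defined actions for the ticket's alerts and run auto-execute ones immediately.

    `execute` is a placeholder for handing the action to an executor such as Sudory.
    """
    for alert in ticket.alerts:
        for action in ACTION_CATALOGUE.get((ticket.target, alert), []):
            ticket.actions.append(action)
            if action.auto_execute:
                execute(action)
                ticket.executed.append(action.name)
```

Because both the target and the alert name are part of the key, the same alert can map to different actions on different monitoring targets.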

Sudory

Sudory is our trusted executor of Kubernetes APIs, Helm commands, and HTTP services within Kubernetes clusters! It can execute actions automatically using pre-defined service templates. The Sudory server, which runs outside the Kubernetes clusters, issues service requests, and the Sudory client inside the target cluster carries them out. Because the templates are kept in a service catalogue, service requests are reusable.
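Here is a rough sketch of the server/client split and template reuse. It assumes that catalogue templates are strings with `$` placeholders (using Python's `string.Template`), that the server queues requests per cluster, and that each in-cluster client pulls its pending requests by polling. The polling model, the class and method names, and the template format are all assumptions of this sketch, not Sudory's actual protocol or template schema. The client only logs what it would run.

```python
from collections import defaultdict
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class ServiceTemplate:
    """A reusable catalogue entry; $-placeholders are filled in per request."""
    name: str
    kind: str  # "kubernetes", "helm" or "http"
    spec: Template


# Hypothetical catalogue entries; real Sudory templates use their own schema.
SERVICE_CATALOGUE = {
    "get_pods": ServiceTemplate("get_pods", "kubernetes",
                                Template("GET /api/v1/namespaces/$namespace/pods")),
    "helm_rollback": ServiceTemplate("helm_rollback", "helm",
                                     Template("helm rollback $release $revision -n $namespace")),
}


class SudoryServer:
    """Runs outside the clusters and queues rendered service requests per target cluster."""

    def __init__(self):
        self.queues = defaultdict(list)

    def request(self, cluster: str, template_name: str, **params: str) -> str:
        template = SERVICE_CATALOGUE[template_name]
        command = template.spec.substitute(params)  # raises KeyError if a parameter is missing
        self.queues[cluster].append((template.kind, command))
        return command

    def poll(self, cluster: str) -> list:
        """Return all pending requests for `cluster` and clear its queue."""
        pending, self.queues[cluster] = self.queues[cluster], []
        return pending


class SudoryClient:
    """Runs inside one cluster, pulls its requests from the server, and records them."""

    def __init__(self, cluster: str, server: SudoryServer):
        self.cluster = cluster
        self.server = server
        self.log = []

    def run_once(self) -> list:
        for kind, command in self.server.poll(self.cluster):
            # A real client would call the Kubernetes API, Helm, or an HTTP service here.
            self.log.append(f"[{kind}] {command}")
        return self.log
```

The same `helm_rollback` template can serve any release, revision, and namespace, which is the reuse the service catalogue is meant to provide.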

Prometheus ecosystem & powerful MetricOps