All about NexClipper’s observability architecture

What is NexClipper?

NexClipper is an observability solution built on open-source software (OSS) such as Prometheus. Its main functions are a metric dashboard and log/trace exploration, which support fast incident resolution. Easy installation, automated operation, and a continuously growing set of free exporters keep operating costs low. Let’s take a deep dive into NexClipper’s application architecture below!

Figure 1. Application architecture of NexClipper

NexClipper’s server consists of the following intuitive components: Guided Dashboard, Alert Hub, Incident Management, Group/User/Channel Management, Operation Management & Automation, and Billing & Payment Management.

In addition, Klevr Server, NexClipper’s own OSS, is installed, and NexClipper provides a Helm script to install a sudoRy client in the target Kubernetes cluster.

Once the sudoRy client is installed on a customer’s cluster, the customer can run the dashboard immediately by installing and configuring OSS projects such as Prometheus and Grafana.

ExporterHub, NexClipper’s exporter-aid platform, helps customers install additional exporters for their services. Each exporter comes not only with metrics to monitor but also with curated alert rules and a Grafana dashboard for those metrics.

Key application stacks

Proven OSS projects

NexClipper consists of best-in-class OSS projects that have been proven in their respective field:

  • Prometheus is an open-source monitoring solution and a graduated CNCF project that is widely used in the cloud-native industry and the Kubernetes ecosystem.
  • Grafana is a de facto standard for open-source monitoring, offering customizable dashboards with visualization tools as well as support for a wide range of databases.
  • Grafana Loki & Tempo are proven tools for storing and managing logs and traces at scale.
  • OpenTelemetry provides a single, open-source standard and set of technologies to capture and export traces from users’ cloud-native applications and infrastructure.

In addition, NexClipper’s own OSS components make up for shortcomings of these open-source projects by improving user experience and efficiency: sudoRy (distributed resource management), NexClipper Cron (a scheduler tool for APIs based on node-scheduler), and DS Switch (a high-availability aid tool for Prometheus).

In this architecture, all OSS components installed at the customer site are configured so that the system can keep operating even if the NexClipper subscription is cancelled at a later point.

Long-term storage

NexClipper uses Cortex for long-term storage of Prometheus metrics. Cortex provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus and will be installed on the customer’s Kubernetes cluster. In the near future, we will also offer the option of installing Cortex on a cloud platform outside the customer’s cluster.

Log and trace

To identify the root cause of incidents, which is the core goal of observability, NexClipper helps users collect and analyze application logs using Loki. It further uses OpenTelemetry and Tempo to enable trace collection and analysis across distributed microservices. For log collection, customers install Promtail, a log agent that feeds logs to Loki.

To collect traces, users create and feed trace data through the OpenTelemetry library in their microservices. The trace_id for linking logs and traces is already set by NexClipper, so with the Explore option in Grafana, users can easily analyze the correlation between logs and traces.
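
The idea behind a shared trace_id can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not NexClipper’s configuration and not the OpenTelemetry SDK itself. The helper names (new_trace_id, log_line, lines_for_trace) and the JSON log layout are assumptions made for the example. The only borrowed fact is that an OpenTelemetry trace ID is 16 bytes, usually written as 32 hex characters.

```python
import json
import secrets


def new_trace_id():
    """Return a random 16-byte trace ID as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def log_line(message, trace_id, level="info"):
    """Build one JSON log line that carries the trace ID as a field.

    The field name "trace_id" and the JSON layout are assumptions for
    this sketch, not a format prescribed by NexClipper.
    """
    return json.dumps({"level": level, "msg": message, "trace_id": trace_id})


def lines_for_trace(lines, trace_id):
    """Pick out the log lines that belong to one trace."""
    return [ln for ln in lines if json.loads(ln).get("trace_id") == trace_id]


if __name__ == "__main__":
    tid = new_trace_id()
    logs = [
        log_line("checkout started", tid),
        log_line("unrelated request", new_trace_id()),
        log_line("payment failed", tid, level="error"),
    ]
    # Only the two lines that share the trace ID are printed.
    for ln in lines_for_trace(logs, tid):
        print(ln)
```

Because every log line written while handling a request carries the same ID as the trace, a log viewer can jump from a log entry to its trace and back. That is the correlation the Explore view relies on.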

For a detailed guide on how to create and use logs and traces in NexClipper, please refer to the NexClipper Document Page.

Figure 2. Detailed architecture of log & trace in NexClipper

MetricOps for Alert Hub, incident management, group/user/channel management and anomaly management

NexClipper’s observability provides more sophisticated and intelligent notifications that offer practical help with problem solving. This includes a 360-degree view of incidents and, ultimately, suggested solutions.

NexClipper collects alerts every minute and evaluates them for selected targets using a Bayesian network model to forecast anomalies. Customers are notified of the forecast so that they can take appropriate action to resolve possible incidents. Our solution also suggests proactive actions and uses sudoRy, NexClipper’s Kubernetes executor, to automate the deployment of resolution actions. The official release of MetricOps is scheduled for fall 2022.
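
To give a feel for Bayesian evaluation of alerts, here is a minimal sketch of the simplest possible network: one hidden "incident" node whose alerts are assumed conditionally independent (a naive Bayes structure). It is not NexClipper’s model. The function name posterior_incident, the alert names, and all probabilities are made up for illustration.

```python
def posterior_incident(prior, likelihoods, observed):
    """Return P(incident | observed alerts) under a naive Bayes assumption.

    prior       -- P(incident) before looking at any alert
    likelihoods -- {alert: (P(fires | incident), P(fires | no incident))}
    observed    -- {alert: True if firing, False otherwise}
    """
    p_incident = prior
    p_normal = 1.0 - prior
    for alert, firing in observed.items():
        p_fire_incident, p_fire_normal = likelihoods[alert]
        if firing:
            p_incident *= p_fire_incident
            p_normal *= p_fire_normal
        else:
            p_incident *= 1.0 - p_fire_incident
            p_normal *= 1.0 - p_fire_normal
    total = p_incident + p_normal
    if total == 0.0:
        raise ValueError("observations are impossible under the given likelihoods")
    return p_incident / total


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to make the arithmetic easy to follow.
    likelihoods = {
        "HighCPU": (0.9, 0.1),
        "PodRestarts": (0.8, 0.05),
    }
    both = posterior_incident(0.01, likelihoods, {"HighCPU": True, "PodRestarts": True})
    print(f"P(incident | both alerts firing) = {both:.2f}")
```

With these made-up numbers, a 1% prior rises to about 0.59 when both alerts fire together, which is why combined evidence is more informative than any single alert. A real Bayesian network would also model dependencies between alerts.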

Figure 3. Anomaly evaluation by alert rules

ExporterHub

ExporterHub, NexClipper’s exporter-aid platform, has been developed to provide information about best-practice exporters to customers and communities. From over 10,000 exporters on GitHub, ExporterHub selects qualified exporters through continuous curation and provides each with an introduction to its key metrics, alert rules, Grafana dashboards, and Helm chart values for installation. NexClipper Observability users can automate the installation of the corresponding exporters, alert configurations, and Grafana dashboards directly via the user interface. NexClipper aims to continue reviewing and providing qualified exporters with best-practice alert configurations and dashboards.

Guided Dashboard with Grafana

NexClipper provides observability dashboards that connect to Grafana’s dashboards, offering a guided-tour experience while retaining the advantages of Grafana as OSS.

NexClipper’s guided dashboards provide a bird’s-eye view of the system topology with health status information so that users can see everything that is happening at a glance. Further, a hierarchical list of the nodes and microservices under a cluster is displayed together with their status. Detailed monitoring can then be done by following a link that opens the Grafana dashboard.
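
One common way to show a single health status for a hierarchy is to roll up the worst status found anywhere below each element. The sketch below illustrates that idea only. The status names, the dictionary layout, and the rollup function are assumptions for the example and do not describe how NexClipper computes its statuses.

```python
# Hypothetical status levels, ordered from best to worst.
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}


def rollup(element):
    """Return the worst status of an element and everything beneath it."""
    worst = element.get("status", "healthy")
    for child in element.get("children", []):
        child_status = rollup(child)
        if SEVERITY[child_status] > SEVERITY[worst]:
            worst = child_status
    return worst


if __name__ == "__main__":
    cluster = {
        "name": "cluster-a",
        "children": [
            {"name": "node-1", "children": [
                {"name": "cart-service", "status": "healthy"},
                {"name": "payment-service", "status": "critical"},
            ]},
            {"name": "node-2", "children": [
                {"name": "search-service", "status": "warning"},
            ]},
        ],
    }
    print(cluster["name"], "->", rollup(cluster))
    for node in cluster["children"]:
        print(" ", node["name"], "->", rollup(node))
```

With a worst-of rollup, a single failing microservice is visible at the cluster level, and the hierarchy then shows which node and service to open in Grafana.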

Figure 4. Guided Dashboard

Distributed Kubernetes Executor – sudoRy

sudoRy is responsible for remotely managing distributed Kubernetes clusters. This function is essential for low-cost and error-free management of distributed IT resources in a cloud-native environment. Installation, upgrades, and continuous operation of NexClipper are performed automatically through a predefined service catalog and can be executed regardless of the type and size of the target.

Figure 5. sudoRy application architecture

This concludes our introduction to NexClipper’s observability architecture. If you would like to discuss this topic further, please contact support@nexclipper.io.