This document offers best practices for monitoring your Nutanix Cluster so that you can build an IT Weather User Service reflecting its current state.
The Nutanix virtual computing platform is a scalable, converged computing and storage system specifically designed to host and store virtual machines.
All nodes in a Nutanix cluster aggregate to provide a unified, tiered storage pool and present virtual machines with seamless resource access. A global system architecture integrates each new node into the cluster, allowing you to scale the solution to meet the data needs of your infrastructure.
The basic unit of the cluster is a Nutanix node. Each node in the cluster runs a standard hypervisor and contains CPUs, memory, and local storage (SSDs and hard disks).
A Nutanix Controller virtual machine runs on each node, allowing for pooling of local storage across all nodes in the cluster.
Monitor the status and performance of your Nutanix Cluster by adding specific Nutanix service templates:
Monitor the overall usage of Nutanix Cluster Storage with the Nutanix service templates that measure it:
Nutanix Cluster Storage consists of a Container grouping one or more Storage Pools. If the ‘Nutanix-Cluster-StorageUsage’ service template raises an alert, you can quickly narrow down the underlying cause by also monitoring the Containers and Storage Pools with the following service templates:
Ensure the health of your Nutanix Blocks by using the following service template:
Also make sure the hypervisors of each Nutanix Node are in good health.
If you are using VMware technology, you can use the following service templates to monitor the health of your ESX hosts:
Once the monitoring is in place, you can create an IT Weather User Service.
The screenshot below shows an example, with the different elements that make up a Nutanix Cluster, including:
The advantage of such a visual model is that, when an issue occurs, you can directly identify the root cause of a degradation of the Nutanix Cluster. The screenshot below clearly shows a degradation due to a Critical alert from ‘Pool-Usage_1’ of ‘Container_1’:
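The root-cause reading described above can be illustrated with a minimal Python sketch. It is not the product's implementation: it assumes a simple "worst status wins" rule, where each element of the model takes the most severe status found among itself and its children. The names `Element`, `SEVERITY`, `root_cause_path`, `Nutanix-Cluster`, `Block_1` and `ESX_1` are hypothetical; only `Container_1` and `Pool-Usage_1` come from the example above.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed severity order for this sketch: a higher number is worse.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


@dataclass
class Element:
    """One element of the visual model (cluster, container, pool, ...)."""
    name: str
    own_status: str = "OK"
    children: List["Element"] = field(default_factory=list)

    def status(self) -> str:
        """Return the worst status among this element and its descendants."""
        statuses = [self.own_status] + [c.status() for c in self.children]
        return max(statuses, key=SEVERITY.__getitem__)


def root_cause_path(element: Element) -> List[str]:
    """Walk down from the top, following a child that explains the degradation.

    Returns the element names from the top of the model to the deepest
    element carrying the worst status.
    """
    path = [element.name]
    current = element
    while current.status() != "OK":
        worst = SEVERITY[current.status()]
        degraded = [c for c in current.children
                    if SEVERITY[c.status()] == worst]
        if not degraded:
            break  # the degradation originates at this element
        current = degraded[0]
        path.append(current.name)
    return path


# Hypothetical model mirroring the example: one pool is in Critical state.
pool = Element("Pool-Usage_1", own_status="CRITICAL")
container = Element("Container_1", children=[pool])
block = Element("Block_1")
hypervisor = Element("ESX_1")
cluster = Element("Nutanix-Cluster", children=[container, block, hypervisor])

print(cluster.status())
print(" > ".join(root_cause_path(cluster)))
```

In this sketch the Critical status of the pool propagates up through its container to the cluster, and walking back down the degraded branch leads to the element that triggered it, which is the reading the visual model gives at a glance.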