Monitor Cluster Health
The Cluster Dashboard provides visibility into your cluster's health so that you can quickly see whether any part of the cluster is impacting performance. Each Dashboard indicator covers a different aspect of cluster performance.
- In many cases, the cluster Inventory Dashboard might display 0 (zero) for Master Nodes. This is the expected behavior. Unlike on-premises clusters, cloud providers develop and release at a different pace from the Kubernetes project and evolve independently. Amazon EKS, AKS, and other cloud-based environments hide the master node or nodes of the cluster. The results reported for the cluster are in line with the output of the kubectl get nodes command. To verify that the reported value is correct, run kubectl get nodes. The command returns node information, and it shows zero master nodes.
- The cluster-level resource utilization metrics are the sum of the resources consumed by each of the pods. These metric values can reach hundreds of thousands because the Cluster Agent reports sums across individual pods.
- The pod utilization metrics are the sum of the values for each container running within the pod.
- The pod count and pod state (running, pending, evicted, and failed pods) metrics show real-time values, not historical values averaged over a designated period of time. Whatever pods are running "now" are what is reported.
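The roll-up described in the notes above can be sketched as follows. This is a minimal illustration, not the Cluster Agent's implementation: the `Container` and `Pod` types, the field names, and the units (millicores and MB) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    # Hypothetical per-container readings; the units are assumed for illustration.
    cpu_millicores: int
    memory_mb: int


@dataclass
class Pod:
    name: str
    containers: List[Container] = field(default_factory=list)

    def cpu(self) -> int:
        # Pod utilization is the sum over the containers running in the pod.
        return sum(c.cpu_millicores for c in self.containers)

    def memory(self) -> int:
        return sum(c.memory_mb for c in self.containers)


def cluster_totals(pods: List[Pod]) -> Dict[str, int]:
    # Cluster-level utilization is the sum over all pods, which is why
    # the reported values can grow very large on busy clusters.
    return {
        "cpu_millicores": sum(p.cpu() for p in pods),
        "memory_mb": sum(p.memory() for p in pods),
    }


pods = [
    Pod("web-1", [Container(250, 128), Container(50, 64)]),
    Pod("db-1", [Container(900, 2048)]),
]
totals = cluster_totals(pods)
print(totals)
```

Because every level is a plain sum, a cluster with thousands of pods naturally produces totals several orders of magnitude larger than any single pod's reading.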
Use the Clusters Dashboard
To access your cluster from the Controller:
- Click Servers > Clusters > Cluster Name.
- Select a Cluster Agent and double-click it. The cluster interface shows the Dashboard, Pods, and Inventory tabs.
- ERRORS: The errors card shows pie charts for the monitored namespaces in each cluster:
- Errors: The number of Errors (Error events count), Evictions (Evicted pods count), and Threats (Eviction threats count) for individual pods.
- PODS BY PHASE: The number of Pods that are in various states: Failed, Pending, Running, Succeeded, and Unknown.
- ACTIVITY OVER PERIOD: A time-series chart that shows the number of pods in the Running and Pending states over a given period of time.
- CLUSTER CAPACITY: The cluster capacity score bar shows CPU, Memory, and Pods. A green line indicates the capacity usage. The Dashboard indicates the percentage of the cluster's CPU, Memory, and Pod capacity in use, which helps you plan the resource capacity for this cluster.
- ISSUES: The issues card shows:
- Pod Issues: Pod restarts and errors observed by the Cluster Agent.
- Image Issues: Image pulling and errors.
- Storage: Storage capacity issues such as Errors and Quota Violations.
- UTILIZATION: The utilization card shows:
- CPU: Requests, Limits and Used.
- Memory: Requests, Limits and Used.
- PVCs: Requests and Capacity.
- QUOTAS: The quotas card shows the percentage utilization of resources relative to their respective quotas. The agent tracks the following indicators:
- Percentage of CPU Limit Quota Used
- Percentage of Memory Limit Quota Used
- Percentage of PVC Quota Used
- Percentage of CPU Request Quota Used
- Percentage of Memory Request Quota Used
- Percentage of Storage Quota Used
The numbers are cumulative for the entire cluster. These indicators can help track the availability of specific resources based on the imposed quotas and can be used in cluster capacity planning.
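As a rough illustration of a cluster-wide quota percentage, the sketch below totals usage and quota across namespaces for a single resource. It assumes that "cumulative" means total used divided by total quota; the product may compute the value differently. The function name `percent_of_quota` and the sample namespaces are hypothetical. In Kubernetes, the underlying numbers are available in the `status.used` and `status.hard` fields of each ResourceQuota object.

```python
from typing import Dict, Optional


def percent_of_quota(used_by_namespace: Dict[str, float],
                     hard_by_namespace: Dict[str, float]) -> Optional[float]:
    """Return the cluster-wide percentage of quota used for one resource.

    Assumption: the cumulative figure is total used / total quota across
    all namespaces that impose a quota on this resource.
    """
    total_hard = sum(hard_by_namespace.values())
    if total_hard == 0:
        return None  # No quota imposed, so there is nothing to report.
    # Only count usage in namespaces that actually have a quota.
    total_used = sum(used_by_namespace.get(ns, 0) for ns in hard_by_namespace)
    return round(100.0 * total_used / total_hard, 1)


# Hypothetical CPU request quotas and usage, in millicores.
cpu_request_hard = {"team-a": 4000, "team-b": 2000}
cpu_request_used = {"team-a": 1000, "team-b": 500}

cpu_request_pct = percent_of_quota(cpu_request_used, cpu_request_hard)
print(f"Percentage of CPU Request Quota Used: {cpu_request_pct}")
```

A value approaching 100 suggests that new workloads requesting this resource may be rejected by the quota, which is the signal to use in capacity planning.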
The Pods tab shows pods in various states and shares a high-level summary of their status. The example shown is for pods running in Amazon EKS. All pods are displayed based on their registered Namespace and Pod Name.
Terminated pods continue to be displayed in the Pods list until they have been purged from the AppDynamics Controller, but their metrics will not be updated. Purging happens automatically at regular intervals. See Controller Settings for the Cluster Agent.
The top card shows a summary of the monitored pods and their status in each cluster:
- Total Pods: The total number of pods in the monitored cluster.
- Running: The percentage of pods in a running state.
- Pending: The percentage of pods in a pending state. Pending status normally indicates an issue. For more information, see the Kubernetes documentation.
- Evicted: The percentage of evicted pods.
- Failed: The percentage of failed pods.
- You can search based on Namespace or Pod Name.
- You can further filter based on pod tags and labels.
- Double-click any pod to go to the Pod Details screen. In the Pod Details screen, you can see the containers running in that pod, pod events, and pod labels/tags.
- The Cluster Agent automatically detects a pod's:
- Pod Name
- Container IDs
- CPU % (sum of the running containers within the pod)
- Memory MB (sum of the running containers within the pod)
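The summary percentages in the top card of the Pods tab can be approximated from a point-in-time list of pod statuses. The sketch below is illustrative only: the `status_summary` function and the sample snapshot are hypothetical, and it assumes each pod is reported with exactly one of the listed statuses.

```python
from collections import Counter
from typing import Dict, List


def status_summary(statuses: List[str]) -> Dict[str, float]:
    """Summarize a snapshot of pod statuses as a total plus percentages."""
    total = len(statuses)
    counts = Counter(statuses)
    summary: Dict[str, float] = {"Total Pods": total}
    for state in ("Running", "Pending", "Evicted", "Failed"):
        # Guard against an empty cluster to avoid dividing by zero.
        pct = 100.0 * counts[state] / total if total else 0.0
        summary[state] = round(pct, 1)
    return summary


# Hypothetical snapshot: 7 running pods, 2 pending, 1 failed.
snapshot = ["Running"] * 7 + ["Pending"] * 2 + ["Failed"]
summary = status_summary(snapshot)
print(summary)
```

Because the input is a snapshot, the result reflects only the pods present at that moment, consistent with the real-time behavior of the pod count and pod state metrics.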
Pod Details Screen
The Pod Details screen displays:
- Namespace: The namespace in which the pod is running.
- Hostname: The hostname, shown as namespace/pod name.
- Pod Events: A list of the most-recent events from the
- Container (count): A list of running containers displayed by Container ID in this pod. Clicking on each container will show individual container metrics.
- Tags: The Kubernetes labels provided to this pod.
If you click on the Container ID, it expands to show two container metrics, CPU and Memory Usage.
The Inventory tab displays a high-level snapshot or inventory view of your cluster. It displays the contents of the cluster and allows users to troubleshoot applications running in the cluster.
The Cluster card shows:
- Cluster Name
- Kubernetes Version
- Cluster ID
- Disk Pressure
- Memory Pressure
PODS: The card shows:
- Pods by Phases: Whether a pod is running, evicted, pending or failed.
- Privileged Pods: The pods that run as root.
- No Limits: You can specify limits for any pod that you start. This metric shows how many pods do not have a limit defined.
- No Readiness Probe: The number of pods that do not have a Kubernetes readiness probe configured.
- No Liveness Probe: The number of pods that do not have a Kubernetes liveness probe configured.
- Missing Dependencies - Config Maps & Secrets: Pods that depend on Config Maps or Secrets that are missing.
- Missing Dependencies - Services: Pods that depend on Services that are missing.
- Scaledowns: The count of scaledowns of your deployments and replica sets.
- Pod Kills: The number of pods that were killed.
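To make a few of these inventory indicators concrete, the sketch below derives similar counts from pod manifests shaped like the output of kubectl get pods -o json. It is not the Cluster Agent's logic. Two assumptions are made for the example: a pod is counted when any one of its containers lacks the setting, and "privileged" is read from the container's `securityContext.privileged` field. The `inventory_counts` function and the sample pods are hypothetical.

```python
from typing import Any, Dict, List


def inventory_counts(pods: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count pods that are privileged or lack limits or probes."""
    counts = {
        "Privileged Pods": 0,
        "No Limits": 0,
        "No Readiness Probe": 0,
        "No Liveness Probe": 0,
    }
    for pod in pods:
        containers = pod.get("spec", {}).get("containers", [])
        # Assumption: one privileged container makes the pod privileged.
        if any((c.get("securityContext") or {}).get("privileged") is True
               for c in containers):
            counts["Privileged Pods"] += 1
        # Assumption: a pod is counted if any container has no limits.
        if any(not (c.get("resources") or {}).get("limits") for c in containers):
            counts["No Limits"] += 1
        if any("readinessProbe" not in c for c in containers):
            counts["No Readiness Probe"] += 1
        if any("livenessProbe" not in c for c in containers):
            counts["No Liveness Probe"] += 1
    return counts


# Hypothetical manifests, trimmed to the fields the checks read.
sample_pods = [
    {"metadata": {"name": "web-1"}, "spec": {"containers": [{
        "name": "web",
        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
        "readinessProbe": {"httpGet": {"path": "/ready", "port": 8080}},
        "livenessProbe": {"httpGet": {"path": "/live", "port": 8080}},
    }]}},
    {"metadata": {"name": "debug-1"}, "spec": {"containers": [{
        "name": "shell",
        "securityContext": {"privileged": True},
    }]}},
    {"metadata": {"name": "worker-1"}, "spec": {"containers": [{
        "name": "worker",
        "resources": {"limits": {"cpu": "1"}},
        "readinessProbe": {"exec": {"command": ["true"]}},
    }]}},
]

inventory = inventory_counts(sample_pods)
print(inventory)
```

Comparing counts like these against the Inventory tab can help confirm which pods need limits or probes added to their specifications.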
NAMESPACES: A list of namespaces that you can search using the search bar.
OBJECTS: The objects card shows:
SERVICES: The services card shows the health of the entire cluster. Here you can see the overall percentage for these indicators: