Troubleshoot the Cluster Agent
If the Cluster Agent is not reporting metrics for certain containers, pods, or nodes, the cause may be a problem with the Kubernetes Metrics Server. If the Metrics Server does not report metrics, the Cluster Agent cannot report them.
To verify that the Metrics Server is sending metrics, run this command from your cluster's master node:
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
If the output of the command does not show metrics for the container in question, then the problem most likely lies with the Metrics Server. Example output from the Metrics Server:
{
  "kind":"PodMetricsList",
  "apiVersion":"metrics.k8s.io/v1beta1",
  "metadata":{
    "selfLink":"/apis/metrics.k8s.io/v1beta1/pods"
  },
  "items":[
    {
      "metadata":{
        "name":"replicaset-test-cjnsc",
        "namespace":"test-qe",
        "selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/test-qe/pods/replicaset-test-cjnsc",
        "creationTimestamp":"2019-09-23T10:24:46Z"
      },
      "timestamp":"2019-09-23T10:23:38Z",
      "window":"30s",
      "containers":[
        {
          "name":"appagent",
          "usage":{
            "cpu":"1667384n",
            "memory":"258672Ki"
          }
        }
      ]
    }
  ]
}
The Metrics Server logs any issues it encounters while collecting node, pod, and container metrics. You can retrieve the Metrics Server logs by running:
$ kubectl logs <metric-server pod name> -n <namespace for metric-server(default value is: "kube-system")> --tail <number of required lines of logs>
For example:
$ kubectl logs metrics-server-6764b987d-mtn7g -n kube-system --tail 20
The returned Metrics Server logs may contain the reason why metrics could not be collected. For example:
E0920 11:44:54.204075 1 reststorage.go:147] unable to fetch pod metrics for pod test-qe/replicaset-test-9k7rl: no metrics known for pod
E0920 11:44:54.204080 1 reststorage.go:147] unable to fetch pod metrics for pod test/replicaset1-458-g9n2d: no metrics known for pod
E0920 11:44:54.204089 1 reststorage.go:147] unable to fetch pod metrics for pod kube-system/kube-proxy-t54rc: no metrics known for pod
E0920 11:45:19.188033 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:ip-111.111.111.111: unable to fetch metrics from Kubelet ip-111.111.111.111 (111.111.111.111): Get https://111.111.111.111:2222/stats/summary/: dial tcp 111.111.111.111:2222: i/o timeout
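To quickly list which pods the Metrics Server is failing to collect metrics for, you can filter a saved log with a short sed expression. This is a minimal sketch, assuming the error lines follow the `unable to fetch pod metrics for pod <namespace>/<pod>` format shown above; the sample lines are written to a file here so the filter can be demonstrated standalone, and the filename metrics-server.log is illustrative:

```shell
# In a real cluster, save the logs first, for example:
#   kubectl logs metrics-server-6764b987d-mtn7g -n kube-system --tail 200 > metrics-server.log
# The sample error lines below stand in for that saved output.
cat > metrics-server.log <<'EOF'
E0920 11:44:54.204075 1 reststorage.go:147] unable to fetch pod metrics for pod test-qe/replicaset-test-9k7rl: no metrics known for pod
E0920 11:44:54.204089 1 reststorage.go:147] unable to fetch pod metrics for pod kube-system/kube-proxy-t54rc: no metrics known for pod
EOF

# Print each affected pod as "namespace/pod".
sed -n 's/.*unable to fetch pod metrics for pod \([^:]*\):.*/\1/p' metrics-server.log
# prints:
#   test-qe/replicaset-test-9k7rl
#   kube-system/kube-proxy-t54rc
```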
Cluster Agent Restarts
If the Cluster Agent restarts over time, you can verify that a restart happened by getting the pod details with this command:
kubectl get pods -n appdynamics
Sample output:
NAME                                         READY   STATUS    RESTARTS   AGE
appdynamics-operator-6fff76b466-qtx57        1/1     Running   0          4h18m
k8s-cluster-agent-perf-jg-6fc498d557-q7zst   1/1     Running   1          83m
If the Cluster Agent restarts unexpectedly, the RESTARTS count value will be > 0. Because the monitored namespaces and the logs are reset on restart, you will have to explicitly set the namespaces again. We strongly recommend that you do not set the write-to-stdout property to false in the cluster-agent.yaml file. If this property is set to false, the kubectl logs command does not return logs.
Cluster Agent logs persist even if Kubernetes restarts the Cluster Agent. To see the logs for the previous instance of a restarted Cluster Agent pod, use the following command:
kubectl -n appdynamics logs --previous ${CLUSTER_AGENT_POD_NAME}
If the Cluster Agent pod has restarted, the monitored namespaces configured through the user interface (UI) are not preserved. If you configure namespaces from the UI, we recommend also adding the same namespaces to your cluster-agent.yaml file under nsToMonitor and then applying the file, so that the Cluster Agent retains the monitored namespaces when its pods restart.
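For example, a cluster-agent.yaml fragment listing namespaces under nsToMonitor might look like the following. This is an illustrative sketch: the apiVersion, metadata values, and namespace names are assumptions and should be matched to your own deployment and release.

```yaml
# Illustrative fragment of cluster-agent.yaml; values are examples only.
apiVersion: cluster.appdynamics.com/v1alpha1   # version may differ in your release
kind: Clusteragent
metadata:
  name: k8s-cluster-agent
  namespace: appdynamics
spec:
  nsToMonitor:        # list the same namespaces you configured in the UI
    - default
    - test-qe
    - kube-system
```

Apply the file with kubectl apply -f cluster-agent.yaml so the configuration survives pod restarts.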
If you have not added the namespaces to the cluster-agent.yaml file, reconfigure your monitored namespaces as follows:
- Go to AppDynamics Agents > Cluster Agents > {CLUSTER_AGENT} > Configure.
- Add the namespaces to monitor again.
For more information, see the Add or Remove Namespaces section on Administer.
APM Correlation on OpenShift 4.x
CRI-O is the default container runtime on Red Hat OpenShift 4.x. If you use APM agents with OpenShift 4.x, you must update the UNIQUE_HOST_ID argument to support the syntax required for CRI-O containers. This setting applies to both new and existing application containers. If you have app agents running, you must modify your app agent YAML file.
To run app agents with APM correlation on OpenShift 4.x:
- Open your app agent YAML file.
- Locate the spec: > args: section.
- Update the UNIQUE_HOST_ID argument in the containers spec using the following example as a guide:

spec:
  containers:
  - name: client-api
    command: ["/bin/sh"]
    args: ["-c", "UNIQUE_HOST_ID=$(sed -rn '1s#.*/##; 1s/(.{12}).*/\\1/p' /proc/self/cgroup) && java -Dappdynamics.agent.uniqueHostId=$UNIQUE_HOST_ID $JAVA_OPTS -jar /java-services.jar"]
    envFrom:
    - configMapRef:
        name: agent-config

If APM Correlation is working correctly, clicking the Pod Details link opens the APM node dashboard for that node.
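The sed expression in the args keeps the first 12 characters of the final path segment of the container's cgroup entry. You can check the extraction standalone with a made-up CRI-O-style cgroup line; the container ID below is illustrative, and GNU sed is assumed (on BSD/macOS use -E instead of -r):

```shell
# Illustrative first line of /proc/self/cgroup inside a CRI-O container.
line='1:name=systemd:/kubepods.slice/crio-4acb76f8b2031ff6480a7c331e2d704b.scope'

# Same extraction as in the args above: strip everything up to the last "/",
# then keep the first 12 characters of what remains.
UNIQUE_HOST_ID=$(printf '%s\n' "$line" | sed -rn '1s#.*/##; 1s/(.{12}).*/\1/p')
echo "$UNIQUE_HOST_ID"
# prints: crio-4acb76f
```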
Cluster Agents or Pods Are Not Seen in the Controller
If agents or pods are not visible in the Controller, or if agents or pods are not registered and reporting, see the sim.cluster.agent.limit and sim.cluster.pod.limit descriptions under Controller Settings for the Cluster Agent.
Need additional help? Technical support is available through AppDynamics Support.