Troubleshoot the Cluster Agent
This page describes steps to troubleshoot the Cluster Agent installation. See Install the Cluster Agent and Validate the Cluster Agent Installation.
Troubleshoot a Cluster Agent Not Reporting to the Controller
If after installing the Cluster Agent the Cluster Dashboard does not appear in the Controller, it could be the result of a connectivity issue with the Controller.
Verify that a Server Visibility license is available. The Cluster Agent requires an available Server Visibility license to register successfully. See Cluster Agent Requirements and Supported Environments. From the Controller UI, check that a license is available under Administration/License/Account Usage.
Review the Cluster Agent events. If the Cluster Agent or Cluster Agent Operator fails to start, then review the events in the appdynamics namespace:
kubectl -n appdynamics get events
# to sort by most recent events:
kubectl -n appdynamics get events --sort-by='.lastTimestamp'
You can review the Cluster Agent pod specification for additional events:
kubectl -n appdynamics get pod <cluster-agent-pod> -o yaml
Review the Cluster Agent logs for errors regarding Controller communication. Open a command-line prompt and enter:
kubectl -n appdynamics logs <cluster-agent-pod-name>
Verify the Cluster Agent configuration. The Cluster Agent checks for configuration changes once a minute. To verify that your configurations have been applied and that the Cluster Agent is using the new values, open a command-line prompt and enter:
kubectl -n appdynamics describe cm cluster-agent-mon cluster-agent-log cluster-agent-config
Verify that the latest Cluster Agent is installed. If you are upgrading the Cluster Agent from a previous version, then the previous Operator YAML or image may not be compatible. You must reinstall the Cluster Agent Operator and Cluster Agent using the procedure described in Upgrade the Cluster Agent.
Troubleshoot a Cluster Agent Not Reporting Metrics
If the Cluster Agent does not report metrics for certain containers, pods, or nodes, it may be due to a problem with the Kubernetes Metrics Server. If metrics are not reported by the Metrics Server, then the Cluster Agent is unable to report them.
To verify that the Metrics Server is sending metrics, enter this command from your cluster's primary node:
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
If the output of the command does not show metrics for the container, there may be a problem with the Metrics Server. This example shows output from the Metrics Server:
{
"kind":"PodMetricsList",
"apiVersion":"metrics.k8s.io/v1beta1",
"metadata":{
"selfLink":"/apis/metrics.k8s.io/v1beta1/pods"
},
"items":[
{
"metadata":{
"name":"replicaset-test-cjnsc",
"namespace":"test-qe",
"selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/test-qe/pods/replicaset-test-cjnsc",
"creationTimestamp":"2019-09-23T10:24:46Z"
},
"timestamp":"2019-09-23T10:23:38Z",
"window":"30s",
"containers":[
{
"name":"appagent",
"usage":{
"cpu":"1667384n",
"memory":"258672Ki"
}
}
]
}
]
}
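If you only need one value from the raw response, you can filter it with standard text tools instead of scanning the full list. This sketch extracts the memory usage figure from the example payload above (a condensed copy of that payload stands in for the live API response, which you would normally pipe in from kubectl get --raw):

```shell
# Extract the memory usage values from a PodMetricsList response.
# With a live cluster, pipe `kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods`
# instead of the sample payload below (condensed from the example output above).
payload='{"items":[{"metadata":{"name":"replicaset-test-cjnsc","namespace":"test-qe"},"containers":[{"name":"appagent","usage":{"cpu":"1667384n","memory":"258672Ki"}}]}]}'
echo "$payload" | grep -o '"memory":"[^"]*"'
# prints: "memory":"258672Ki"
```

A JSON-aware tool such as jq gives cleaner filtering if it is installed; the grep pattern here is just a dependency-free way to spot-check one field.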
As the Metrics Server collects metrics from nodes, pods, and containers, it logs all issues. To retrieve and view the Metrics Server logs, enter (the default Metrics Server namespace is kube-system):
$ kubectl logs <metrics-server-pod-name> -n <metrics-server-namespace> --tail <number-of-lines>
For example:
$ kubectl logs metrics-server-6764b987d-mtn7g -n kube-system --tail 20
The Metrics Server logs may reveal why it could not collect metrics. For example:
E0920 11:44:54.204075 1 reststorage.go:147] unable to fetch pod metrics for pod test-qe/replicaset-test-9k7rl: no metrics known for pod
E0920 11:44:54.204080 1 reststorage.go:147] unable to fetch pod metrics for pod test/replicaset1-458-g9n2d: no metrics known for pod
E0920 11:44:54.204089 1 reststorage.go:147] unable to fetch pod metrics for pod kube-system/kube-proxy-t54rc: no metrics known for pod
E0920 11:45:19.188033 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:ip-111.111.111.111: unable to fetch metrics from Kubelet ip-111.111.111.111 (111.111.111.111): Get https://111.111.111.111:2222/stats/summary/: dial tcp 111.111.111.111:2222: i/o timeout
Cluster Agent Restarts
If the Cluster Agent restarts, you can verify that a restart occurred from the pod details. To retrieve the pod details, enter:
kubectl get pods -n appdynamics
Sample output:
NAME READY STATUS RESTARTS AGE
appdynamics-operator-6fff76b466-qtx57 1/1 Running 0 4h18m
k8s-cluster-agent-perf-jg-6fc498d557-q7zst 1/1 Running 1 83m
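Rather than reading the RESTARTS column by eye, you can extract the count for the Cluster Agent pod directly. This sketch parses the sample output above with awk; on a live cluster you could instead query the pod with kubectl's -o jsonpath='{.status.containerStatuses[0].restartCount}' option:

```shell
# Pull the RESTARTS value for the Cluster Agent pod out of `kubectl get pods`
# output. The sample text below is the example output shown above; pipe the
# real command output instead on a live cluster.
pods='NAME                                         READY   STATUS    RESTARTS   AGE
appdynamics-operator-6fff76b466-qtx57        1/1     Running   0          4h18m
k8s-cluster-agent-perf-jg-6fc498d557-q7zst   1/1     Running   1          83m'
echo "$pods" | awk '/^k8s-cluster-agent/ {print $4}'
# prints: 1
```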
If the Cluster Agent unexpectedly restarts, the RESTARTS count is greater than zero. After a restart, you must explicitly restore both the monitored namespaces and the logs, as described below.
AppDynamics strongly recommends that you do not overwrite the default stdoutLogging: true property value in the cluster-agent.yaml file. If you set this property to false, the kubectl logs command does not return logs.
Cluster Agent logs persist even if the Cluster Agent is restarted by Kubernetes. To view the logs from the previous instance of a restarted Cluster Agent pod, enter:
kubectl -n appdynamics logs --previous ${CLUSTER_AGENT_POD_NAME}
If the Cluster Agent pod has restarted, the monitored namespaces that you configured through the UI are not preserved. If you configured namespaces through the UI, you should add the same namespaces to your cluster-agent.yaml file under nsToMonitor, and then apply the configuration. As a result, the Cluster Agent pod retains the monitored namespaces when it restarts.
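As a minimal sketch, the cluster-agent.yaml fragment might look like this; the namespace names are examples only, and the rest of your spec stays as-is:

```yaml
# cluster-agent.yaml (fragment) - namespace names are illustrative
spec:
  nsToMonitor:
    - default
    - appdynamics
```

After editing the file, apply it with kubectl, for example: kubectl -n appdynamics apply -f cluster-agent.yaml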
If you did not add namespaces to the cluster-agent.yaml file, you can reconfigure your monitored namespaces:
- Go to AppDynamics Agents > Cluster Agents > {CLUSTER_AGENT} > Configure.
- Add the namespaces to monitor.
APM Correlation on OpenShift 4.x
The Container Runtime Interface with OCI (Open Container Initiative) compatible runtimes (CRI-O) is the default container runtime on Red Hat OpenShift 4.x. If you use APM Agents with OpenShift 4.x, you must update the UNIQUE_HOST_ID argument to support the syntax required for CRI-O containers. This setting applies to both new and existing application containers. If you are running App Agents, then you must modify the App Agent YAML file.
To run App Agents with APM correlation on OpenShift 4.x:
- Open your App Agent YAML file.
- Locate the spec: > args: section within the file.
- Update the UNIQUE_HOST_ID argument in the containers spec using this example as a guide:

spec:
  containers:
    - name: client-api
      command: ["/bin/sh"]
      args: ["-c", "UNIQUE_HOST_ID=$(sed -rn '1s#.*/##; 1s/(.{12}).*/\\1/p' /proc/self/cgroup) && java -Dappdynamics.agent.uniqueHostId=$UNIQUE_HOST_ID $JAVA_OPTS -jar /java-services.jar"]
      envFrom:
        - configMapRef:
            name: agent-config
If APM Correlation is working correctly, when you click the Pod Details link, the link opens the APM Node Dashboard for that node.
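To see what the sed expression in the args line actually produces, you can run it against a sample cgroup line. The line below is an illustrative CRI-O-style entry, not output captured from a real node; on a node, the input would come from /proc/self/cgroup:

```shell
# Simulate the UNIQUE_HOST_ID extraction from the args example above:
# strip everything up to the last '/', then keep the first 12 characters.
printf '11:freezer:/kubepods/pod1234/crio-0123456789abcdef.scope\n' \
  | sed -rn '1s#.*/##; 1s/(.{12}).*/\1/p'
# prints: crio-0123456
```

The expression operates only on the first line of the file, which is why both sed commands carry the leading `1` address.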
Cluster Agents or Pods are not Visible in the Controller
If Agents or pods are not visible in the Controller, or if Agents or pods are not registered and reporting, review the sim.cluster.agent.limit and sim.cluster.pod.limit descriptions in Controller Settings for the Cluster Agent.
Cluster Agent Pods are not Created When Security Policy is Enabled
If you have Pod Security Policies applied in the cluster, add the following to the cluster-agent-operator.yaml file:
securityContext:
runAsUser: 1000
This YAML file example displays the security context:
apiVersion: apps/v1
kind: Deployment
metadata:
name: appdynamics-operator
namespace: appdynamics
.
.
spec:
.
.
.
template:
.
.
securityContext:
runAsUser: 1000
For Agent 20.4 and 20.5, add the runAsUser: 1000 field in the cluster-agent.yaml file.
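For those versions, a minimal sketch of the change follows; the placement directly under spec: is an assumption, so verify it against the structure of your cluster-agent.yaml:

```yaml
# cluster-agent.yaml (fragment, Agent 20.4/20.5)
# Placement under spec: is assumed - check your file's layout.
spec:
  securityContext:
    runAsUser: 1000
```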