This page describes how to troubleshoot the Cluster Agent installation. See Install the Cluster Agent and Validate the Cluster Agent Installation.

Troubleshoot a Cluster Agent Not Reporting to the Controller

If the Cluster Dashboard does not appear in the Controller after you install the Cluster Agent, the cause may be a connectivity issue between the Cluster Agent and the Controller.

  1. Verify that a Server Visibility license is available. The Cluster Agent requires an available Server Visibility license to register successfully. See Cluster Agent Requirements and Supported Environments. From the Controller UI, check that a license is available under Administration > License > Account Usage.

  2. Review the Cluster Agent events. If the Cluster Agent or Cluster Agent Operator fails to start, the events in the appdynamics namespace often indicate why:

    kubectl -n appdynamics get events
    
    # to sort by most recent events:
    kubectl -n appdynamics get events --sort-by='.lastTimestamp'
    BASH

    You can also review the Cluster Agent pod specification and status for additional detail:

    kubectl -n appdynamics get pod <cluster-agent-pod> -o yaml
    BASH
  3. Review the Cluster Agent logs for errors related to Controller communication (a basic connectivity check that you can run from inside the cluster follows this list). Open a command-line prompt and enter:

    kubectl -n appdynamics logs <cluster-agent-pod-name>
    
    BASH
  4. Verify the Cluster Agent configuration. The Cluster Agent checks for configuration changes once a minute. To verify that your configurations have been applied and that the Cluster Agent is using the new values, open a command-line prompt and enter:

    kubectl -n appdynamics describe cm cluster-agent-mon cluster-agent-log cluster-agent-config
    CODE
  5. Verify that the latest Cluster Agent is installed. If you are upgrading the Cluster Agent from a previous version, then the previous Operator YAML or image may not be compatible. You must reinstall the Cluster Agent Operator and Cluster Agent using the procedure described in Upgrade the Cluster Agent.
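
If the logs show connection errors, you can rule out basic network access by opening a connection to the Controller from inside the cluster. The following is a minimal sketch: the Controller host and port are placeholders that must match the values in your cluster-agent.yaml, the cluster must be able to pull the public curlimages/curl image, and /controller/rest/serverstatus is used here only as a lightweight target.

kubectl -n appdynamics run controller-conn-test --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -v https://<controller-host>:<controller-port>/controller/rest/serverstatus
CODE

A successful TLS handshake and HTTP response indicate that the cluster can reach the Controller; a DNS failure or timeout points to a network or proxy issue rather than an agent misconfiguration.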

Troubleshoot a Cluster Agent Not Reporting Metrics

If the Cluster Agent does not report metrics for certain containers, pods, or nodes, it may be due to a problem with the Kubernetes Metrics Server. If metrics are not reported by the Metrics Server, then the Cluster Agent is unable to report them.

To verify that the Metrics Server is sending metrics, enter this command from your cluster's primary node:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
CODE

If the output of the command does not show metrics for the container, there may be a problem with the Metrics Server. This example shows expected output from the Metrics Server:

{ 
   "kind":"PodMetricsList",
   "apiVersion":"metrics.k8s.io/v1beta1",
   "metadata":{ 
      "selfLink":"/apis/metrics.k8s.io/v1beta1/pods"
   },
   "items":[ 
      { 
         "metadata":{ 
            "name":"replicaset-test-cjnsc",
            "namespace":"test-qe",
            "selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/test-qe/pods/replicaset-test-cjnsc",
            "creationTimestamp":"2019-09-23T10:24:46Z"
         },
         "timestamp":"2019-09-23T10:23:38Z",
         "window":"30s",
         "containers":[ 
            { 
               "name":"appagent",
               "usage":{ 
                  "cpu":"1667384n",
                  "memory":"258672Ki"
               }
            }
         ]
      }
   ]
}
TEXT
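
If that call fails or returns an empty items list, first confirm that the Metrics Server is deployed and that its API is registered with the Kubernetes API server. A quick check, assuming the default metrics-server deployment name and the kube-system namespace:

$ kubectl -n kube-system get deployment metrics-server
$ kubectl get apiservice v1beta1.metrics.k8s.io
CODE

The APIService should report AVAILABLE as True; if it does not, the Cluster Agent cannot obtain pod or node metrics.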


As the Metrics Server collects metrics from nodes, pods, and containers, it logs any issues it encounters. To retrieve and view the Metrics Server logs, enter (the Metrics Server typically runs in the kube-system namespace):

$ kubectl logs <metrics-server-pod-name> -n <metrics-server-namespace> --tail <number-of-lines>
CODE

For example:

$ kubectl logs metrics-server-6764b987d-mtn7g -n kube-system --tail 20 
CODE

The Metrics Server logs may reveal why it could not collect metrics. For example:

E0920 11:44:54.204075       1 reststorage.go:147] unable to fetch pod metrics for pod test-qe/replicaset-test-9k7rl: no metrics known for pod
E0920 11:44:54.204080       1 reststorage.go:147] unable to fetch pod metrics for pod test/replicaset1-458-g9n2d: no metrics known for pod
E0920 11:44:54.204089       1 reststorage.go:147] unable to fetch pod metrics for pod kube-system/kube-proxy-t54rc: no metrics known for pod
E0920 11:45:19.188033       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:ip-111.111.111.111: unable to fetch metrics from Kubelet ip-111.111.111.111 (111.111.111.111): Get https://111.111.111.111:2222/stats/summary/: dial tcp 111.111.111.111:2222: i/o timeout
CODE
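
Errors such as the i/o timeout above mean that the Metrics Server could not scrape the kubelet on that node. To narrow down which nodes or pods are missing metrics, you can query the Metrics Server directly; these commands assume the metrics API is registered, and the namespace is a placeholder:

$ kubectl top nodes
$ kubectl top pods -n <namespace>
CODE

Nodes the Metrics Server cannot reach show <unknown> values; pods without metrics are reported as errors or omitted from the output.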

Cluster Agent Restarts

If the Cluster Agent restarts, you can verify that a restart occurred from the pod details. To retrieve the pod details, enter:

kubectl get pods -n appdynamics
CODE

Sample output:

NAME                                         READY   STATUS    RESTARTS   AGE
appdynamics-operator-6fff76b466-qtx57        1/1     Running   0          4h18m
k8s-cluster-agent-perf-jg-6fc498d557-q7zst   1/1     Running   1          83m
CODE

If the Cluster Agent restarted unexpectedly, the RESTARTS count is greater than zero. In that case, you must explicitly re-apply the monitored namespaces and retrieve the logs from the previous container, as described below.
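
To see why the pod restarted (for example, an OOMKilled container or a failed liveness probe), inspect the pod details; the pod name below is a placeholder taken from the output above:

kubectl -n appdynamics describe pod <cluster-agent-pod-name>
CODE

Check the Last State, Reason, and Exit Code fields in the container status, and the Events section at the end of the output.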

AppDynamics strongly recommends that you do not overwrite the default stdoutLogging: true property value in the cluster-agent.yaml file. If you set this property to false, the kubectl logs command does not return logs.

Cluster Agent logs persist even if Kubernetes restarts the Cluster Agent. To view the logs from the Cluster Agent container that ran before the restart, enter:

kubectl -n appdynamics logs --previous ${CLUSTER_AGENT_POD_NAME}
CODE

When the Cluster Agent pod restarts, monitored namespaces that you configured through the UI are not preserved. To make them persistent, add the same namespaces to your cluster-agent.yaml file under nsToMonitor and re-apply the configuration; the Cluster Agent then retains the monitored namespaces across restarts.
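
For example, the relevant part of cluster-agent.yaml might look like the following sketch; the namespace names are placeholders, and the rest of the spec stays as it was at install time:

spec:
  # ...existing Cluster Agent settings...
  nsToMonitor:
    - appdynamics
    - <your-application-namespace>
CODE

Re-apply the file (for example, with kubectl apply -f cluster-agent.yaml) so that the new namespace list takes effect.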

If you did not add namespaces to the cluster-agent.yaml file, you can reconfigure your monitored namespaces:

  1. Go to AppDynamics Agents > Cluster Agents > {CLUSTER_AGENT} > Configure.
  2. Add the namespaces to monitor.

See Add or Remove Namespaces.

APM Correlation on OpenShift 4.x

CRI-O, a Container Runtime Interface implementation that uses OCI (Open Container Initiative) compatible runtimes, is the default container runtime on Red Hat OpenShift 4.x. If you use APM Agents with OpenShift 4.x, you must update the UNIQUE_HOST_ID to support the syntax required for CRI-O containers. This setting applies to both new and existing application containers. If you are running App Agents, you must modify the App Agent YAML file.

To run App Agents with APM correlation on OpenShift 4.x:

  1. Open your App Agent YAML file.

  2. Locate the spec: > args: section within the file.

  3. Update the UNIQUE_HOST_ID argument in the containers spec using this example as a guide:

    spec:
          containers:
          - name: client-api
            command: ["/bin/sh"]
            args: ["-c", "UNIQUE_HOST_ID=$(sed -rn '1s#.*/##; 1s/(.{12}).*/\\1/p' /proc/self/cgroup) && java -Dappdynamics.agent.uniqueHostId=$UNIQUE_HOST_ID $JAVA_OPTS -jar /java-services.jar"]
            envFrom:
            - configMapRef:
                name: agent-config
    CODE

    If APM Correlation is working correctly, clicking the Pod Details link opens the APM Node Dashboard for that node.
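
To spot-check the value that the agent will use as the unique host ID, you can run the same sed expression inside a running application container; the namespace, pod, and container names are placeholders:

kubectl -n <app-namespace> exec <pod-name> -c <container-name> -- \
  sh -c "sed -rn '1s#.*/##; 1s/(.{12}).*/\1/p' /proc/self/cgroup"
CODE

The output should be the first 12 characters of the container ID, which is the format that CRI-O correlation expects.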

Cluster Agents or Pods are not Visible in the Controller

If Agents or pods are not visible in the Controller, or if Agents or pods are not registered and reporting, review the sim.cluster.agent.limit and sim.cluster.pod.limit descriptions in Controller Settings for the Cluster Agent.

Cluster Agent Pods are not Created When Security Policy is Enabled

If you have Pod Security Policies applied in the cluster, add the following to the cluster-agent-operator.yaml file:

securityContext: 
  runAsUser: 1000
CODE

This example shows where the security context appears in the cluster-agent-operator.yaml file:

apiVersion: apps/v1 
kind: Deployment 
metadata: 
  name: appdynamics-operator 
  namespace: appdynamics
  .
  .
spec:
  .
  . 
  .
  template:
    .
    .
    securityContext: 
      runAsUser: 1000
CODE

For Cluster Agent versions 20.4 and 20.5, add the runAsUser: 1000 field to the cluster-agent.yaml file.