If the Cluster Agent is not reporting metrics for certain containers, pods, or nodes, the cause may be a problem with the Kubernetes Metrics Server. If the Metrics Server does not report metrics, the Cluster Agent cannot report them either.
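
Before querying individual pod metrics, you can confirm that the metrics API is registered and that the Metrics Server is running at all. This is a quick sanity check; it assumes the standard metrics-server deployment in the kube-system namespace, which may be named or located differently in your cluster:

$ kubectl get apiservice v1beta1.metrics.k8s.io
$ kubectl get deployment metrics-server -n kube-system
CODE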

To verify that the Metrics Server is sending metrics, run this command from your cluster's master node:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
CODE

If the output of the command does not show metrics for the container in question, the problem most likely lies with the Metrics Server. Example output from the Metrics Server:

{ 
   "kind":"PodMetricsList",
   "apiVersion":"metrics.k8s.io/v1beta1",
   "metadata":{ 
      "selfLink":"/apis/metrics.k8s.io/v1beta1/pods"
   },
   "items":[ 
      { 
         "metadata":{ 
            "name":"replicaset-test-cjnsc",
            "namespace":"test-qe",
            "selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/test-qe/pods/replicaset-test-cjnsc",
            "creationTimestamp":"2019-09-23T10:24:46Z"
         },
         "timestamp":"2019-09-23T10:23:38Z",
         "window":"30s",
         "containers":[ 
            { 
               "name":"appagent",
               "usage":{ 
                  "cpu":"1667384n",
                  "memory":"258672Ki"
               }
            }
         ]
      }
   ]
}
TEXT
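
To narrow the query to a single namespace or pod, use the namespaced paths shown in the selfLink fields above (the namespace and pod names here are taken from the example output); kubectl top reads from the same metrics API and serves as an equivalent check:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/test-qe/pods
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/test-qe/pods/replicaset-test-cjnsc
$ kubectl top pods -n test-qe
CODE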


The Metrics Server logs any issues it encounters while collecting node, pod, or container metrics. You can get the Metrics Server logs by running:


$ kubectl logs <metrics-server pod name> -n <metrics-server namespace (default: "kube-system")> --tail <number of log lines>
CODE

For example:

$ kubectl logs metrics-server-6764b987d-mtn7g -n kube-system --tail 20 
CODE

The returned Metrics Server logs may contain the reason why metrics could not be collected. For example:

E0920 11:44:54.204075       1 reststorage.go:147] unable to fetch pod metrics for pod test-qe/replicaset-test-9k7rl: no metrics known for pod
E0920 11:44:54.204080       1 reststorage.go:147] unable to fetch pod metrics for pod test/replicaset1-458-g9n2d: no metrics known for pod
E0920 11:44:54.204089       1 reststorage.go:147] unable to fetch pod metrics for pod kube-system/kube-proxy-t54rc: no metrics known for pod
E0920 11:45:19.188033       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:ip-111.111.111.111: unable to fetch metrics from Kubelet ip-111.111.111.111 (111.111.111.111): Get https://111.111.111.111:2222/stats/summary/: dial tcp 111.111.111.111:2222: i/o timeout
CODE
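
The last line above indicates that the Metrics Server timed out reaching the kubelet on a node. As a first check, confirm that the node is Ready and review the addresses it reports (the node name is a placeholder):

$ kubectl get nodes -o wide
$ kubectl describe node <node-name>
CODE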

Cluster Agent Restarts

If the Cluster Agent restarts over time, you can verify that a restart happened by getting the pod details using this command:

kubectl get pods -n appdynamics
CODE

Sample output:

NAME                                         READY   STATUS    RESTARTS   AGE
appdynamics-operator-6fff76b466-qtx57        1/1     Running   0          4h18m
k8s-cluster-agent-perf-jg-6fc498d557-q7zst   1/1     Running   1          83m
CODE
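
To see why the Cluster Agent pod restarted, describe it and check the Last State, Reason, and Exit Code fields of its container (the pod name below is taken from the sample output above):

kubectl describe pod k8s-cluster-agent-perf-jg-6fc498d557-q7zst -n appdynamics
CODE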


If the Cluster Agent restarts unexpectedly, the RESTARTS count is > 0. Because the monitored namespaces and the logs are reset on restart, you must explicitly set the namespaces again. We strongly recommend that you do not set the write-to-stdout property to false in the cluster-agent.yaml file. If this property is set to false, the kubectl logs command does not return logs.

Cluster Agent logs persist even if the Cluster Agent is restarted by Kubernetes. To see the logs for a Cluster Agent pod that restarted, use the following command:

kubectl -n appdynamics logs --previous ${CLUSTER_AGENT_POD_NAME}
CODE


If the Cluster Agent pod has restarted, the monitored namespaces configured through the User Interface (UI) are not preserved. If you configure namespaces from the UI, it is recommended to also add the same namespaces to your cluster-agent.yaml file under nsToMonitor and reapply the file, as sketched below. The Cluster Agent then retains the monitored namespaces when its pod restarts.
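
A minimal sketch of the relevant portion of cluster-agent.yaml, assuming nsToMonitor is a list directly under spec as in the standard file; the namespace names are examples and the rest of your spec stays unchanged:

spec:
  # Keep this list in sync with the namespaces configured in the UI
  nsToMonitor:
    - default
    - test-qe
CODE

After updating the file, reapply it so the change takes effect:

kubectl apply -f cluster-agent.yaml -n appdynamics
CODE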

If you have not added namespaces to the cluster-agent.yaml file, to reconfigure your monitored namespaces:

  1. Go to AppDynamics Agents > Cluster Agents > {CLUSTER_AGENT} > Configure.
  2. Add the namespaces to monitor again.

For more information, see the Add or Remove Namespaces section on Administer.

APM Correlation on OpenShift 4.x

CRI-O is the default container runtime on Red Hat OpenShift 4.x. If you're using APM agents with OpenShift 4.x, you'll have to update the UNIQUE_HOST_ID to support the syntax required for CRI-O containers. This setting applies to both new and existing application containers. If you have app agents running, you'll have to modify your app agent YAML file.

To run app agents with APM correlation on OpenShift 4.x:

  1. Open your app agent YAML file.

  2. Locate the spec: > args: section.

  3. Update the UNIQUE_HOST_ID argument in the containers spec using the following example as a guide:

    spec:
          containers:
          - name: client-api
            command: ["/bin/sh"]
            args: ["-c", "UNIQUE_HOST_ID=$(sed -rn '1s#.*/##; 1s/(.{12}).*/\\1/p' /proc/self/cgroup) && java -Dappdynamics.agent.uniqueHostId=$UNIQUE_HOST_ID $JAVA_OPTS -jar /java-services.jar"]
            envFrom:
            - configMapRef:
                name: agent-config
    CODE
  4. To verify that APM correlation is working, click the Pod Details link. If correlation is working correctly, the link opens the APM node dashboard for that node. If it does not, a check you can run inside the container is sketched below.
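
If correlation does not appear, you can check what the sed expression from step 3 resolves to inside a running container. The pod name below is a placeholder, the container name comes from the example above, and the command assumes the container image provides a shell and GNU sed:

kubectl exec <pod-name> -c client-api -- sh -c "sed -rn '1s#.*/##; 1s/(.{12}).*/\1/p' /proc/self/cgroup"
CODE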

Cluster Agents or Pods Are Not Seen in the Controller

If agents or pods are not visible in the Controller, or if agents or pods are not registered and reporting, see the sim.cluster.agent.limit and sim.cluster.pod.limit descriptions under Controller Settings for the Cluster Agent.

Need additional help? Technical support is available through AppDynamics Support.