Amazon Managed Streaming for Apache Kafka offers two types of clusters: provisioned and serverless. Cisco Cloud Observability supports collecting Amazon CloudWatch metrics from provisioned clusters. For serverless clusters, metrics are not collected and only the properties (attributes) are displayed.

Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data.

Cisco Cloud Observability supports monitoring the following Amazon MSK entities:

  • Cluster: The primary Amazon MSK resource that you can create in your account.
  • Replicator: An Amazon MSK feature that enables you to reliably replicate data across Amazon MSK clusters in the same or different AWS Regions.
  • Broker: Apache Kafka partitions topics and replicates these partitions across multiple nodes called brokers. Apache Kafka runs as a cluster on one or more brokers, and brokers can be located in multiple AWS availability zones to create a highly available cluster.

You must configure cloud connections to monitor these entities. See Set up Cisco AppDynamics Cloud Collectors to Monitor AWS.

Cisco Cloud Observability displays AWS entities on the Observe page. Metrics are displayed for specific entity instances in the list and detail views.

This document contains references to third-party documentation. Cisco AppDynamics does not own any rights and assumes no responsibility for the accuracy or completeness of such third-party documentation.

Detail View

To display the detail view for an Amazon MSK instance:

Cluster:

  1. Navigate to the Observe page.
  2. Under App Integrations, click AWS MSK Clusters.
    The list view now displays.
  3. From the list, click an instance Name to display the detail view.
    The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.

Replicator:

  1. Navigate to the Observe page.
  2. Under App Integrations, click AWS MSK Clusters.
  3. From the Relationships panel on the left-hand side, click AWS MSK Replicators.
    The list view now displays.
  4. From the list, click an instance ID to display the detail view.
    The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.

Broker:

  1. Navigate to the Observe page.
  2. Under App Integrations, click AWS MSK Clusters.
  3. From the Relationships panel on the left-hand side, click AWS MSK Brokers.
    The list view now displays.
  4. From the list, click an instance ID to display the detail view.
    The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.

Metrics and Key Performance Indicators

Cisco Cloud Observability displays the following metrics and key performance indicators (KPIs) for Amazon MSK. For more information, see:

Cluster:

| Display Name | Source Metric Name | Description |
|---|---|---|
| Active Controller Count | ActiveControllerCount | The number of active controllers. Only one controller per cluster should be active at any given time. |
| Global Partition Count | GlobalPartitionCount | The number of partitions across all topics in the cluster, excluding replicas. Because GlobalPartitionCount doesn't include replicas, the sum of the PartitionCount values can be higher than GlobalPartitionCount if the replication factor for a topic is greater than 1. |
| Global Topic Count | GlobalTopicCount | Total number of topics across all brokers in the cluster. |
| Disk Used Utilization (%) | KafkaAppLogsDiskUsed | The percentage of disk space used for application logs. |
| Offline Partitions Count | OfflinePartitionsCount | Total number of partitions that are offline in the cluster. |
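
The relationship between GlobalPartitionCount and the per-broker PartitionCount values can be illustrated with a small calculation. The following is a minimal sketch, not output from Amazon MSK: the topic names, partition counts, and replication factors are hypothetical, and it assumes every replica is assigned to a broker.

```python
# Hypothetical topics: name -> (partitions, replication factor).
topics = {"orders": (6, 3), "clicks": (12, 2)}


def global_partition_count(topics):
    """Partitions across all topics, excluding replicas."""
    return sum(partitions for partitions, _ in topics.values())


def total_partition_replicas(topics):
    """Sum of per-broker PartitionCount values, which include replicas."""
    return sum(partitions * rf for partitions, rf in topics.values())


print(global_partition_count(topics))    # 18
print(total_partition_replicas(topics))  # 42
```

With a replication factor of 1 for every topic, the two values would be equal.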

Replicator:

| Display Name | Source Metric Name | Description |
|---|---|---|
| Replication Latency (ms) | ReplicationLatency | The time it takes records to replicate from the source cluster to the target cluster, measured as the duration between the record produce time at the source and the time the record is replicated to the target. If ReplicationLatency increases, check whether the clusters have enough partitions to support replication. High replication latency can occur when the partition count is too low for high throughput. |
| Message Lag (count) | MessageLag | Monitors the sync between the MSK replicator and the source cluster. After an outage, MessageLag increases, indicating the number of messages by which the replicator is behind the source cluster. You can monitor this metric until it reaches 0, which shows that the replicator has caught up with the source cluster. |
| Replicator Failure (count) | ReplicatorFailure | The number of failures that the replicator is experiencing. |
| Authentication Error (connections/s) | AuthError | The number of connections with failed authentication per second. If this metric is above 0, check whether the service execution role policy for the replicator is valid and make sure that no deny permissions are set for the cluster permissions. Based on the ClusterAlias dimension, you can identify whether the source or target cluster is experiencing authentication errors. |
| Throttle Time (ms) | ThrottleTime | The average time in milliseconds that a request was throttled by brokers on the cluster. Set throttling to prevent the MSK Replicator from overwhelming the cluster. If this metric is 0, ReplicationLatency is not high, and ReplicatorThroughput is as expected, then throttling is working as expected. If this metric is above 0, you can adjust throttling accordingly. |
| Cluster Ping Success Count | KafkaClusterPingSuccessCount | Indicates the health of the replicator connection to the Kafka cluster. If this value is 1, the connection is healthy. If the value is 0 or there is no datapoint, the connection is unhealthy. If the value is 0, check the network or IAM permission settings for the Kafka cluster. Based on the ClusterAlias dimension, you can identify whether this metric is for the source or target cluster. |
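
The interpretation rules for KafkaClusterPingSuccessCount can be sketched as a small helper. This is an illustrative example only: the function name `ping_status` is hypothetical, and using `None` to represent a missing datapoint is an assumption, not part of any Cisco or AWS API.

```python
from typing import Optional


def ping_status(value: Optional[float]) -> str:
    """Interpret one KafkaClusterPingSuccessCount datapoint.

    1 means healthy; 0 or a missing datapoint (None) means unhealthy.
    Any other value is not described in the table, so it is reported as unknown.
    """
    if value is None:
        return "unhealthy: no datapoint"
    if value == 1:
        return "healthy"
    if value == 0:
        return "unhealthy: check network or IAM permission settings"
    return "unknown"


print(ping_status(1))     # healthy
print(ping_status(0))     # unhealthy: check network or IAM permission settings
print(ping_status(None))  # unhealthy: no datapoint
```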

Broker:

| Display Name | Source Metric Name | Description |
|---|---|---|
| Burst Balance (burst credit) | BurstBalance | The remaining balance of input-output burst credits for EBS volumes in the cluster. Use it to investigate latency or decreased throughput. BurstBalance is not reported for EBS volumes when the baseline performance of a volume is higher than the maximum burst performance. |
| Connection Count | ConnectionCount | The number of active authenticated, unauthenticated, and inter-broker connections. |
| CPU Credit Balance (credit) | CPUCreditBalance | The number of earned CPU credits that a broker has accrued since it was launched. Credits are accrued in the credit balance after they are earned, and removed from the credit balance when they are spent. Running out of CPU credit balance can have a negative impact on your cluster's performance. You can take steps to reduce CPU load. For example, you can reduce the number of client requests or update the broker type to an M5 broker type. |
| CPU Idle (%) | CpuIdle | The percentage of CPU idle time. |
| CPU IO Wait (%) | CpuIoWait | The percentage of CPU idle time during a pending disk operation. |
| CPU System (%) | CpuSystem | The percentage of CPU in kernel space. |
| CPU User Space Utilization (%) | CpuUser | The percentage of CPU in user space. |
| Disk Usage Percentage For Kafka Application Logs (%) | KafkaAppLogsDiskUsed | The percentage of disk space used for application logs. |
| Disk Usage Percentage For Kafka Data Logs (%) | KafkaDataLogsDiskUsed | The percentage of disk space used for data logs. |
| Leader Count | LeaderCount | The total number of leaders of partitions per broker, not including replicas. |
| Memory Buffered (By) | MemoryBuffered | The size in bytes of buffered memory for the broker. |
| Memory Cached (By) | MemoryCached | The size in bytes of cached memory for the broker. |
| Memory Free (By) | MemoryFree | The size in bytes of memory that is free and available for the broker. |
| Heap Memory After GC (%) | HeapMemoryAfterGC | The percentage of total heap memory in use after garbage collection. |
| Memory Used (By) | MemoryUsed | The size in bytes of memory that is in use for the broker. |
| Message Throughput (messages/s) | MessagesInPerSec | The number of incoming messages per second for the broker. |
| Dropped Packets (count) | NetworkRxDropped | The number of dropped receive packets. |
| Network Errors (count) | NetworkRxErrors | The number of network receive errors for the broker. |
| Total Packets (count) | NetworkRxPackets | The number of packets received by the broker. |
| Partitions (count) | PartitionCount | The total number of topic partitions per broker, including replicas. |
| Produce Total Time (ms) | ProduceTotalTimeMsMean | The mean produce time in milliseconds. |
| Request Size (By) | RequestBytesMean | The mean number of request bytes for the broker. |
| Request Time (ms) | RequestTime | The average time in milliseconds spent in broker network and I/O threads to process requests. |
| Root Disk Usage (%) | RootDiskUsed | The percentage of the root disk used by the broker. |
| Swap Free (By) | SwapFree | The size in bytes of swap memory that is available for the broker. |
| Swap Used (By) | SwapUsed | The size in bytes of swap memory that is in use for the broker. |
| Traffic Shaping (count) | TrafficShaping | High-level metrics indicating the number of packets shaped (dropped or queued) due to exceeding network allocations. Finer detail is available with PER_BROKER metrics. |
| In-Sync Replica Partitions (count) | UnderMinIsrPartitionCount | The number of under minIsr partitions for the broker. |
| Replicated Partitions (count) | UnderReplicatedPartitions | The number of under-replicated partitions for the broker. |
| ZooKeeper Request Latency (ms) | ZooKeeperRequestLatencyMsMean | The mean latency in milliseconds for Apache ZooKeeper requests from the broker. |
| Zoo Keeper Session State (current state) | ZooKeeperSessionState | Connection status of the broker's ZooKeeper session, which may be one of the following: NOT_CONNECTED ('0.0'), ASSOCIATING ('0.1'), CONNECTING ('0.5'), CONNECTEDREADONLY ('0.8'), CONNECTED ('1.0'), CLOSED ('5.0'), AUTH_FAILED ('10.0'). |
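
When reading ZooKeeperSessionState datapoints outside the UI, the numeric values listed above can be translated back to state names with a lookup table. The following is a minimal sketch. The helper name `session_state_name` and the "UNKNOWN" fallback are assumptions added for illustration.

```python
# Numeric ZooKeeperSessionState values and their state names, as listed above.
ZOOKEEPER_SESSION_STATES = {
    0.0: "NOT_CONNECTED",
    0.1: "ASSOCIATING",
    0.5: "CONNECTING",
    0.8: "CONNECTEDREADONLY",
    1.0: "CONNECTED",
    5.0: "CLOSED",
    10.0: "AUTH_FAILED",
}


def session_state_name(value: float) -> str:
    """Map a metric value to its state name, or "UNKNOWN" if it is not listed."""
    # Round to one decimal place so that small floating-point noise still matches.
    return ZOOKEEPER_SESSION_STATES.get(round(value, 1), "UNKNOWN")


print(session_state_name(1.0))   # CONNECTED
print(session_state_name(10.0))  # AUTH_FAILED
```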

Properties (Attributes)

Cisco Cloud Observability displays the following properties for Amazon MSK.

Cluster:

| Display Name | Property Name | Description |
|---|---|---|
| Name | messaging.system.name | The name of the messaging system. |
| ARN | messaging.system.id | The unique identifier of this messaging system. |
| Cluster Type | aws.msk_cluster.type | The type of cluster. |
| Type | messaging.system.type | The type of messaging system (for example, Kafka or Active Message Broker). |
| Region | cloud.region | The geographical region in which the resource is running. |
| Availability Zone | cloud.availability_zone | The zone in which the resource is running. |
| Account ID | cloud.account.id | The cloud account ID the resource is assigned to. |
| State | cloud.messaging.system.state | The provisioning state of the messaging service. |
| Distribution | aws.msk_cluster.broker.distribution | The distribution of broker nodes across Availability Zones. |
| Subnets | aws.msk_cluster.broker.subnets | The list of subnets to connect to in the client virtual private cloud (VPC). |
| Instance Type | aws.msk_cluster.broker.instance_type | The type of Amazon EC2 instances to use for Apache Kafka brokers. |
| Security Group | aws.msk_cluster.broker.security_group | The AWS security groups to associate with the elastic network interfaces. |
| Storage Throughput | aws.msk_cluster.broker.storage.throughput | The EBS volume provisioned throughput information. |
| Storage Volume | aws.msk_cluster.broker.storage.volume | The size in GiB of the EBS volume for the data drive on each broker node. |
| Storage Public Access | aws.msk_cluster.broker.public_access | The public access control for brokers. |
| Storage Authentication Scram | aws.msk_cluster.broker.authentication.scram | The details for SASL/SCRAM client authentication for VPC connectivity. |
| Storage Authentication IAM | aws.msk_cluster.broker.authentication.iam | The details for SASL/IAM client authentication for VPC connectivity. |
| Storage Authentication TLS | aws.msk_cluster.broker.authentication.tls | Indicates whether TLS authentication is on or off for VPC connectivity. |
| Authentication Scram | aws.msk_cluster.authentication.scram | The details for ClientAuthentication using SASL. |
| Authentication IAM | aws.msk_cluster.authentication.iam | Indicates whether IAM access control is enabled. |
| Authentication TLS | aws.msk_cluster.authentication.tls | Specifies whether TLS authentication is turned on or off. |
| Authentication Enabled | aws.msk_cluster.authentication.unauthenticated | Specifies whether unauthenticated traffic to your cluster is turned on or off. |
| Version | aws.msk_cluster.version | The current version of the MSK cluster. |
| KMS ID | aws.msk_cluster.encryption.kms_id | The ARN of the AWS KMS key for encrypting data at rest. |
| Encryption Type | aws.msk_cluster.encryption.type | Indicates the encryption setting for data in transit between clients and brokers. |
| Encryption Enabled | aws.msk_cluster.encryption.enabled | Indicates that data communication among the broker nodes of the cluster is encrypted. |
| Monitoring Type | aws.msk_cluster.monitoring.type | Specifies which metrics are gathered for the MSK cluster. |
| JMX Exporter | aws.msk_cluster.monitoring.jmx_exporter | Indicates whether the JMX Exporter is turned on or off. |
| Node Exporter | aws.msk_cluster.monitoring.node_exporter | Indicates whether the Node Exporter is turned on or off. |
| Cloudwatch Logging | aws.msk_cluster.logging.cloudwatch | Indicates whether CloudWatch logging is turned on or off. |
| Firehose Logging | aws.msk_cluster.logging.firehose | Indicates whether Firehose logging is turned on or off. |
| S3 Logging | aws.msk_cluster.logging.s3 | Indicates whether S3 logging is turned on or off. |
| Nodes | aws.msk_cluster.nodes | The number of broker nodes in the cluster. |
| Zookeeper Endpoint | aws.msk_cluster.zookeeper.endpoint | The connection string to use to connect to the Apache ZooKeeper cluster. |
| Zookeeper Endpoint TLS | aws.msk_cluster.zookeeper.endpoint_tls | The connection string to use to connect to the Apache ZooKeeper cluster on a TLS port. |

Replicator:

| Display Name | Property Name | Description |
|---|---|---|
| ARN | aws.msk_replicator.id | The Amazon Resource Name (ARN) of the replicator. |
| Name | aws.msk_replicator.name | The name of the replicator. |
| Region | cloud.region | The geographical region in which the resource is running. |
| Created At | aws.msk_replicator.created_at | The time the replicator was created. |
| Account ID | cloud.account.id | The cloud account ID the resource is assigned to. |
| Version | aws.msk_replicator.version | The current version of the replicator. |
| Is Reference | aws.msk_replicator.is_reference | Specifies whether this resource is a replicator reference. |
| Source Cluster ARN | aws.msk_replicator.source_cluster.id | The ARN of an Amazon MSK source cluster. |
| Source Cluster Alias | aws.msk_replicator.source_cluster.alias | The alias of the source Kafka cluster. |
| Target Cluster ARN | aws.msk_replicator.target_cluster.id | The ARN of an Amazon MSK target cluster. |
| Target Cluster Alias | aws.msk_replicator.target_cluster.alias | The alias of the target Kafka cluster. |
| Status | aws.msk_replicator.status | The state of the replicator. |

Broker:

| Display Name | Property Name | Description |
|---|---|---|
| ARN | aws.msk_broker.arn | The Amazon Resource Name (ARN) of the broker. |
| ID | aws.msk_broker.id | The ID of the broker. |
| Region | cloud.region | The geographical region in which the resource is running. |
| Account ID | cloud.account.id | The cloud account ID the resource is assigned to. |
| Started At | aws.msk_broker.started_at | The start time of the broker. |
| Network Interface | aws.msk_broker.network.interface | The attached elastic network interface of the broker. |
| Subnet | aws.msk_broker.network.subnet | The client subnet to which this broker node belongs. |
| Network Address | aws.msk_broker.network.address | The virtual private cloud (VPC) of the client. |
| Configuration ARN | aws.msk_broker.configuration.arn | The Amazon Resource Name (ARN) of the configuration used for the cluster. |
| Configuration Revision | aws.msk_broker.configuration.revision | The revision of the configuration to use. This field isn’t visible in this preview release. |
| Version | aws.msk_broker.version | The version of Apache Kafka. |
| Endpoints | aws.msk_broker.endpoints | The endpoints for accessing the broker. |
| Instance Type | aws.msk_broker.instance_type | The instance type. |
| Type | aws.msk_broker.type | The node type. |

Retention and Purge Time-To-Live (TTL)

For all cloud and infrastructure entities, the retention TTL is 180 minutes (3 hours) and the purge TTL is 525,600 minutes (365 days). 
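
As a quick check of the unit conversions above, the following sketch uses Python's standard `datetime.timedelta`. The constant names are illustrative and do not correspond to product settings.

```python
from datetime import timedelta

# TTL values in minutes, as stated above. The constant names are hypothetical.
RETENTION_TTL = timedelta(minutes=180)
PURGE_TTL = timedelta(minutes=525_600)

print(RETENTION_TTL.total_seconds() / 3600)  # 3.0 (hours)
print(PURGE_TTL.days)                        # 365 (days)
```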

Third-party names, logos, marks, and general references used in these materials are the property of their respective owners or their affiliates in the United States and/or other countries. Inclusion of such references is for informational purposes only and is not intended to promote or otherwise suggest a relationship between Cisco AppDynamics and the third party.