Amazon Managed Streaming for Apache Kafka
Amazon Managed Streaming for Apache Kafka offers two types of clusters: provisioned and serverless. Cisco Cloud Observability supports collecting Amazon CloudWatch metrics from provisioned clusters. For serverless clusters, metrics are not collected and only the properties (attributes) are displayed.
Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data.
Cisco Cloud Observability supports monitoring the following Amazon MSK entities:
- Cluster: The primary Amazon MSK resource that you can create in your account. A cluster runs Apache Kafka on one or more brokers.
- Replicator: An Amazon MSK feature that enables you to reliably replicate data across Amazon MSK clusters in the same or different AWS Regions.
- Broker: Apache Kafka partitions topics and replicates these partitions across multiple nodes called brokers. Apache Kafka runs as a cluster on one or more brokers, and brokers can be located in multiple AWS availability zones to create a highly available cluster.
You must configure cloud connections to monitor these entities. See Set up Cisco AppDynamics Cloud Collectors to Monitor AWS.
Cisco Cloud Observability displays AWS entities on the Observe page. Metrics are displayed for specific entity instances in the list and detail views.
This document contains references to third-party documentation. Splunk AppDynamics does not own any rights and assumes no responsibility for the accuracy or completeness of such third-party documentation.
Detail View
To display the detail view for an Amazon MSK instance, follow the steps for the entity type.
Cluster
- Navigate to the Observe page.
- Under App Integrations, click AWS MSK Clusters.
  The list view now displays.
- From the list, click an instance Name to display the detail view.
  The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.
Replicator
- Navigate to the Observe page.
- Under App Integrations, click AWS MSK Clusters.
- From the Relationships panel on the left-hand side, click AWS MSK Replicators.
  The list view now displays.
- From the list, click an instance ID to display the detail view.
  The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.
Broker
- Navigate to the Observe page.
- Under App Integrations, click AWS MSK Clusters.
- From the Relationships panel on the left-hand side, click AWS MSK Brokers.
  The list view now displays.
- From the list, click an instance ID to display the detail view.
  The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.
Metrics and Key Performance Indicators
Cisco Cloud Observability displays the following metrics and key performance indicators (KPIs) for Amazon MSK.

Cluster

Display Name | Source Metric Name | Description |
---|---|---|
Active Controller Count | ActiveControllerCount | The number of active controllers. Only one controller per cluster should be active at any given time. |
Global Partition Count | GlobalPartitionCount | The number of partitions across all topics in the cluster, excluding replicas. Because GlobalPartitionCount doesn't include replicas, the sum of the PartitionCount values can be higher than GlobalPartitionCount if the replication factor for a topic is greater than 1. |
Global Topic Count | GlobalTopicCount | Total number of topics across all brokers in the cluster. |
Disk Used Utilization (%) | KafkaAppLogsDiskUsed | The percentage of disk space used for application logs. |
Offline Partitions Count | OfflinePartitionsCount | Total number of partitions that are offline in the cluster. |
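
The relationship between GlobalPartitionCount and the per-broker PartitionCount values described in the table above can be checked with a little arithmetic. The following is an illustrative sketch only: the topic names, partition counts, and replication factors are hypothetical, and the helper functions are not part of any Cisco or AWS API.

```python
# Hypothetical topics: name -> (partitions, replication factor).
TOPICS = {
    "orders": (6, 3),
    "payments": (4, 2),
    "audit": (2, 1),
}


def global_partition_count(topics):
    """Partitions across all topics, excluding replicas (GlobalPartitionCount)."""
    return sum(partitions for partitions, _ in topics.values())


def summed_broker_partition_count(topics):
    """Sum of the per-broker PartitionCount values, which include replicas."""
    return sum(partitions * factor for partitions, factor in topics.values())


print(global_partition_count(TOPICS))         # 12
print(summed_broker_partition_count(TOPICS))  # 28
```

The two numbers match only when every topic has a replication factor of 1.
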

Replicator

Display Name | Source Metric Name | Description |
---|---|---|
Replication Latency (ms) | ReplicationLatency | The time it takes records to replicate from the source cluster to the target cluster, measured from the record produce time at the source to the time the record is replicated to the target. If ReplicationLatency increases, check whether the clusters have enough partitions to support replication. High replication latency can occur when the partition count is too low for high throughput. |
Message Lag (count) | MessageLag | Monitors the sync between the MSK replicator and the source cluster. After an outage, MessageLag increases to show the number of messages by which the replicator is behind the source cluster. Monitor it until it returns to 0, which shows that the replicator has caught up with the source cluster. |
Replicator Failure (count) | ReplicatorFailure | The number of failures that the replicator is experiencing. |
Authentication Error (connections/s) | AuthError | The number of connections with failed authentication per second. If this metric is above 0, you can check if the service execution role policy for the replicator is valid and make sure there aren't deny permissions set for the cluster permissions. Based on the ClusterAlias dimension, you can identify if the source or target cluster is experiencing auth errors. |
Throttle Time (ms) | ThrottleTime | The average time in ms a request was throttled by brokers on the cluster. Set throttling to avoid having the MSK Replicator overwhelm the cluster. If this metric is 0, ReplicationLatency is not high, and ReplicatorThroughput is as expected, then throttling is working as expected. If this metric is above 0, you can adjust throttling accordingly. |
Cluster Ping Success Count | KafkaClusterPingSuccessCount | Indicates the health of the replicator connection to the Kafka cluster. If this value is 1, the connection is healthy. If the value is 0 or there is no datapoint, the connection is unhealthy. If the value is 0, you can check network or IAM permission settings for the Kafka cluster. Based on the ClusterAlias dimension, you can identify whether this metric is for the source or target cluster. |
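
The guidance in the replicator table can be read as a small set of checks. The sketch below encodes those checks under stated assumptions: the `ReplicatorSample` class, the `triage` function, and the `high_latency_ms` threshold are hypothetical names and values chosen for illustration, not part of Cisco Cloud Observability or the AWS API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicatorSample:
    """One datapoint per metric; fields mirror the source metric names."""
    replication_latency_ms: float
    message_lag: float
    replicator_failure: float
    auth_error: float
    throttle_time_ms: float
    ping_success: Optional[float]  # None models "no datapoint"


def triage(sample: ReplicatorSample, high_latency_ms: float = 30_000.0) -> List[str]:
    """Return findings based on the descriptions in the table above.

    high_latency_ms is an assumed threshold, not a documented default.
    """
    findings = []
    if sample.ping_success is None or sample.ping_success == 0:
        findings.append("connection unhealthy: check network or IAM permission settings")
    if sample.auth_error > 0:
        findings.append("auth errors: check the service execution role policy")
    if sample.replicator_failure > 0:
        findings.append("replicator is reporting failures")
    if sample.throttle_time_ms > 0:
        findings.append("requests are being throttled: adjust throttling")
    if sample.replication_latency_ms > high_latency_ms:
        findings.append("high latency: check partition counts")
    if sample.message_lag > 0:
        findings.append("replicator is behind the source: watch MessageLag until it is 0")
    return findings


healthy = ReplicatorSample(1200.0, 0, 0, 0, 0, 1)
print(triage(healthy))  # []
```

An empty list corresponds to the "working as expected" case in the ThrottleTime description: no throttling, latency that is not high, and a healthy connection.
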

Broker

Display Name | Source Metric Name | Description |
---|---|---|
Burst Balance (burst credit) | BurstBalance | The remaining balance of input-output burst credits for EBS volumes in the cluster. Use it to investigate latency or decreased throughput. |
Connection Count | ConnectionCount | The number of active authenticated, unauthenticated, and inter-broker connections. |
CPU Credit Balance (credit) | CPUCreditBalance | The number of earned CPU credits that a broker has accrued since it was launched. Credits are accrued in the credit balance after they are earned, and removed from the credit balance when they are spent. If you run out of the CPU credit balance, it can have a negative impact on your cluster's performance. You can take steps to reduce CPU load. For example, you can reduce the number of client requests or update the broker type to an M5 broker type. |
CPU Idle (%) | CpuIdle | The percentage of CPU idle time. |
CPU IO Wait (%) | CpuIoWait | The percentage of CPU idle time during a pending disk operation. |
CPU System (%) | CpuSystem | The percentage of CPU in kernel space. |
CPU User Space Utilization (%) | CpuUser | The percentage of CPU in user space. |
Disk Usage Percentage For Kafka Application Logs (%) | | The percentage of disk space used for application logs. |
Disk Usage Percentage For Kafka Data Logs (%) | KafkaDataLogsDiskUsed | The percentage of disk space used for data logs. |
Leader Count | LeaderCount | The total number of leaders of partitions per broker, not including replicas. |
Memory Buffered (By) | MemoryBuffered | The size in bytes of buffered memory for the broker. |
Memory Cached (By) | MemoryCached | The size in bytes of cached memory for the broker. |
Memory Free (By) | MemoryFree | The size in bytes of memory that is free and available for the broker. |
Heap Memory After GC (%) | HeapMemoryAfterGC | The percentage of total heap memory in use after garbage collection. |
Memory Used (By) | MemoryUsed | The size in bytes of memory that is in use for the broker. |
Message Throughput (messages/s) | MessagesInPerSec | The number of incoming messages per second for the broker. |
Dropped Packets (count) | NetworkRxDropped | The number of dropped receive packets. |
Network Errors (count) | NetworkRxErrors | The number of network receive errors for the broker. |
Total Packets (count) | NetworkRxPackets | The number of packets received by the broker. |
Partitions (count) | PartitionCount | The total number of topic partitions per broker, including replicas. |
Produce Total Time (ms) | ProduceTotalTimeMsMean | The mean produce time in milliseconds. |
Request Size (By) | RequestBytesMean | The mean number of request bytes for the broker. |
Request Time (ms) | RequestTime | The average time in milliseconds spent in broker network and I/O threads to process requests. |
Root Disk Usage (%) | RootDiskUsed | The percentage of the root disk used by the broker. |
Swap Free (By) | SwapFree | The size in bytes of swap memory that is available for the broker. |
Swap Used (By) | SwapUsed | The size in bytes of swap memory that is in use for the broker. |
Traffic Shaping (count) | TrafficShaping | High-level metrics indicating the number of packets shaped (dropped or queued) due to exceeding network allocations. Finer detail is available with PER_BROKER metrics. |
In-Sync Replica Partitions (count) | UnderMinIsrPartitionCount | The number of partitions on the broker that have fewer in-sync replicas than the configured minimum (minIsr). |
Replicated Partitions (count) | UnderReplicatedPartitions | The number of under-replicated partitions for the broker. |
ZooKeeper Request Latency (ms) | ZooKeeperRequestLatencyMsMean | The mean latency in milliseconds for Apache ZooKeeper requests from the broker. |
Zoo Keeper Session State (current state) | ZooKeeperSessionState | Connection status of the broker's ZooKeeper session, reported as one of a fixed set of states. |
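
Several broker metrics in the table above point at a problem when they reach a particular value. The sketch below shows one way to turn a set of readings into warnings. It is a sketch under assumptions: the `broker_warnings` function is hypothetical, and treating any non-zero under-replicated or under-minIsr count as a warning is a choice made for this example, not a documented rule.

```python
def broker_warnings(metrics):
    """Return warnings for one broker.

    `metrics` maps source metric names from the table above to their latest
    values. Metrics that were not reported are skipped.
    """
    warnings = []
    if metrics.get("CPUCreditBalance") == 0:
        warnings.append("CPU credits exhausted: reduce CPU load or change the broker type")
    if metrics.get("BurstBalance") == 0:
        warnings.append("EBS burst credits exhausted: expect latency or decreased throughput")
    if metrics.get("UnderReplicatedPartitions", 0) > 0:
        warnings.append("under-replicated partitions present")
    if metrics.get("UnderMinIsrPartitionCount", 0) > 0:
        warnings.append("partitions below minIsr present")
    return warnings


# Hypothetical readings for a single broker.
reading = {"CPUCreditBalance": 0, "UnderReplicatedPartitions": 3}
for warning in broker_warnings(reading):
    print(warning)
```
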
Properties (Attributes)
Cisco Cloud Observability displays the following properties for Amazon MSK.

Cluster

Display Name | Property Name | Description |
---|---|---|
Name | messaging.system.name | The name of the messaging system. |
ARN | messaging.system.id | The unique identifier of this messaging system. |
Cluster Type | aws.msk_cluster.type | The type of cluster. |
Type | messaging.system.type | The type of messaging system (for example, Kafka or Active Message Broker). |
Region | cloud.region | The geographical region in which the resource is running. |
Availability Zone | cloud.availability_zone | Represents the zone where the resource is running. |
Account ID | cloud.account.id | The cloud account ID the resource is assigned to. |
State | cloud.messaging.system.state | The provisioning state of the messaging service. |
Distribution | aws.msk_cluster.broker.distribution | The distribution of broker nodes across Availability Zones. |
Subnets | aws.msk_cluster.broker.subnets | The list of subnets to connect to in the client virtual private cloud (VPC). |
Instance Type | aws.msk_cluster.broker.instance_type | The type of Amazon EC2 instances to use for Apache Kafka brokers. |
Security Group | aws.msk_cluster.broker.security_group | The AWS security groups to associate with the elastic network interfaces. |
Storage Throughput | aws.msk_cluster.broker.storage.throughput | The EBS volume provisioned throughput information. |
Storage Volume | aws.msk_cluster.broker.storage.volume | The size in GiB of the EBS volume for the data drive on each broker node. |
Storage Public Access | aws.msk_cluster.broker.public_access | The public access control for brokers. |
Storage Authentication Scram | aws.msk_cluster.broker.authentication.scram | The details for SASL/SCRAM client authentication for VPC connectivity. |
Storage Authentication IAM | aws.msk_cluster.broker.authentication.iam | The details for SASL/IAM client authentication for VPC connectivity. |
Storage Authentication TLS | aws.msk_cluster.broker.authentication.tls | TLS authentication is on or off for VPC connectivity. |
Authentication Scram | aws.msk_cluster.authentication.scram | The details for ClientAuthentication using SASL. |
Authentication IAM | aws.msk_cluster.authentication.iam | Indicates whether IAM access control is enabled. |
Authentication TLS | aws.msk_cluster.authentication.tls | Specifies whether you want to turn on or turn off TLS authentication. |
Authentication Enabled | aws.msk_cluster.authentication.unauthenticated | Specifies whether you want to turn on or turn off unauthenticated traffic to your cluster. |
Version | aws.msk_cluster.version | The current version of the MSK cluster. |
KMS ID | aws.msk_cluster.encryption.kms_id | The ARN of the AWS KMS key for encrypting data at rest. |
Encryption Type | aws.msk_cluster.encryption.type | Indicates the encryption setting for data in transit between clients and brokers. |
Encryption Enabled | aws.msk_cluster.encryption.enabled | Indicates that data communication among the broker nodes of the cluster is encrypted. |
Monitoring Type | aws.msk_cluster.monitoring.type | Specifies which metrics are gathered for the MSK cluster. |
JMX Exporter | aws.msk_cluster.monitoring.jmx_exporter | Indicates whether you want to turn on or turn off the JMX Exporter. |
Node Exporter | aws.msk_cluster.monitoring.node_exporter | Indicates whether you want to turn on or turn off the Node Exporter. |
Cloudwatch Logging | aws.msk_cluster.logging.cloudwatch | Indicates whether logging to Amazon CloudWatch Logs is turned on or off. |
Firehose Logging | aws.msk_cluster.logging.firehose | Indicates whether logging to Firehose is turned on or off. |
S3 Logging | aws.msk_cluster.logging.s3 | Indicates whether logging to Amazon S3 is turned on or off. |
Nodes | aws.msk_cluster.nodes | The number of broker nodes in the cluster. |
Zookeeper Endpoint | aws.msk_cluster.zookeeper.endpoint | The connection string to use to connect to the Apache ZooKeeper cluster. |
Zookeeper Endpoint TLS | aws.msk_cluster.zookeeper.endpoint_tls | The connection string to use to connect to the Apache ZooKeeper cluster on a TLS port. |
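
The Cluster Type property determines what this page shows for a cluster: metrics and properties for provisioned clusters, properties only for serverless clusters. The sketch below expresses that rule in code. It assumes the property value is the string "provisioned" or "serverless" in some letter case, which this page does not confirm, and the `expected_data` function is a hypothetical helper.

```python
def expected_data(properties):
    """Return which kinds of data to expect for a cluster entity.

    `properties` maps property names from the table above to values. The
    accepted values for aws.msk_cluster.type are an assumption of this sketch.
    """
    cluster_type = str(properties.get("aws.msk_cluster.type", "")).lower()
    if cluster_type == "provisioned":
        return {"metrics": True, "properties": True}
    if cluster_type == "serverless":
        return {"metrics": False, "properties": True}
    raise ValueError(f"unrecognized cluster type: {cluster_type!r}")


print(expected_data({"aws.msk_cluster.type": "SERVERLESS"}))
```
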

Replicator

Display Name | Property Name | Description |
---|---|---|
ARN | aws.msk_replicator.id | The Amazon Resource Name (ARN) of the replicator. |
Name | aws.msk_replicator.name | The name of the replicator. |
Region | cloud.region | The geographical region in which the resource is running. |
Created At | aws.msk_replicator.created_at | The time the replicator was created. |
Account ID | cloud.account.id | The cloud account ID the resource is assigned to. |
Version | aws.msk_replicator.version | The current version of the replicator. |
Is Reference | aws.msk_replicator.is_reference | Specifies whether this resource is a replicator reference. |
Source Cluster ARN | aws.msk_replicator.source_cluster.id | The ARN of an Amazon MSK source cluster. |
Source Cluster Alias | aws.msk_replicator.source_cluster.alias | The alias of the source Kafka cluster. |
Target Cluster ARN | aws.msk_replicator.target_cluster.id | The ARN of an Amazon MSK target cluster. |
Target Cluster Alias | aws.msk_replicator.target_cluster.alias | The alias of the target Kafka cluster. |
Status | aws.msk_replicator.status | The state of the replicator. |

Broker

Display Name | Property Name | Description |
---|---|---|
ARN | aws.msk_broker.arn | The Amazon Resource Name (ARN) of the broker. |
ID | aws.msk_broker.id | The ID of the broker. |
Region | cloud.region | The geographical region in which the resource is running. |
Account ID | cloud.account.id | The cloud account ID the resource is assigned to. |
Started At | aws.msk_broker.started_at | The start time of the broker. |
Network Interface | aws.msk_broker.network.interface | The attached elastic network interface of the broker. |
Subnet | aws.msk_broker.network.subnet | The client subnet to which this broker node belongs. |
Network Address | aws.msk_broker.network.address | The virtual private cloud (VPC) of the client. |
Configuration ARN | aws.msk_broker.configuration.arn | The Amazon Resource Name (ARN) of the configuration used for the cluster. |
Configuration Revision | aws.msk_broker.configuration.revision | The revision of the configuration to use. This field isn’t visible in this preview release. |
Version | aws.msk_broker.version | The version of Apache Kafka. |
Endpoints | aws.msk_broker.endpoints | The endpoints for accessing the broker. |
Instance Type | aws.msk_broker.instance_type | The instance type. |
Type | aws.msk_broker.type | The node type. |
Retention and Purge Time-To-Live (TTL)
For all cloud and infrastructure entities, the retention TTL is 180 minutes (3 hours) and the purge TTL is 525,600 minutes (365 days).
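
The minute values convert to the hour and day figures quoted above. A quick check, with constant names chosen only for illustration:

```python
from datetime import timedelta

# TTL values as stated above, in minutes.
RETENTION_TTL = timedelta(minutes=180)
PURGE_TTL = timedelta(minutes=525_600)

print(RETENTION_TTL.total_seconds() / 3600)  # 3.0 (hours)
print(PURGE_TTL.days)                        # 365 (days)
```
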
Third party names, logos, marks, and general references used in these materials are the property of their respective owners or their affiliates in the United States and/or other countries. Inclusion of such references is for informational purposes only and is not intended to promote or otherwise suggest a relationship between Splunk AppDynamics and the third party.