Amazon EMR is a web service that makes it easy to process vast amounts of data efficiently using Apache Hadoop and services offered by Amazon Web Services. Cisco Cloud Observability monitors EMR clusters running Hadoop 1.x and 2.x, but collects only the metrics that are common to both versions.

You must configure cloud connections to monitor this entity. See Set up Cisco AppDynamics Cloud Collectors to Monitor AWS.

Cisco Cloud Observability displays AWS entities on the Observe page. Metrics are displayed for specific entity instances in the list and detail views.

This document contains references to third-party documentation. Splunk AppDynamics does not own any rights and assumes no responsibility for the accuracy or completeness of such third-party documentation.

Detail View

To display the detail view for Amazon EMR:

  1. Navigate to the Observe page. 
  2. Under Serverless Functions, click AWS EMR Clusters.
    The list view now displays.
  3. From the list, click an instance Name to display the detail view.
    The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.

Metrics and Key Performance Indicators 

Cisco Cloud Observability displays the following metrics and key performance indicators (KPIs) for Amazon EMR. See Monitoring Amazon EMR metrics with CloudWatch.

| Display Name | Source Metric | Description |
|---|---|---|
| Is Idle (0/1) | IsIdle | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. It is set to 1 if no tasks are running and no jobs are running, and set to 0 otherwise. This value is checked at five-minute intervals and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. To avoid false positives, you should raise an alarm when this value has been 1 for more than one consecutive 5-minute check. For example, you might raise an alarm on this value if it has been 1 for thirty minutes or longer. |
| Percentage of Live Data Nodes | LiveDataNodes | The percentage of data nodes that are receiving work from Hadoop. |
| S3 Read/Write Bytes | S3BytesWritten | The number of bytes written to Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR. |
| S3 Read/Write Bytes | S3BytesRead | The number of bytes read from Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR. |
| HDFS Utilization (%) | HDFSUtilization | The percentage of HDFS storage currently used. |
| HDFS Read/Write Bytes | HDFSBytesRead | The number of bytes read from HDFS. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR. |
| HDFS Read/Write Bytes | HDFSBytesWritten | The number of bytes written to HDFS. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR. |
| Concurrent Data Transfers (Bytes) | TotalLoad | The total number of concurrent data transfers. |
| Number of Blocks | MissingBlocks | The number of blocks in which HDFS has no replicas. These might be corrupt blocks. |
| Number of Running Nodes | CoreNodesRunning | The current number of CORE nodes running in a cluster. |
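
The alarm guidance for IsIdle can be sketched as a small check over recent samples. This is a minimal illustration, not part of Cisco Cloud Observability or the CloudWatch API: the function name `idle_alarm`, its parameters, and the assumption that samples arrive oldest-first at one per 5-minute check are all hypothetical.

```python
from typing import Sequence

# IsIdle is checked at five-minute intervals (from the metric description).
CHECK_INTERVAL_MINUTES = 5


def idle_alarm(is_idle_samples: Sequence[int], idle_minutes: int = 30) -> bool:
    """Return True when the most recent IsIdle samples are all 1 for idle_minutes.

    Assumption: samples are ordered oldest to newest, one per 5-minute check.
    """
    # Ceiling division gives the number of checks that cover idle_minutes.
    checks_needed = -(-idle_minutes // CHECK_INTERVAL_MINUTES)
    # A single idle sample is not enough: require more than one consecutive check.
    checks_needed = max(2, checks_needed)
    if len(is_idle_samples) < checks_needed:
        return False
    return all(value == 1 for value in is_idle_samples[-checks_needed:])
```

With the default of thirty minutes, the sketch requires the last six samples to be 1. A single sample of 1 never triggers it, which matches the advice to avoid alarming on one idle check.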

Properties (Attributes)

Cisco Cloud Observability displays the following properties for Amazon EMR.

| Display Name | Property Name | Description |
|---|---|---|
| Id | aws.emr_cluster.id | The unique identifier for the cluster. |
| Name | aws.emr_cluster.name | The name of the cluster. |
| State | aws.emr_cluster.state | The current state of the cluster. |
| State Change Reason | aws.emr_cluster.state_change_reason | The descriptive message for the state change reason. |
| Arn | aws.emr_cluster.arn | The Amazon Resource Name (ARN) of the cluster. |
| Outpost Arn | aws.emr_cluster.outpost.arn | The Amazon Resource Name (ARN) of the Outpost where the cluster is launched. |
| Step Currency Level | aws.emr_cluster.step_currency_level | Specifies the number of steps that can be executed concurrently. |
| Creation Time | aws.emr_cluster.creation_time | The creation date and time of the cluster. |
| Ready Time | aws.emr_cluster.ready_time | The date and time when the cluster was ready to run steps. |
| End Time | aws.emr_cluster.end_time | The date and time when the cluster was terminated. |
| Service Role | aws.emr_cluster.service_role | The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf. |
| Autoscaling Role | aws.emr_cluster.autoscaling_role | An IAM role for automatic scaling policies. The default role is EMR_AutoScaling_DefaultRole. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate Amazon EC2 instances in an instance group. |
| Instance Collection Type | aws.emr_cluster.instance_collection_type | The instance fleet configuration is available only in Amazon EMR releases 4.8.0 and later, excluding 5.0.x versions. |
| Log URI | aws.emr_cluster.log_uri | The path to the Amazon S3 location where logs for the cluster are stored. |
| Log Encryption KMS Key ID | aws.emr_cluster.log_encryption_kms_key_id | The AWS Key Management Service (AWS KMS) key ID used to encrypt the logs. |
| Scale Down Behavior | aws.emr_cluster.scale_down_behavior | The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized. |
| OS Release Label | aws.emr_cluster.os_release_label | The Amazon Linux release specified in a cluster launch RunJobFlow request. If no Amazon Linux release was specified, the default Amazon Linux release is shown in the response. |
| Requested AMI Version | aws.emr_cluster.requested_ami_version | The AMI version requested for the cluster. |
| Running AMI Version | aws.emr_cluster.running_ami_version | The AMI version running on the cluster. |
| Termination Protected | aws.emr_cluster.termination_protected | Indicates whether Amazon EMR will lock the cluster to prevent the Amazon EC2 instances from being terminated by an API call or user intervention, or in the event of a cluster error. |
| Unhealthy Node Replacement | aws.emr_cluster.unhealthy_node_replacement | Indicates whether Amazon EMR should gracefully replace Amazon EC2 core instances that have degraded within the cluster. |
| Release Label | aws.emr_cluster.release_label | The Amazon EMR release label, which determines the version of open-source application packages installed on the cluster. |
| Normalized Instance Hours | aws.emr_cluster.normalized_instance_hours | An approximation of the cost of the cluster, represented in m1.small/hours. |
| Application Names | aws.emr_cluster.application.names | The names of the applications. |
| Application Versions | aws.emr_cluster.application.versions | The versions of the applications. |
| Visible To All Users | aws.emr_cluster.visible_to_all_users | Indicates whether the cluster is visible to all IAM users of the AWS account associated with the cluster. |
| Auto Terminate | aws.emr_cluster.auto_terminate | Specifies whether the cluster should terminate after completing all steps. |
| Master Public DNS | aws.emr_cluster.master_public_dns | The DNS name of the master node. |
| Custom AMI ID | aws.emr_cluster.custom_ami_id | Available only in Amazon EMR releases 5.7.0 and later. The ID of a custom Amazon EBS-backed Linux AMI if the cluster uses a custom AMI. |
| EBS Root Volume Size | aws.emr_cluster.ebs_root_volume_size | The size, in GiB, of the EBS root device volume of the Linux AMI that is used for each EC2 instance. |
| EBS Root Volume IOPS | aws.emr_cluster.ebs_root_volume_iops | The IOPS of the Amazon EBS root device volume of the Linux AMI that is used for each Amazon EC2 instance. Available in Amazon EMR releases 6.15.0 and later. |
| EBS Root Volume Throughput | aws.emr_cluster.ebs_root_volume_throughput | The throughput, in MB/s, of the Amazon EBS root device volume of the Linux AMI that is used for each Amazon EC2 instance. Available in Amazon EMR releases 6.15.0 and later. |
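
Many of these properties resemble fields of the `Cluster` object returned by the Amazon EMR DescribeCluster API. The sketch below shows how a few of them might line up. It is an assumption for illustration only: this document does not state how Cisco Cloud Observability derives the properties, the function `to_emr_cluster_properties` is hypothetical, and representing application names and versions as lists is a guess.

```python
from typing import Any, Dict


def to_emr_cluster_properties(cluster: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a DescribeCluster-style `Cluster` dict into property names.

    Assumption: the mapping below is illustrative, not the product's actual logic.
    """
    status = cluster.get("Status", {})
    timeline = status.get("Timeline", {})
    applications = cluster.get("Applications", [])
    return {
        "aws.emr_cluster.id": cluster.get("Id"),
        "aws.emr_cluster.name": cluster.get("Name"),
        "aws.emr_cluster.state": status.get("State"),
        "aws.emr_cluster.state_change_reason": status.get("StateChangeReason", {}).get("Message"),
        "aws.emr_cluster.arn": cluster.get("ClusterArn"),
        "aws.emr_cluster.creation_time": timeline.get("CreationDateTime"),
        "aws.emr_cluster.release_label": cluster.get("ReleaseLabel"),
        "aws.emr_cluster.application.names": [app.get("Name") for app in applications],
        "aws.emr_cluster.application.versions": [app.get("Version") for app in applications],
    }


# Hypothetical sample values, shaped like a DescribeCluster response.
sample_cluster = {
    "Id": "j-EXAMPLE",
    "Name": "example-cluster",
    "Status": {"State": "WAITING", "StateChangeReason": {"Message": "Cluster ready"}},
    "ReleaseLabel": "emr-6.15.0",
    "Applications": [{"Name": "Hadoop", "Version": "3.3.6"}],
}
properties = to_emr_cluster_properties(sample_cluster)
```

Fields missing from the input, such as `ClusterArn` in the sample, come back as `None` rather than raising an error, which keeps the sketch tolerant of partial responses.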

Retention and Purge Time-To-Live (TTL)

For all cloud and infrastructure entities, the retention TTL is 180 minutes (3 hours) and the purge TTL is 525,600 minutes (365 days). 
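
The unit conversions behind these two values can be checked with a short sketch. The constant names `RETENTION_TTL` and `PURGE_TTL` are hypothetical and are used here only to show the arithmetic.

```python
from datetime import timedelta

# TTL values as stated above, expressed in minutes.
RETENTION_TTL = timedelta(minutes=180)
PURGE_TTL = timedelta(minutes=525_600)

# 180 minutes / 60 = 3 hours; 525,600 minutes / 1,440 minutes per day = 365 days.
retention_hours = RETENTION_TTL.total_seconds() / 3600
purge_days = PURGE_TTL.days
```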

Amazon Web Services, the AWS logo, AWS, and any other AWS Marks used in these materials are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.