AWS Glue

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources.

Cisco Cloud Observability supports monitoring AWS Glue jobs. A job encapsulates a script that connects to your source data, processes it, and then writes it out to your data target. You can monitor job runs to understand runtime metrics such as completion status, duration, and start time.
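
As a rough illustration of the run-level data mentioned above (completion status, duration, and start time), the following Python sketch summarizes a list of job-run records. It assumes records shaped like the `JobRuns` entries returned by the AWS Glue `GetJobRuns` API (`JobRunState`, `StartedOn`, `ExecutionTime`). The function name `summarize_runs` and the sample data are hypothetical, and this is not a description of how Cisco Cloud Observability collects the data.

```python
from collections import Counter
from datetime import datetime, timezone


def summarize_runs(job_runs):
    """Summarize job runs: count per state, total duration, latest start time."""
    states = Counter(run["JobRunState"] for run in job_runs)
    # ExecutionTime is reported in seconds; treat it as 0 when it is absent
    # (for example, for a run that is still in progress in this sample).
    total_seconds = sum(run.get("ExecutionTime", 0) for run in job_runs)
    starts = [run["StartedOn"] for run in job_runs if "StartedOn" in run]
    return {
        "states": dict(states),
        "total_execution_seconds": total_seconds,
        "latest_start": max(starts) if starts else None,
    }


# Hypothetical sample records (assumption: not real job data).
sample_runs = [
    {
        "JobRunState": "SUCCEEDED",
        "StartedOn": datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc),
        "ExecutionTime": 120,
    },
    {
        "JobRunState": "FAILED",
        "StartedOn": datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc),
        "ExecutionTime": 45,
    },
    {
        "JobRunState": "RUNNING",
        "StartedOn": datetime(2024, 1, 3, 8, 0, tzinfo=timezone.utc),
    },
]

summary = summarize_runs(sample_runs)
print(summary["states"])
print(summary["total_execution_seconds"])
print(summary["latest_start"])
```

If you use the AWS SDK for Python, the records could come from `boto3.client("glue").get_job_runs(JobName=...)["JobRuns"]`, where the job name is whatever your own job is called.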

You must configure cloud connections to monitor this entity. See Set up Cisco AppDynamics Cloud Collectors to Monitor AWS.

Cisco Cloud Observability displays AWS entities on the Observe page. Metrics are displayed for specific entity instances in the list and detail views.

This document contains references to third-party documentation. Splunk AppDynamics does not own any rights and assumes no responsibility for the accuracy or completeness of such third-party documentation.

Detail View

To display the detail view for an AWS Glue instance:

Navigate to the Observe page.
Under Analytics, click AWS Glue Jobs.
The list view now displays.
From the list, click an instance Name to display the detail view.
The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.

Metrics and Key Performance Indicators

Cisco Cloud Observability displays the following metrics and key performance indicators (KPIs) for AWS Glue jobs. For more information, see Monitoring AWS Glue using Amazon CloudWatch metrics.

Display Name	Source Metric Name	Description
Read Bytes	`glue.driver.aggregate.bytesRead`	The number of bytes read from all data sources by all completed Spark tasks running in all executors.
Elapsed Time (ms)	`glue.driver.aggregate.elapsedTime`	The ETL elapsed time in milliseconds (does not include the job bootstrap times).
Completed Stages (Count)	`glue.driver.aggregate.numCompletedStages`	The number of completed stages in the job.
Task Count	`glue.driver.aggregate.numCompletedTasks`	The number of completed tasks in the job.
Task Count	`glue.driver.aggregate.numFailedTasks`	The number of failed tasks.
Task Count	`glue.driver.aggregate.numKilledTasks`	The number of tasks killed.
Record Count	`glue.driver.streaming.numRecords`	The number of records that are received in a micro-batch. This metric is only available for AWS Glue streaming jobs with AWS Glue version 2.0 and above.
Record Count	`glue.driver.aggregate.recordsRead`	The number of records read from all data sources by all completed Spark tasks running in all executors.
Shuffle Throughput (Bytes)	`glue.driver.aggregate.shuffleLocalBytesRead`	The number of bytes read by all executors to shuffle data between them since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes read for this purpose during the previous minute).
Shuffle Throughput (Bytes)	`glue.driver.aggregate.shuffleBytesWritten`	The number of bytes written by all executors to shuffle data between them since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes written for this purpose during the previous minute).
Disk Usage (MB)	`glue.driver.BlockManager.disk.diskSpaceUsed_MB`	The number of megabytes of disk space used across all executors.
Executors Count	`glue.driver.ExecutorAllocationManager.executors.numberAllExecutors`	The number of actively running job executors.
Executors Count	`glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors`	The number of maximum (actively running and pending) job executors needed to satisfy the current load.
Heap Usage Percentage	`glue.driver.jvm.heap.usage`	The fraction of memory used by the JVM heap (scale: 0-1) for the driver, the executor identified by executorId, or ALL executors.
Heap Bytes Used	`glue.driver.jvm.heap.used`	The number of memory bytes used by the JVM heap for the driver, the executor identified by executorId, or ALL executors.
S3 Throughput	`glue.driver.s3.filesystem.read_bytes`	The number of bytes read from Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes read during the previous minute).
S3 Throughput	`glue.driver.s3.filesystem.write_bytes`	The number of bytes written to Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes written during the previous minute).
Batch Process Time	`glue.driver.streaming.batchProcessingTimeInMs`	The time it takes to process the batches in milliseconds. This metric is only available for AWS Glue streaming jobs with AWS Glue version 2.0 and above.
CPU Usage Percentage	`glue.driver.system.cpuSystemLoad`	The fraction of CPU system load used (scale: 0-1) by the driver, an executor identified by executorId, or ALL executors.
Number Of Runs	`-`	The number of running, cancelled, successful, and failed runs for this AWS Glue job.
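
Several of the source metrics above are reported as fractions on a 0-1 scale (`glue.driver.jvm.heap.usage`, `glue.driver.system.cpuSystemLoad`) or in milliseconds (`glue.driver.aggregate.elapsedTime`). The following Python sketch shows one way to convert such values into percentages and seconds. The helper names and sample values are hypothetical, and whether Cisco Cloud Observability applies the same scaling before display is an assumption.

```python
def fraction_to_percent(fraction):
    """Convert a 0-1 fraction to a percentage, rejecting out-of-range input."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"expected a value between 0 and 1, got {fraction}")
    return fraction * 100.0


def millis_to_seconds(millis):
    """Convert a duration in milliseconds to seconds."""
    return millis / 1000.0


# Hypothetical sample datapoints keyed by source metric name (assumption).
sample = {
    "glue.driver.jvm.heap.usage": 0.25,
    "glue.driver.system.cpuSystemLoad": 0.5,
    "glue.driver.aggregate.elapsedTime": 90000,
}

heap_percent = fraction_to_percent(sample["glue.driver.jvm.heap.usage"])
cpu_percent = fraction_to_percent(sample["glue.driver.system.cpuSystemLoad"])
elapsed_seconds = millis_to_seconds(sample["glue.driver.aggregate.elapsedTime"])

print(heap_percent, cpu_percent, elapsed_seconds)
```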

Properties (Attributes)

Cisco Cloud Observability displays the following properties for AWS Glue.

Display Name	Property Name	Description
Glue Job Name	`cloud.etl.name`	The name of the cloud ETL.
ARN	`aws.glue_job.source`	The ARN of the data source configured with the AWS Glue job.
Created At	`cloud.etl.created_at`	The date and time when the job was created.
Last Modified At	`aws.glue_job.last_modified`	The time the AWS Glue job was last updated.
Data Target	`aws.glue_job.target`	The ARN of the data target configured with the AWS Glue job.
Data Transform	`aws.glue_job.transform`	The name of the data transform configured with the AWS Glue job.
Max Retries	`aws.glue_job.max_retries`	The maximum number of times to retry this job after a `JobRun` fails.
Max Capacity	`aws.glue_job.max_capacity`	The number of data processing units (DPUs) that can be allocated when this job runs.
Type of Worker	`aws.glue_job.worker.type`	The type of predefined worker that is allocated when a job runs.
Number of Workers	`aws.glue_job.worker.count`	The number of workers of a defined `workerType` that are allocated when a job runs.
Version of Apache Spark	`aws.glue_job.version`	The versions of Apache Spark and Python that are available in a job.
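
To make the property names above more concrete, the following Python sketch builds a property dictionary from a job record shaped like the `Job` structure returned by the AWS Glue `GetJob` API. The field-to-property mapping is an assumption made for illustration and is not confirmed by this document. The function name `job_to_properties` and the sample job are hypothetical, and the source, target, and transform properties are left out because their origin is not stated here.

```python
# Assumed mapping from property names in the table to Glue Job fields.
# This mapping is illustrative only; the actual collector logic is not documented here.
PROPERTY_TO_JOB_FIELD = {
    "cloud.etl.name": "Name",
    "cloud.etl.created_at": "CreatedOn",
    "aws.glue_job.last_modified": "LastModifiedOn",
    "aws.glue_job.max_retries": "MaxRetries",
    "aws.glue_job.max_capacity": "MaxCapacity",
    "aws.glue_job.worker.type": "WorkerType",
    "aws.glue_job.worker.count": "NumberOfWorkers",
    "aws.glue_job.version": "GlueVersion",
}


def job_to_properties(job):
    """Return the properties that can be derived from the fields present in job."""
    return {
        prop: job[field]
        for prop, field in PROPERTY_TO_JOB_FIELD.items()
        if field in job
    }


# Hypothetical sample job record (assumption: not a real job).
sample_job = {
    "Name": "nightly-etl",
    "MaxRetries": 1,
    "WorkerType": "G.1X",
    "NumberOfWorkers": 10,
    "GlueVersion": "4.0",
}

properties = job_to_properties(sample_job)
for name, value in properties.items():
    print(f"{name} = {value}")
```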

Retention and Purge Time-To-Live (TTL)

For all cloud and infrastructure entities, the retention TTL is 180 minutes (3 hours) and the purge TTL is 525,600 minutes (365 days).
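
The unit conversions in the TTL values above can be checked with a short Python sketch. The constant names are hypothetical; the minute values come from the sentence above.

```python
from datetime import timedelta

# TTL values in minutes, as stated in this section.
RETENTION_TTL_MINUTES = 180
PURGE_TTL_MINUTES = 525_600

retention = timedelta(minutes=RETENTION_TTL_MINUTES)
purge = timedelta(minutes=PURGE_TTL_MINUTES)

# Convert the retention TTL to hours and the purge TTL to whole days.
retention_hours = retention.total_seconds() / 3600
purge_days = purge.days

print(retention_hours)  # 3.0
print(purge_days)       # 365
```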

Amazon Web Services, the AWS logo, AWS, and any other AWS Marks used in these materials are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.