AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources.

Cisco Cloud Observability supports monitoring AWS Glue Jobs. A job encapsulates a script that connects to your source data, processes it, and then writes it out to your data target. You can monitor job runs to understand runtime metrics such as completion status, duration, and start time.

You must configure cloud connections to monitor this entity. See Set up Cisco AppDynamics Cloud Collectors to Monitor AWS.

Cisco Cloud Observability displays AWS entities on the Observe page. Metrics are displayed for specific entity instances in the list and detail views.

This document contains references to third-party documentation. Splunk AppDynamics does not own any rights and assumes no responsibility for the accuracy or completeness of such third-party documentation.

Detail View

To display the detail view for an AWS Glue instance:

  1. Navigate to the Observe page. 
  2. Under Analytics, click AWS Glue Jobs.
    The list view now displays.
  3. From the list, click an instance Name to display the detail view.
    The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.

Metrics and Key Performance Indicators

Cisco Cloud Observability displays the following metrics and key performance indicators (KPIs) for AWS Glue jobs. For more information, see Monitoring AWS Glue using Amazon CloudWatch metrics.

| Display Name | Source Metric Name | Description |
| --- | --- | --- |
| Read Bytes | glue.driver.aggregate.bytesRead | The number of bytes read from all data sources by all completed Spark tasks running in all executors. |
| Elapsed Time (ms) | glue.driver.aggregate.elapsedTime | The ETL elapsed time in milliseconds (does not include the job bootstrap times). |
| Completed Stages (Count) | glue.driver.aggregate.numCompletedStages | The number of completed stages in the job. |
| Task Count | glue.driver.aggregate.numCompletedTasks | The number of completed tasks in the job. |
| Task Count | glue.driver.aggregate.numFailedTasks | The number of failed tasks. |
| Task Count | glue.driver.aggregate.numKilledTasks | The number of tasks killed. |
| Record Count | glue.driver.streaming.numRecords | The number of records that are received in a micro-batch. This metric is only available for AWS Glue streaming jobs with AWS Glue version 2.0 and above. |
| Record Count | glue.driver.aggregate.recordsRead | The number of records read from all data sources by all completed Spark tasks running in all executors. |
| Shuffle Throughput (Bytes) | glue.driver.aggregate.shuffleLocalBytesRead | The number of bytes read by all executors to shuffle data between them since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes read for this purpose during the previous minute). |
| Shuffle Throughput (Bytes) | glue.driver.aggregate.shuffleBytesWritten | The number of bytes written by all executors to shuffle data between them since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes written for this purpose during the previous minute). |
| Disk Usage (MB) | glue.driver.BlockManager.disk.diskSpaceUsed_MB | The number of megabytes of disk space used across all executors. |
| Executors Count | glue.driver.ExecutorAllocationManager.executors.numberAllExecutors | The number of actively running job executors. |
| Executors Count | glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors | The number of maximum (actively running and pending) job executors needed to satisfy the current load. |
| Heap Usage Percentage | glue.driver.jvm.heap.usage | The fraction of memory used by the JVM heap (scale: 0-1) for the driver, an executor identified by executorId, or ALL executors. |
| Heap Bytes Used | glue.driver.jvm.heap.used | The number of memory bytes used by the JVM heap for the driver, the executor identified by executorId, or ALL executors. |
| S3 Throughput | glue.driver.s3.filesystem.read_bytes | The number of bytes read from Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes read during the previous minute). |
| S3 Throughput | glue.driver.s3.filesystem.write_bytes | The number of bytes written to Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes written during the previous minute). |
| Batch Process Time | glue.driver.streaming.batchProcessingTimeInMs | The time it takes to process the batches in milliseconds. This metric is only available for AWS Glue streaming jobs with AWS Glue version 2.0 and above. |
| CPU Usage Percentage | glue.driver.system.cpuSystemLoad | The fraction of CPU system load used (scale: 0-1) by the driver, an executor identified by executorId, or ALL executors. |
| Number Of Runs | - | The number of running, cancelled, successful, and failed runs for this Glue job. |
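
The source metrics listed above use mixed units: fractions on a 0-1 scale, milliseconds, and raw counts. The following Python sketch shows one way to turn a set of raw values into easier-to-read figures. It is an illustration only and is not part of Cisco Cloud Observability: the `summarize` helper, the output key names, and the sample values are all hypothetical, and it assumes you already have one data point per source metric name.

```python
def summarize(sample: dict) -> dict:
    """Derive readable values from one set of raw Glue metric values.

    Keys of `sample` are source metric names from the list above.
    The output key names are hypothetical and chosen for this sketch.
    """
    completed = sample.get("glue.driver.aggregate.numCompletedTasks", 0)
    failed = sample.get("glue.driver.aggregate.numFailedTasks", 0)
    killed = sample.get("glue.driver.aggregate.numKilledTasks", 0)
    total_tasks = completed + failed + killed

    prefix = "glue.driver.ExecutorAllocationManager.executors."
    running = sample.get(prefix + "numberAllExecutors", 0)
    needed = sample.get(prefix + "numberMaxNeededExecutors", 0)

    return {
        # Fractions on a 0-1 scale become percentages.
        "heap_usage_pct": sample["glue.driver.jvm.heap.usage"] * 100.0,
        "cpu_load_pct": sample["glue.driver.system.cpuSystemLoad"] * 100.0,
        # Milliseconds become seconds.
        "elapsed_s": sample["glue.driver.aggregate.elapsedTime"] / 1000.0,
        # Share of tasks that failed, guarded against division by zero.
        "failed_task_ratio": failed / total_tasks if total_tasks else 0.0,
        # Needed executors relative to running executors (None if none run).
        "executor_demand_ratio": needed / running if running else None,
    }


# Made-up sample values, for illustration only.
sample = {
    "glue.driver.jvm.heap.usage": 0.25,
    "glue.driver.system.cpuSystemLoad": 0.5,
    "glue.driver.aggregate.elapsedTime": 90000,
    "glue.driver.aggregate.numCompletedTasks": 95,
    "glue.driver.aggregate.numFailedTasks": 5,
    "glue.driver.aggregate.numKilledTasks": 0,
    "glue.driver.ExecutorAllocationManager.executors.numberAllExecutors": 4,
    "glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors": 8,
}
summary = summarize(sample)
print(summary)
```

In this sketch, an executor demand ratio above 1 means the job needs more executors than are currently running, which may suggest that the job could use more capacity.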

Properties (Attributes)

Cisco Cloud Observability displays the following properties for AWS Glue.

| Display Name | Property Name | Description |
| --- | --- | --- |
| Glue Job Name | cloud.etl.name | The name of the cloud ETL. |
| ARN | aws.glue_job.source | The ARN of the data source configured with the Glue job. |
| Created At | cloud.etl.created_at | The date and time when the data service job was created. |
| Last Modified At | aws.glue_job.last_modified | The time the Glue job was last updated. |
| Data Target | aws.glue_job.target | The ARN of the data target configured with the Glue job. |
| Data Transform | aws.glue_job.transform | The name of the data transform configured with the Glue job. |
| Max Retries | aws.glue_job.max_retries | The maximum number of times to retry this job after a JobRun fails. |
| Max Capacity | aws.glue_job.max_capacity | The number of data processing units (DPUs) that can be allocated when this job runs. |
| Type of Worker | aws.glue_job.worker.type | The type of predefined worker that is allocated when a job runs. |
| Number of Workers | aws.glue_job.worker.count | The number of workers of a defined workerType that are allocated when a job runs. |
| Version of Apache Spark | aws.glue_job.version | The versions of Apache Spark and Python that are available in a job. |
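
Several of the properties above resemble fields of the `Job` structure returned by the AWS Glue `GetJob` API. The Python sketch below shows how such a structure might be relabeled with the property names listed here. The field-to-property mapping is an assumption made for illustration; this page does not state how Cisco Cloud Observability actually populates these properties. The `job_to_properties` helper and the example job are hypothetical, and the source, target, and transform properties are left out because no obvious single field corresponds to them.

```python
from datetime import datetime, timezone

# ASSUMED mapping: AWS Glue `Job` field -> property name from the list above.
FIELD_TO_PROPERTY = {
    "Name": "cloud.etl.name",
    "CreatedOn": "cloud.etl.created_at",
    "LastModifiedOn": "aws.glue_job.last_modified",
    "MaxRetries": "aws.glue_job.max_retries",
    "MaxCapacity": "aws.glue_job.max_capacity",
    "WorkerType": "aws.glue_job.worker.type",
    "NumberOfWorkers": "aws.glue_job.worker.count",
    "GlueVersion": "aws.glue_job.version",
}


def job_to_properties(job: dict) -> dict:
    """Relabel fields of a Glue `Job` dictionary with the property names above."""
    props = {}
    for field, prop in FIELD_TO_PROPERTY.items():
        if field not in job:
            continue  # Skip fields that the job does not define.
        value = job[field]
        if isinstance(value, datetime):
            value = value.isoformat()  # Render timestamps as ISO 8601 text.
        props[prop] = value
    return props


# Hypothetical job, shaped like the `Job` element of a GetJob response.
example_job = {
    "Name": "nightly-orders-etl",
    "CreatedOn": datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc),
    "MaxRetries": 1,
    "WorkerType": "G.1X",
    "NumberOfWorkers": 10,
    "GlueVersion": "4.0",
}
properties = job_to_properties(example_job)
for name, value in properties.items():
    print(f"{name} = {value}")
```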

Retention and Purge Time-To-Live (TTL)

For all cloud and infrastructure entities, the retention TTL is 180 minutes (3 hours) and the purge TTL is 525,600 minutes (365 days). 
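
The unit conversions above can be checked with a few lines of Python. This is a minimal sketch: the `ttl_deadline` helper and the `reference` timestamp are hypothetical, and because this page does not state which event starts each TTL clock, `reference` is only a placeholder for that moment.

```python
from datetime import datetime, timedelta, timezone

# TTL values as stated above, in minutes.
RETENTION_TTL = timedelta(minutes=180)
PURGE_TTL = timedelta(minutes=525_600)

# Dividing one timedelta by another yields a plain float.
retention_hours = RETENTION_TTL / timedelta(hours=1)
purge_days = PURGE_TTL / timedelta(days=1)
print(f"retention: {retention_hours} hours, purge: {purge_days} days")


def ttl_deadline(reference: datetime, ttl: timedelta) -> datetime:
    """Return the moment a TTL elapses, counted from a reference time."""
    return reference + ttl


# Placeholder reference time; the real starting event is an assumption.
reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(ttl_deadline(reference, RETENTION_TTL).isoformat())
print(ttl_deadline(reference, PURGE_TTL).isoformat())
```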

Amazon Web Services, the AWS logo, AWS, and any other AWS Marks used in these materials are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.