Amazon Managed Service for Apache Flink is a fully managed Amazon service that enables you to use an Apache Flink application to process streaming data. Cisco Cloud Observability supports monitoring Managed Apache Flink entities as well as legacy Kinesis Data Analytics (KDA) SQL applications.

You must configure cloud connections to monitor this entity. See Set up Cisco AppDynamics Cloud Collectors to Monitor AWS.

Cisco Cloud Observability displays AWS entities on the Observe page. Metrics are displayed for specific entity instances in the list and detail views.

This document contains references to third-party documentation. Splunk AppDynamics does not own any rights and assumes no responsibility for the accuracy or completeness of such third-party documentation.

Detail View

To display the detail view for an Amazon Managed Service for Apache Flink application:

  1. Navigate to the Observe page. 
  2. Under Analytics, click AWS Flink Applications.
    The list view now displays.
  3. From the list, click an instance Name to display the detail view.
    The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.

Metrics and Key Performance Indicators

Cisco Cloud Observability displays the following metrics and key performance indicators (KPIs) for an Amazon Managed Service for Apache Flink application. For more information, see Viewing Metrics and Dimensions in Managed Service for Apache Flink.
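If you want to compare a value shown in Cisco Cloud Observability with the raw AWS data, you can query the same source metric from Amazon CloudWatch. The following is a minimal sketch, not part of the product: it assumes application-level metrics are published to the CloudWatch namespace `AWS/KinesisAnalytics` with an `Application` dimension, and the helper name `build_metric_request` and the application name `my-flink-app` are hypothetical. It only builds the request arguments; sending them requires boto3 and AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Managed Service for Apache Flink publishes application-level
# metrics to this namespace with an "Application" dimension. Verify against
# the AWS documentation for your account and Flink version.
NAMESPACE = "AWS/KinesisAnalytics"


def build_metric_request(application_name, metric_name, minutes=60,
                         period_seconds=60, statistic="Average", now=None):
    """Return keyword arguments for CloudWatch GetMetricStatistics (hypothetical helper)."""
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,  # a source metric name, e.g. "cpuUtilization"
        "Dimensions": [{"Name": "Application", "Value": application_name}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


if __name__ == "__main__":
    params = build_metric_request("my-flink-app", "cpuUtilization")
    print(params["Namespace"], params["MetricName"], params["Dimensions"])
    # With boto3 installed and credentials configured, you could send it:
    # import boto3
    # boto3.client("cloudwatch").get_metric_statistics(**params)
```

Building the arguments separately from the network call keeps the sketch testable offline; swap `metric_name` for any source metric name in the table that follows.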

Display Name | Source Metric Name | Description
Task (ms) | backPressuredTimeMsPerSecond | The time (in milliseconds) this task or operator is back pressured per second.
Task (ms) | busyTimeMsPerSecond | The time (in milliseconds) this task or operator is busy (neither idle nor back pressured) per second. Can be NaN if the value could not be calculated.
Task (ms) | idleTimeMsPerSecond | The time (in milliseconds) this task or operator is idle (has no data to process) per second. Idle time excludes back-pressured time, so a back-pressured task is not counted as idle.
Task Manager CPU Utilization (%) | cpuUtilization | Overall percentage of CPU utilization across task managers. For example, if there are five task managers, Managed Service for Apache Flink publishes five samples of this metric per reporting interval.
Task Manager Heap Utilization (%) | heapMemoryUtilization | Overall heap memory utilization across task managers. For example, if there are five task managers, Managed Service for Apache Flink publishes five samples of this metric per reporting interval.
Task Manager Memory Used (Bytes) | managedMemoryUsed | The amount of managed memory currently used.
Task Manager Memory Available (Bytes) | managedMemoryTotal | The total amount of managed memory available.
Uptime (ms) | uptime | The time that the job has been running without interruption.
Downtime (ms) | downtime | For jobs currently in a failing or recovering state, the time elapsed during this outage.
Container CPU Utilization (%) | containerCPUUtilization | Overall percentage of CPU utilization across task manager containers in the Flink application cluster. For example, if there are five task managers, there are correspondingly five TaskManager containers, and Managed Service for Apache Flink publishes 10 (2 × 5) samples of this metric per 1-minute reporting interval.
Container Memory Utilization (%) | containerMemoryUtilization | Overall percentage of memory utilization across task manager containers in the Flink application cluster. For example, if there are five task managers, there are correspondingly five TaskManager containers, and Managed Service for Apache Flink publishes 10 (2 × 5) samples of this metric per 1-minute reporting interval.
Container Managed Memory Utilization (%) | managedMemoryUtilization | Percentage of managed memory utilization across task manager containers in the Flink application cluster. Derived as managedMemoryUsed / managedMemoryTotal.
Container Disk Utilization (%) | containerDiskUtilization | Overall percentage of disk utilization across task manager containers in the Flink application cluster. For example, if there are five task managers, there are correspondingly five TaskManager containers, and Managed Service for Apache Flink publishes 10 (2 × 5) samples of this metric per 1-minute reporting interval.
Current Stream Watermark (ms) | currentInputWatermark | The last watermark this application, operator, task, or thread has received.
Current Stream Watermark (ms) | currentOutputWatermark | The last watermark this application, operator, task, or thread has emitted.
Restarts (Count) | fullRestarts | The total number of times this job has fully restarted since it was submitted. This metric does not count fine-grained restarts.
Last Checkpoint Size (Bytes) | lastCheckpointSize | The size of the last checkpoint in bytes. A continuously increasing checkpoint size may indicate a problem with the application.
Last Checkpoint Duration (ms) | lastCheckpointDuration | The time it took to complete the last checkpoint.
Failed Checkpoints (Count) | numberOfFailedCheckpoints | The number of times checkpointing has failed.
Records Processed Count | numRecordsIn | The total number of records this application, operator, or task has received.
Records Processed Count | numRecordsOut | The total number of records this application, operator, or task has emitted.
Records Processed Count | numRecordsInPerSecond | The total number of records this application, operator, or task has received per second.
Records Processed Count | numRecordsOutPerSecond | The total number of records this application, operator, or task has emitted per second.
Records Processed Rate (Count/Sec) | numRecordsInPerSecond | The total number of records this application, operator, or task has received per second.
Records Processed Rate (Count/Sec) | numRecordsOutPerSecond | The total number of records this application, operator, or task has emitted per second.
Records Dropped (Count) | numLateRecordsDropped | The number of records this operator or task has dropped because they arrived late.
Old Generation GC Count | oldGenerationGCCount | The total number of old-generation garbage collection operations that have occurred across all task managers.
Thread Count | threadCount | The total number of live threads used by the application.
Zeppelin CPU Utilization (%) | zeppelinCpuUtilization | Overall percentage of CPU utilization in the Apache Zeppelin server.
Zeppelin Heap Utilization (%) | zeppelinHeapMemoryUtilization | Overall percentage of heap memory utilization for the Apache Zeppelin server.
Zeppelin Thread Count | zeppelinThreadCount | The total number of live threads used by the Apache Zeppelin server.
Zeppelin Waiting Jobs (Count) | zeppelinWaitingJobs | The number of queued Apache Zeppelin jobs waiting for a thread.
Zeppelin Uptime (Seconds) | zeppelinServerUptime | The total time that the Apache Zeppelin server has been up and running.
Consumed KPUs | KPUs | The number of Kinesis Processing Units (KPUs) consumed by the Apache Flink application.
Input Source Lag (ms) | MillisBehindLatest | The number of milliseconds the consumer is behind the head of the stream, indicating how far behind the current time the consumer is.
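Several descriptions in the table imply simple derived checks: the busy, idle, and back-pressured times partition each second, managed memory utilization is a ratio, container metrics produce two samples per TaskManager container per minute, and a steadily growing checkpoint size is a warning sign. The following minimal sketch shows those checks on plain numbers you have already retrieved. All function names are hypothetical, and the rate helper assumes that cumulative counters such as numRecordsIn may reset (for example, after a restart).

```python
import math


def dominant_task_state(busy_ms, idle_ms, back_pressured_ms):
    """Return "busy", "idle", or "back_pressured", whichever took most of the second.

    The three per-second metrics cover non-overlapping parts of each second,
    so together they should come to roughly 1000 ms. busyTimeMsPerSecond can
    be NaN; NaN values are ignored here.
    """
    times = {"busy": busy_ms, "idle": idle_ms, "back_pressured": back_pressured_ms}
    known = {state: ms for state, ms in times.items() if not math.isnan(ms)}
    if not known:
        return None
    return max(known, key=known.get)


def managed_memory_utilization(used_bytes, total_bytes):
    """managedMemoryUsed / managedMemoryTotal, expressed as a percentage."""
    if total_bytes <= 0:
        return float("nan")
    return 100.0 * used_bytes / total_bytes


def container_samples_per_minute(task_managers):
    """Container metrics: two samples per TaskManager container per 1-minute interval."""
    return 2 * task_managers


def checkpoint_size_keeps_growing(sizes, min_samples=3):
    """True if every lastCheckpointSize sample is larger than the previous one."""
    if len(sizes) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(sizes, sizes[1:]))


def rate_per_second(prev_count, curr_count, elapsed_seconds):
    """Approximate a per-second rate from two cumulative counter samples.

    Returns None if the counter went backwards (assumed to mean a reset).
    """
    if elapsed_seconds <= 0 or curr_count < prev_count:
        return None
    return (curr_count - prev_count) / elapsed_seconds


if __name__ == "__main__":
    print(dominant_task_state(800.0, 150.0, 50.0))       # busy
    print(managed_memory_utilization(512, 2048))         # 25.0
    print(container_samples_per_minute(5))               # 10
    print(checkpoint_size_keeps_growing([10, 20, 30]))   # True
    print(rate_per_second(1000, 7000, 60))               # 100.0
```

The checkpoint check is deliberately strict (every sample must grow); in practice you might prefer a trend over a longer window before treating growth as a problem.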

Properties (Attributes)

Cisco Cloud Observability displays the following properties for an Amazon Managed Service for Apache Flink application.

Display Name | Property Name | Description
Name | aws.flink_application.name | The name of the Apache Flink application.
Region | cloud.region | The geographical region in which the resource is running.
Account ID | cloud.account.id | The cloud account ID to which the resource is assigned.
Created At | aws.flink_application.created_at | The timestamp when the Apache Flink application was created.
Service Execution Role | aws.flink_application.service_execution_role | The ARN of the service execution role of the Apache Flink application.
Status | aws.flink_application.status | The most recent status of the Apache Flink application.
Updated At | aws.flink_application.updated_at | The timestamp when the Apache Flink application was last updated.
Restore Type | aws.flink_application.configuration.restore.type | The restore type of the Apache Flink application.
Snapshot | aws.flink_application.configuration.restore.snapshot_name | The name of the snapshot from which to restore the Apache Flink application.
Allow Non-Restored State | aws.flink_application.configuration.restore.allow_non_restored_state | Specifies whether non-restored state is allowed for the Apache Flink application.
Checkpoint Type | aws.flink_application.configuration.checkpoint.type | The checkpoint type of the Apache Flink application.
Checkpoint Enabled | aws.flink_application.configuration.checkpoint.enabled | Specifies whether checkpointing is enabled for the Apache Flink application.
Monitoring Type | aws.flink_application.configuration.monitoring.type | The monitoring type of the Apache Flink application.
Metric Level of Monitoring | aws.flink_application.configuration.monitoring.level.metrics | The metrics level of monitoring for the Apache Flink application.
Log Level of Monitoring | aws.flink_application.configuration.monitoring.level.log | The log level of the Apache Flink application.
Parallelism Type | aws.flink_application.configuration.parallelism.type | The parallelism type of the Apache Flink application.
Maximum Parallelism | aws.flink_application.configuration.parallelism.max | The maximum parallelism of the Apache Flink application.
Parallelism Per KPU | aws.flink_application.configuration.parallelism.per_kpu | The parallelism per KPU of the Apache Flink application.
Autoscaling Enabled | aws.flink_application.configuration.parallelism.autoscaling_enabled | The current autoscaling status of the Apache Flink application.
Snapshot Enabled | aws.flink_application.configuration.snapshot_enabled | Specifies whether snapshots are enabled for the Apache Flink application.
Cloudwatch Log Option | aws.flink_application.configuration.cloudwatch.log.option | The CloudWatch log option of the Apache Flink application.
Configuration Mode | aws.flink_application.configuration.mode | The processing mode of the Apache Flink application.
Zeppelin Log Level | aws.flink_application.configuration.zeppelin.log_level | The log level of the Apache Zeppelin notebook.
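The parallelism properties relate to the Consumed KPUs metric. As a rough sketch, assuming the AWS allocation rule that an application receives about ceil(parallelism / parallelism per KPU) KPUs, you could estimate KPU usage from the property values. The helper `estimate_kpus` and the sample attribute values are hypothetical, and because the input here is Maximum Parallelism, the result is best read as an upper-bound estimate rather than a billing figure.

```python
import math


def estimate_kpus(parallelism, parallelism_per_kpu):
    """Estimate KPUs as ceil(parallelism / parallelism per KPU).

    Assumption: this mirrors the AWS allocation rule for Managed Service for
    Apache Flink; check the AWS documentation for exact billing details.
    """
    if parallelism < 1 or parallelism_per_kpu < 1:
        raise ValueError("parallelism values must be positive integers")
    return math.ceil(parallelism / parallelism_per_kpu)


# Hypothetical attribute values, keyed by property names from the table above.
attributes = {
    "aws.flink_application.configuration.parallelism.max": 8,
    "aws.flink_application.configuration.parallelism.per_kpu": 3,
}

if __name__ == "__main__":
    print(estimate_kpus(
        attributes["aws.flink_application.configuration.parallelism.max"],
        attributes["aws.flink_application.configuration.parallelism.per_kpu"],
    ))  # 3, because 8 / 3 rounds up to 3
```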

Retention and Purge Time-To-Live (TTL)

For all cloud and infrastructure entities, the retention TTL is 180 minutes (3 hours) and the purge TTL is 525,600 minutes (365 days). 
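The TTL values can be checked with simple arithmetic: 180 minutes is 3 hours, and 525,600 minutes is 365 × 24 × 60 minutes, or 365 days. The following minimal sketch encodes both values and classifies an entity by how long ago it last reported. It assumes both TTLs are measured from the entity's last reported data; the helper `entity_state` and its state labels are hypothetical and are not product terminology.

```python
from datetime import datetime, timedelta, timezone

RETENTION_TTL = timedelta(minutes=180)   # prints as 3:00:00
PURGE_TTL = timedelta(minutes=525_600)   # prints as 365 days, 0:00:00


def entity_state(last_reported, now):
    """Hypothetical helper: classify an entity by time since it last reported.

    Assumes both TTLs are measured from the last time the entity reported data.
    """
    age = now - last_reported
    if age <= RETENTION_TTL:
        return "retained"
    if age <= PURGE_TTL:
        return "past_retention"  # no longer retained, not yet purged
    return "purged"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(RETENTION_TTL, PURGE_TTL)
    print(entity_state(now - timedelta(hours=2), now))  # retained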

Amazon Web Services, the AWS logo, AWS, and any other AWS Marks used in these materials are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.