Amazon Apache Managed Flink Application
Amazon Managed Service for Apache Flink is a fully managed Amazon service that enables you to use an Apache Flink application to process streaming data. Cisco Cloud Observability supports monitoring Managed Apache Flink entities as well as legacy Kinesis Data Analytics (KDA) SQL applications.
You must configure cloud connections to monitor this entity. See Set up Cisco AppDynamics Cloud Collectors to Monitor AWS.
Cisco Cloud Observability displays AWS entities on the Observe page. Metrics are displayed for specific entity instances in the list and detail views.
This document contains references to third-party documentation. Splunk AppDynamics does not own any rights and assumes no responsibility for the accuracy or completeness of such third-party documentation.
Detail View
To display the detail view for an Amazon Apache Managed Flink Application:
- Navigate to the Observe page.
- Under Analytics, click AWS Flink Applications. The list view now displays.
- From the list, click an instance Name to display the detail view.
The detail view displays the metrics, key performance indicators, and properties (attributes) related to the instance you selected.
Metrics and Key Performance Indicators
Cisco Cloud Observability displays the following metrics and key performance indicators (KPIs) for an Amazon Apache Managed Flink Application. For more information, see Viewing Metrics and Dimensions in Managed Service for Apache Flink.
Display Name | Source Metric Name | Description |
---|---|---|
Task (ms) | backPressuredTimeMsPerSecond | The time (in milliseconds) this task or operator is back pressured per second. |
Task (ms) | busyTimeMsPerSecond | The time (in milliseconds) this task or operator is busy (neither idle nor back pressured) per second. Can be NaN if the value could not be calculated. |
Task (ms) | idleTimeMsPerSecond | The time (in milliseconds) this task or operator is idle (has no data to process) per second. Idle time excludes back pressured time, so if the task is back pressured it is not idle. |
Task Manager CPU Utilization (%) | cpuUtilization | Overall percentage of CPU utilization across task managers. For example, if there are five task managers, Managed Service for Apache Flink publishes five samples of this metric per reporting interval. |
Task Manager Heap Utilization (%) | heapMemoryUtilization | Overall heap memory utilization across task managers. For example, if there are five task managers, Managed Service for Apache Flink publishes five samples of this metric per reporting interval. |
Task Manager Memory Used (Bytes) | managedMemoryUsed | The amount of managed memory currently used. |
Task Manager Memory Available (Bytes) | managedMemoryTotal | The total amount of managed memory. |
Uptime (ms) | uptime | The time that the job has been running without interruption. |
Downtime (ms) | downtime | For jobs currently in a failing/recovering situation, the time elapsed during this outage. |
Container CPU Utilization (%) | containerCPUUtilization | Overall percentage of CPU utilization across task manager containers in the Flink application cluster. For example, if there are five task managers, there are five corresponding TaskManager containers, and Managed Service for Apache Flink publishes 2 × 5 = 10 samples of this metric per 1-minute reporting interval. |
Container Memory Utilization (%) | containerMemoryUtilization | Overall percentage of memory utilization across task manager containers in the Flink application cluster. For example, if there are five task managers, there are five corresponding TaskManager containers, and Managed Service for Apache Flink publishes 2 × 5 = 10 samples of this metric per 1-minute reporting interval. |
Container Managed Memory Utilization (%) | managedMemoryUtilization | Percentage of managed memory utilization across task manager containers in the Flink application cluster. Derived as managedMemoryUsed/managedMemoryTotal. |
Container Disk Utilization (%) | containerDiskUtilization | Overall percentage of disk utilization across task manager containers in the Flink application cluster. For example, if there are five task managers, there are five corresponding TaskManager containers, and Managed Service for Apache Flink publishes 2 × 5 = 10 samples of this metric per 1-minute reporting interval. |
Current Stream Watermark (ms) | currentInputWatermark | The last watermark this application/operator/task/thread has received. |
Current Stream Watermark (ms) | currentOutputWatermark | The last watermark this application/operator/task/thread has emitted. |
Restarts (Count) | fullRestarts | The total number of times this job has fully restarted since it was submitted. This metric does not measure fine-grained restarts. |
Last Checkpoint Size (Bytes) | lastCheckpointSize | The size of the last checkpoint in bytes. If the checkpoint size continuously increases, this may be indicative of a problem with the application. |
Last Checkpoint Duration (ms) | lastCheckpointDuration | The time it took to complete the last checkpoint. |
Failed Checkpoints (Count) | numberOfFailedCheckpoints | The number of times checkpointing has failed. |
Records Processed Count | numRecordsIn | The total number of records this application, operator, or task has received. |
Records Processed Count | numRecordsOut | The total number of records this application, operator, or task has emitted. |
Records Processed Count | numRecordsInPerSecond | The total number of records this application, operator, or task has received per second. |
Records Processed Count | numRecordsOutPerSecond | The total number of records this application, operator, or task has emitted per second. |
Records Processed Rate (Count/Sec) | numRecordsInPerSecond | The total number of records this application, operator, or task has received per second. |
Records Processed Rate (Count/Sec) | numRecordsOutPerSecond | The total number of records this application, operator, or task has emitted per second. |
Records Dropped (Count) | numLateRecordsDropped | The number of records this operator or task has dropped due to arriving late. |
Old Generation GC Count | oldGenerationGCCount | The total number of old generation garbage collection operations that have occurred across all task managers. |
Thread Count | threadCount | The total number of live threads used by the application. |
Zeppelin CPU Utilization (%) | zeppelinCpuUtilization | Overall percentage of CPU utilization in the Apache Zeppelin server. |
Zeppelin Heap Utilization (%) | zeppelinHeapMemoryUtilization | Overall percentage of heap memory utilization for the Apache Zeppelin server. |
Zeppelin Thread Count | zeppelinThreadCount | The total number of live threads used by the Apache Zeppelin server. |
Zeppelin Waiting Jobs (Count) | zeppelinWaitingJobs | The number of queued Apache Zeppelin jobs waiting for a thread. |
Zeppelin Uptime (Seconds) | zeppelinServerUptime | The total time that the server has been up and running. |
Consumed KPUs | KPUs | The number of Kinesis Processing Units (KPUs) consumed by the Apache Flink application. |
Input Source Lag (ms) | MillisBehindLatest | The number of milliseconds the consumer is behind the head of the stream, indicating how far behind current time the consumer is. |
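Two of the descriptions in the table above involve simple arithmetic: managedMemoryUtilization is derived from managedMemoryUsed and managedMemoryTotal, and the container metrics are published as two samples per TaskManager container per minute. The following Python sketch illustrates both calculations. The function names are hypothetical, and scaling the ratio to a percentage is an assumption based on the (%) unit in the display name; this is not how Cisco Cloud Observability or AWS necessarily computes the values.

```python
import math


def managed_memory_utilization(used_bytes: float, total_bytes: float) -> float:
    """Return managedMemoryUsed / managedMemoryTotal as a percentage.

    Hypothetical helper. Returns NaN when the total is zero or negative,
    because the ratio is undefined in that case.
    """
    if total_bytes <= 0:
        return math.nan
    return 100.0 * used_bytes / total_bytes


def expected_container_samples(task_managers: int, samples_per_container: int = 2) -> int:
    """Return the expected number of container metric samples per minute.

    Assumes one container per task manager and two samples per container
    per 1-minute reporting interval, as in the table's example.
    """
    return task_managers * samples_per_container


# 256 MiB of managed memory used out of 1 GiB
print(managed_memory_utilization(256 * 1024**2, 1024**3))  # 25.0
# Five task managers, as in the table's example
print(expected_container_samples(5))  # 10
```

Returning NaN for an undefined ratio mirrors how busyTimeMsPerSecond is described as NaN when it cannot be calculated.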
Properties (Attributes)
Cisco Cloud Observability displays the following properties for an Amazon Apache Managed Flink Application.
Display Name | Property Name | Description |
---|---|---|
Name | aws.flink_application.name | The name of the Apache Flink application. |
Region | cloud.region | The geographical region in which the resource is running. |
Account ID | cloud.account.id | The cloud account ID the resource is assigned to. |
Created At | aws.flink_application.created_at | The timestamp when the Apache Flink application was created. |
Service Execution Role | aws.flink_application.service_execution_role | The ARN of the service execution role of the Apache Flink application. |
Status | aws.flink_application.status | The last application status of the Apache Flink application. |
Updated At | aws.flink_application.updated_at | The timestamp when the Apache Flink application was last updated. |
Restore Type | aws.flink_application.configuration.restore.type | The restore type of the Apache Flink application. |
Snapshot | aws.flink_application.configuration.restore.snapshot_name | The name of the snapshot to restore the Apache Flink application from. |
Allow Non-Restored State | aws.flink_application.configuration.restore.allow_non_restored_state | Specifies if non-restored state is allowed for the Apache Flink application. |
Checkpoint Type | aws.flink_application.configuration.checkpoint.type | The checkpoint type of the Apache Flink application. |
Checkpoint Enabled | aws.flink_application.configuration.checkpoint.enabled | Specifies if the Apache Flink application checkpoint is enabled. |
Monitoring Type | aws.flink_application.configuration.monitoring.type | The monitoring type of the Apache Flink application. |
Metric Level of Monitoring | aws.flink_application.configuration.monitoring.level.metrics | The metrics level classification of the Apache Flink application. |
Log Level of Monitoring | aws.flink_application.configuration.monitoring.level.log | The log level of the Apache Flink application. |
Parallelism Type | aws.flink_application.configuration.parallelism.type | The parallelism type of the Apache Flink application. |
Maximum Parallelism | aws.flink_application.configuration.parallelism.max | The maximum parallelism of the Apache Flink application. |
Parallelism Per KPU | aws.flink_application.configuration.parallelism.per_kpu | The parallelism per KPU of the Apache Flink application. |
Autoscaling Enabled | aws.flink_application.configuration.parallelism.autoscaling_enabled | The current autoscaling status of the Apache Flink application. |
Snapshot Enabled | aws.flink_application.configuration.snapshot_enabled | Specifies whether snapshots are enabled for the Apache Flink application. |
Cloudwatch Log Option | aws.flink_application.configuration.cloudwatch.log.option | The log option of the Apache Flink application. |
Configuration Mode | aws.flink_application.configuration.mode | The processing mode of the Apache Flink application. |
Zeppelin Log Level | aws.flink_application.configuration.zeppelin.log_level | The log level of the Apache Zeppelin notebook. |
Retention and Purge Time-To-Live (TTL)
For all cloud and infrastructure entities, the retention TTL is 180 minutes (3 hours) and the purge TTL is 525,600 minutes (365 days).
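The unit conversions in the TTL values above can be checked with a short Python sketch. The constant and variable names are hypothetical and used only for illustration; only the minute values come from this page.

```python
from datetime import timedelta

# TTL values as stated on this page, in minutes.
RETENTION_TTL_MINUTES = 180
PURGE_TTL_MINUTES = 525_600

retention_ttl = timedelta(minutes=RETENTION_TTL_MINUTES)
purge_ttl = timedelta(minutes=PURGE_TTL_MINUTES)

print(retention_ttl.total_seconds() / 3600)  # 3.0 (hours)
print(purge_ttl.days)                        # 365 (days)
```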
Amazon Web Services, the AWS logo, AWS, and any other AWS Marks used in these materials are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.