Configure Anomaly Detection

You can configure Anomaly Detection for the entity types of your applications and infrastructure to detect performance issues automatically. With this feature, you can detect performance issues without prior experience in writing complex evaluation conditions, such as those used in health rules. Once configured, Anomaly Detection uses machine learning to determine automatically whether the specified entities in your application perform within acceptable limits.

With this feature, you can:

Filter, by tags and attributes, the entities for which you want to configure Anomaly Detection.
Link HTTP request actions of your choice and get an automated response when performance deviates from the acceptable limits.
Choose a sensitivity level (High, Medium, or Low) of the Anomaly Detection algorithm based on your business needs.
Test the Anomaly Detection configuration for the entities that are in your development or staging environments.

How to create a configuration

To configure Anomaly Detection:

Click Configure > Anomaly Detection.
Click Create configuration to open the configuration wizard.

Alternatively, you can configure Anomaly Detection for an entity type from the Observe page. Perform the following:

Click Observe.
Go to one of the following domains:
- Application Performance Monitoring
- Infrastructure
- Kubernetes

Click an entity type of the domain:

Domain	Entity Type
Application Performance Monitoring	Services, Service Instances, Service Endpoints, or Business Transactions
Infrastructure	Hosts (Anomaly Detection is supported only for AWS hosts.)
Kubernetes	Cluster, Namespace, Pods, or Workloads

Click an Entity Name to view its details.

For the Application Performance Monitoring domain, click List to view the list of entity names.
Under the HEALTH AND ALERTING section, click Anomaly Detection.
Click Create configuration to open the configuration wizard corresponding to the selected entity type.

The configuration process involves the following three steps:

Select entities and detection sensitivity
Link actions
Review the settings

Select Entities and Detection Sensitivity

You can configure Anomaly Detection for the following domains and their entity types:

Domain Entity Types Monitored Metrics

Application Performance Monitoring

Business Transactions
Services
Service Endpoints
Service Instances

Metric	Description
`Average Response Time`	The time that each request waits to be granted a global resource, summed over all requests and divided by the total number of requests. Values are converted from nanoseconds to milliseconds.
`Call Per Minute`	The number of calls reported during one minute.
`Errors Per Minute`	The number of errors reported during one minute.
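
As a rough illustration of the arithmetic behind `Average Response Time`, the sketch below sums per-request times recorded in nanoseconds, divides by the request count, and converts the result to milliseconds. The function name and the sample values are hypothetical; the product computes this metric internally.

```python
def average_response_time_ms(request_times_ns):
    """Mean of per-request times, converted from nanoseconds to milliseconds."""
    if not request_times_ns:
        raise ValueError("at least one request is required")
    total_ns = sum(request_times_ns)
    mean_ns = total_ns / len(request_times_ns)
    return mean_ns / 1_000_000  # 1 ms = 1,000,000 ns


# Three hypothetical requests of 2 ms, 4 ms, and 6 ms, recorded in nanoseconds.
samples_ns = [2_000_000, 4_000_000, 6_000_000]
print(average_response_time_ms(samples_ns))  # 4.0
```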

Infrastructure

AWS EC2

Metric	Description
`CPU Used Utilization`	The percentage of time the CPU was busy processing system or user requests.
`Disk Avg IO Utilization`	The average time spent processing read and write requests on all disks and partitions as a percentage of the total reported time period. Databases often report high disk I/O utilization due to frequent read/write requests.
`Memory Used Utilization`	The amount of memory used by applications.
`Network Incoming Errors/min`	The number of incoming packet errors the network incurs every minute.
`Network Incoming Packets Dropped`	The number of incoming data packets per second dropped by all monitored network devices.
`Network Outgoing Errors/min`	The number of outgoing packet errors the network incurs every minute.
`Network Outgoing Packets Dropped`	The number of outgoing data packets per second dropped by all monitored network devices.
`Page Faults/sec`	The number of page faults per second for the system.

AWS Application Load Balancer

Metric	Description
`Target Response Time`	The time elapsed after a request leaves the load balancer until it receives a response from the target.
`Target Connection Errors`	The number of connections that were not successfully established between the load balancer and target.

AWS Classic Load Balancer

Metric	Description
`Backend Response Time`	The total time elapsed from the time the load balancer sent the request to a registered instance until the instance started to send the response headers.
`Backend Connection Errors`	The number of failed connections between the load balancer and the registered instances.
`Surge Queue Length`	The total number of requests (HTTP listener) or connections (TCP listener) that are pending routing to a healthy instance.

Kubernetes

Cluster

Metric	Description
`CPU Used`	The total CPUs used in a cluster.
`CPU Requests`	The total CPU requests from the pods in a cluster.
`Memory Used`	The total memory used by the pods in a cluster.
`Memory Requests`	The total memory requested by the pods in a cluster.
`Memory Pressure`	The amount of memory pressure experienced on nodes due to a decrease in available memory.
`Disk Pressure`	The amount of disk pressure experienced on nodes due to a decrease in available disk space.
`Pods in Pending State`	The number of pods in a cluster that are in the pending state because they cannot be scheduled to a node due to a shortage of resources.
`Pods in Failed State`	The number of pods in a cluster that are in the failed state because of errors.
`Pods in Unknown State`	The number of pods in a cluster that are in the unknown state because the node on which they run is unresponsive, disconnected, or experiencing other issues.

Namespace

Metric	Description
`CPU Used`	The total CPUs used in a namespace.
`CPU Requests`	The total CPU requests from the pods in a namespace.
`Memory Used`	The total memory used by the pods in a namespace.
`Memory Requests`	The total memory requested by the pods in a namespace.
`Pods in Pending State`	The number of pods in a namespace that are in the pending state because they cannot be scheduled to a node due to a shortage of resources.
`Pods in Failed State`	The number of pods in a namespace that are in the failed state because of errors.
`Pods in Unknown State`	The number of pods in a namespace that are in the unknown state because the node on which they run is unresponsive, disconnected, or experiencing other issues.

Workload

Metric	Description
`CPU Used`	The total CPUs used in a workload.
`CPU Requests`	The total CPU requests from a workload.
`Memory Used`	The total memory used by a workload.
`Memory Requests`	The total memory requested by a workload.

Pod

Metric	Description
`CPU Used`	The total CPUs used in a pod.
`CPU Requests`	The total CPU requests from a pod.
`Memory Used`	The total memory used by a pod.
`Memory Requests`	The total memory requested by a pod.
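
The namespace-level and cluster-level request totals in the tables above are sums over pods. A minimal sketch of that roll-up follows, assuming hypothetical pod records with `namespace`, `cpu_requests`, and `memory_requests_mib` fields. These field names and units are illustrative and are not the product's data model.

```python
from collections import defaultdict

# Hypothetical pod samples; field names and units are assumptions.
pods = [
    {"name": "web-1", "namespace": "shop", "cpu_requests": 0.5, "memory_requests_mib": 256},
    {"name": "web-2", "namespace": "shop", "cpu_requests": 0.5, "memory_requests_mib": 256},
    {"name": "db-1", "namespace": "data", "cpu_requests": 2.0, "memory_requests_mib": 1024},
]


def totals_by_namespace(pods):
    """Sum CPU and memory requests of pods, grouped by namespace."""
    totals = defaultdict(lambda: {"cpu_requests": 0.0, "memory_requests_mib": 0})
    for pod in pods:
        bucket = totals[pod["namespace"]]
        bucket["cpu_requests"] += pod["cpu_requests"]
        bucket["memory_requests_mib"] += pod["memory_requests_mib"]
    return dict(totals)


def cluster_totals(pods):
    """Sum CPU and memory requests across every pod in the cluster."""
    return {
        "cpu_requests": sum(p["cpu_requests"] for p in pods),
        "memory_requests_mib": sum(p["memory_requests_mib"] for p in pods),
    }


print(totals_by_namespace(pods)["shop"])  # {'cpu_requests': 1.0, 'memory_requests_mib': 512}
print(cluster_totals(pods))  # {'cpu_requests': 3.0, 'memory_requests_mib': 1536}
```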

In Step 1 of the wizard, perform the following:

Select a domain:
- Application Performance Monitoring
- Infrastructure
- Kubernetes
In Selected Entities, select an entity type.

The entity type is preselected if you have already chosen it from the Observe page.
In the Filter section, enter a filter expression that uses tags and attributes to narrow down the entities of the selected type.
The attributes and tags are auto-populated based on the entity type that you have selected. You can use them to configure Anomaly Detection for specific entity names, entity types, and so on. For example, you can select the entity type Service and enter the following filter expression to configure Anomaly Detection only for the services that match it:
attributes(service.name) = 'test' && attributes(status) IN [Normal]
For more information about the supported filter operations, see Filters.

In the Detection Sensitivity section, select one of the following sensitivity levels:

Sensitivity level	Description
`High`	Use this level for business-critical services to ensure that no issue goes undetected in your environment. It triggers more alerts but with lower statistical confidence.
`Medium`	Use this level for services that are important to your business but not critical. By default, this sensitivity level is selected.
`Low`	Use this level for services that have low business impact, or when you want to avoid too many alerts.

Click Next to link HTTP request actions.
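
To make the two Step 1 choices concrete, the sketch below first selects entities the way the example filter expression does (`service.name` equals `test` and `status` is one of the listed values), then flags a metric value whose deviation from a baseline exceeds a threshold tied to the sensitivity level. Everything here is an assumption made for illustration: the entity dictionaries, the z-score test, and the threshold numbers are hypothetical and do not describe the product's filter engine or its machine learning algorithm.

```python
from statistics import mean, stdev

# Hypothetical entities with an attribute map, mirroring attributes(...) in the filter.
entities = [
    {"attributes": {"service.name": "test", "status": "Normal"}},
    {"attributes": {"service.name": "test", "status": "Critical"}},
    {"attributes": {"service.name": "checkout", "status": "Normal"}},
]


def matches_example_filter(entity):
    """Python equivalent of:
    attributes(service.name) = 'test' && attributes(status) IN [Normal]
    """
    attrs = entity["attributes"]
    return attrs.get("service.name") == "test" and attrs.get("status") in ["Normal"]


# Assumed thresholds, in standard deviations. A lower threshold flags more
# deviations, which means more alerts at lower statistical confidence.
Z_THRESHOLD = {"High": 2.0, "Medium": 3.0, "Low": 4.0}


def is_anomalous(history, value, sensitivity="Medium"):
    """Flag value if it lies further from the baseline mean than the threshold."""
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) / spread > Z_THRESHOLD[sensitivity]


selected = [e for e in entities if matches_example_filter(e)]
print(len(selected))  # 1

history = [100, 102, 98, 101, 99]  # hypothetical response times in ms
print(is_anomalous(history, 105, "High"))  # True
print(is_anomalous(history, 105, "Low"))   # False
```

The same value is flagged at `High` but not at `Low`, which is the trade-off the sensitivity table describes.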

Link Actions

In Step 2 of the wizard, you can view the HTTP request actions available in your Cisco Cloud Observability Tenant and link them to the Anomaly Detection configuration. To link a new HTTP request action, you must first create it. See Create HTTP Request Action.

To link an HTTP request action:

Click +Add.
In the HTTP Action section:
1. Select an action from the list.
2. Select one or more triggers from the list. The action runs when any of the selected trigger events occurs.

  The Preview pane on the right displays mock data that the HTTP request contains. It does not display the request header; however, the actual request includes the header.
Click +Add again and repeat the action and trigger selection to link more actions.
Click Next to review the settings.
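
The linking described above amounts to pairing an existing HTTP request action with one or more trigger events. The sketch below models that pairing and builds, without sending, the kind of request an action might issue. The trigger names, payload fields, and URL are hypothetical placeholders; the actual payload is whatever the Preview pane shows for your action.

```python
import json
import urllib.request
from dataclasses import dataclass, field


@dataclass
class LinkedAction:
    """A hypothetical model of one action linked with +Add in Step 2."""
    action_name: str
    url: str
    triggers: list = field(default_factory=list)  # one action, many triggers

    def fires_on(self, event):
        return event in self.triggers

    def build_request(self, event, entity_name):
        """Build (but do not send) a POST request with a mock JSON body."""
        body = json.dumps({"event": event, "entity": entity_name}).encode("utf-8")
        return urllib.request.Request(
            self.url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )


# Placeholder trigger names and URL, chosen only for this example.
link = LinkedAction(
    action_name="notify-oncall",
    url="https://example.com/hooks/anomaly",
    triggers=["anomaly_opened", "anomaly_closed"],
)

if link.fires_on("anomaly_opened"):
    request = link.build_request("anomaly_opened", "checkout-service")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```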

Review the Settings

In Step 3 of the wizard, specify the following details to complete the configuration:

Enter a name for the Anomaly Detection configuration.
(Optional) Deselect Turn on this configuration to create the configuration in a disabled state. By default, this option is selected. Keep it selected so that you receive an automated response when performance issues are detected in the monitored metrics.
Select how the health of the entity is evaluated when no metric data is available:
- Unknown: When no data is available, the anomaly detection algorithm treats the health of the entity as unknown, and the health status of the entity is shown as Grey.
- Healthy: When no data is available, the anomaly detection algorithm treats the health of the entity as healthy, and the health status of the entity is shown as Green.
(Optional) If you want to test the configuration, select Yes, turn on test mode.

Test mode lets you assess Anomaly Detection in non-production environments, such as development or staging. In this mode, Anomaly Detection can detect performance issues even when the volume of collected metric data is low.
Click Submit to save the configuration.

The configuration applies to all the monitored entities of the specified entity type unless you have defined filter criteria in Step 1 of the wizard.
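
The settings gathered across the three steps can be summarized as a single record. The sketch below is a hypothetical model, not the product's schema: the class and field names are invented for illustration, while the defaults (configuration turned on, `Medium` sensitivity, no filter meaning all entities of the type) follow the behavior described on this page. Test mode being off unless selected is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Health status shown for each no-data option, as described in Step 3.
NO_DATA_COLOR = {"Unknown": "Grey", "Healthy": "Green"}


@dataclass
class AnomalyDetectionConfig:
    name: str
    entity_type: str
    no_data_health: str                      # "Unknown" or "Healthy"
    sensitivity: str = "Medium"              # default level
    enabled: bool = True                     # "Turn on this configuration"
    test_mode: bool = False                  # assumed off unless selected
    filter_expression: Optional[str] = None  # None: no filter defined

    def __post_init__(self):
        if self.no_data_health not in NO_DATA_COLOR:
            raise ValueError(f"unsupported no-data option: {self.no_data_health}")
        if self.sensitivity not in ("High", "Medium", "Low"):
            raise ValueError(f"unsupported sensitivity: {self.sensitivity}")

    def applies_to_all_entities(self):
        """Without filter criteria, every monitored entity of the type is covered."""
        return self.filter_expression is None

    def health_color_without_data(self):
        return NO_DATA_COLOR[self.no_data_health]


config = AnomalyDetectionConfig(
    name="services-staging",
    entity_type="Service",
    no_data_health="Unknown",
    test_mode=True,
)
print(config.applies_to_all_entities())    # True
print(config.health_color_without_data())  # Grey
```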

View the Configurations

The Configure > Anomaly Detection page displays the list of Anomaly Detection configurations available in your Cloud Tenant. The list contains both the default configurations and the user-defined configurations. You can update, delete, or disable any configuration as needed.

Disabling or deleting Anomaly Detection configurations for entities affects the root cause analysis functionality. To use root cause analysis, always keep Anomaly Detection enabled for all the entities in the call path.
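
One way to reason about this requirement is to check each entity in a call path for an enabled configuration. The sketch below does that over hypothetical data. The entity names and the `enabled_entities` set are placeholders, and the function is an illustration rather than a product API.

```python
def entities_missing_detection(call_path, enabled_entities):
    """Return the entities in call_path that lack an enabled configuration."""
    return [entity for entity in call_path if entity not in enabled_entities]


# Hypothetical call path and the set of entities with Anomaly Detection enabled.
call_path = ["frontend", "checkout", "payments", "ledger-db"]
enabled_entities = {"frontend", "checkout", "ledger-db"}

missing = entities_missing_detection(call_path, enabled_entities)
print(missing)  # ['payments']
```

An empty result would mean that every entity in the path is covered.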