AppDynamics Cloud Anomaly Detection automatically determines whether every service in your application performs within acceptable limits. This helps reduce the Mean Time to Detection (MTTD) for application performance problems.

How Does Anomaly Detection Work?

Anomaly Detection uses machine learning to continuously monitor the latency, errors, and throughput of services and identify abnormal behavior. Its algorithm requires no manual configuration and works as follows:

  1. It detects abnormal readings of the Errors per Minute (EPM) metric.

  2. It detects abnormal readings of the Average Response Time (ART) metric.

  3. It combines the results from both metrics using heuristics designed to reduce alert noise.
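The three steps above can be illustrated with a minimal Python sketch. It assumes that each metric's normal range is a learned mean plus or minus three standard deviations, that the error metric is checked as an error rate (EPM/CPM, as described below), and that the noise-reducing heuristic merges simultaneous error and latency deviations into a single event. The names `MetricModel` and `service_anomaly` are hypothetical, and this is not the algorithm AppDynamics Cloud actually uses.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional


@dataclass
class MetricModel:
    """Hypothetical learned 'normal' band for one metric: mu +/- k * sigma."""
    mu: float
    sigma: float
    k: float = 3.0

    @classmethod
    def fit(cls, history: List[float], k: float = 3.0) -> "MetricModel":
        # Learn the band from past readings (needs at least two values).
        return cls(mu=mean(history), sigma=stdev(history), k=k)

    def is_abnormal(self, value: float) -> bool:
        # A reading is abnormal if it falls outside the learned band.
        return abs(value - self.mu) > self.k * self.sigma


def service_anomaly(error_model: MetricModel, art_model: MetricModel,
                    error_rate: float, art_ms: float) -> Optional[str]:
    """Check each metric separately, then combine into one verdict."""
    errors_abnormal = error_model.is_abnormal(error_rate)  # step 1
    latency_abnormal = art_model.is_abnormal(art_ms)       # step 2
    # Step 3 (assumed heuristic): one combined event instead of two alerts.
    if errors_abnormal and latency_abnormal:
        return "errors+latency"
    if errors_abnormal:
        return "errors"
    if latency_abnormal:
        return "latency"
    return None


if __name__ == "__main__":
    error_model = MetricModel.fit([0.010, 0.012, 0.011, 0.009, 0.010])
    art_model = MetricModel.fit([120.0, 118.0, 125.0, 122.0, 119.0])
    print(service_anomaly(error_model, art_model, 0.05, 121.0))  # errors
```

Reporting one combined event when both metrics deviate at once is just one plausible way to reduce alert noise; the actual heuristics are not documented here.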

Anomaly Detection uses several techniques to keep its results reliable:

  • It disregards temporary spikes and periods with no data.
  • It normalizes the metric data. For example, a spike in EPM may not indicate a real problem if there is a corresponding increase in Calls per Minute (CPM). Because raw EPM can be misleading on its own, Anomaly Detection uses the error rate (EPM/CPM) instead.
  • It does not apply traditional seasonal baselines. Instead, it correlates the variance of EPM and ART with CPM to obtain reliable results.
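The normalization and spike filtering in the first two bullets can be sketched in Python as follows. This is an illustration under assumptions, not the product's implementation: the helper names, the error-rate threshold, and the rule that an anomaly must persist for three consecutive minutes are all hypothetical. Minutes with no data are represented as `None` and skipped. The sketch does not model the correlation of ART variance with CPM.

```python
from typing import List, Optional, Sequence

Reading = Optional[float]  # None means no data was reported for that minute


def error_rates(epm: Sequence[Reading], cpm: Sequence[Reading]) -> List[Reading]:
    """Normalize errors per minute (EPM) by calls per minute (CPM).

    Minutes with missing data, or with zero calls, yield None so that
    they are skipped instead of being treated as a zero error rate.
    """
    rates: List[Reading] = []
    for errors, calls in zip(epm, cpm):
        if errors is None or calls is None or calls == 0:
            rates.append(None)
        else:
            rates.append(errors / calls)
    return rates


def sustained_anomaly(values: Sequence[Reading], threshold: float,
                      min_minutes: int = 3) -> bool:
    """Return True only if `min_minutes` consecutive readings exceed
    `threshold`; shorter spikes are ignored."""
    run = 0
    for value in values:
        if value is None:
            continue  # no-data gap: neither extends nor resets the run
        run = run + 1 if value > threshold else 0
        if run >= min_minutes:
            return True
    return False


if __name__ == "__main__":
    # EPM doubles, but so does CPM, so the error rate stays at 1%.
    print(error_rates([5, 10], [500, 1000]))  # [0.01, 0.01]
    # A single one-minute spike is ignored.
    print(sustained_anomaly([0.01, 0.05, 0.01, 0.01], threshold=0.02))  # False
```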

Suspected Causes and Root Cause Analysis (RCA)

An entity in your application can become anomalous for many reasons. Anomaly Detection uses AI-driven Root Cause Analysis (RCA) to show you the suspected causes of every anomaly. You can review the suspected cause(s), the corresponding deviating metrics, and the call path from each suspected cause to the impacted entity. Suspected causes are ranked by likelihood, so you can start your analysis with the most likely one. This reduces the Mean Time to Resolution (MTTR).
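As a rough illustration of how ranked results might be represented, the Python sketch below models each suspected cause with its deviating metrics and call path, then orders the causes by a likelihood score. The `SuspectedCause` type, the score, and the entity names are hypothetical; how AppDynamics Cloud computes likelihood is not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SuspectedCause:
    entity: str          # hypothetical name of the suspected upstream entity
    likelihood: float    # assumed score; higher means more likely
    deviating_metrics: List[str] = field(default_factory=list)
    call_path: List[str] = field(default_factory=list)  # cause -> impacted entity


def rank_causes(causes: List[SuspectedCause]) -> List[SuspectedCause]:
    """Order suspected causes from most to least likely."""
    return sorted(causes, key=lambda cause: cause.likelihood, reverse=True)


if __name__ == "__main__":
    causes = [
        SuspectedCause("inventory-svc", 0.30, ["ART"],
                       ["inventory-svc", "checkout-svc"]),
        SuspectedCause("payments-db", 0.65, ["ART", "EPM"],
                       ["payments-db", "payments-svc", "checkout-svc"]),
    ]
    for cause in rank_causes(causes):
        print(cause.entity, cause.likelihood, " -> ".join(cause.call_path))
```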

How is Anomaly Detection Different from Health Rules?

While both Anomaly Detection and health rules alert you to performance problems in your application, Anomaly Detection provides insights that would be difficult to obtain using health rules.

| Anomaly Detection | Health Rules |
| --- | --- |
| Anomaly Detection uses machine learning to discover the normal ranges of key performance metrics for application entities and alerts you when these metrics deviate significantly from expected values. This enables Anomaly Detection to identify a wider range of problems than you could capture with manually configured health rules. | Health rules are created manually and apply logical conditions that one or more metrics must satisfy. For example, you could monitor ART to check whether it deviates from the configured baseline. |
| Anomaly Detection requires no configuration. | AppDynamics Cloud provides a default set of health rules. You can manually create additional health rules as required, configuring time periods and trends. |
| Anomalies are associated with application services. | Health rules apply to any entity, for example, clusters, services, and pods. |
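To make the contrast concrete, the Python sketch below compares a health rule condition with a fixed ART threshold (one kind of condition you might configure) against a check that learns the normal range from history. The 500 ms threshold, the three-sigma band, and the function names are illustrative assumptions, not AppDynamics Cloud defaults.

```python
from statistics import mean, stdev
from typing import List


def static_health_rule(art_ms: float, threshold_ms: float = 500.0) -> bool:
    """Manually configured condition: violated when ART exceeds a fixed value."""
    return art_ms > threshold_ms


def learned_range_check(art_ms: float, history_ms: List[float],
                        k: float = 3.0) -> bool:
    """Learn what is normal from history; flag values far outside that range."""
    mu, sigma = mean(history_ms), stdev(history_ms)
    return abs(art_ms - mu) > k * sigma


if __name__ == "__main__":
    history = [48.0, 50.0, 52.0, 49.0, 51.0]  # service normally answers in ~50 ms
    print(static_health_rule(300.0))            # False: under the fixed 500 ms limit
    print(learned_range_check(300.0, history))  # True: far outside the learned range
```

A fixed threshold has to be tuned for each service by hand, while a learned range adapts to each service's own behavior, which is the trade-off described above.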

Model Training

Anomaly Detection is enabled by default for all services in the application. It takes 48 hours for Anomaly Detection to become available for your services. During that time, the machine learning models train on data from the services in your application.

Select Configure > Anomaly Detection to view the model training status for the services. The following table explains the training statuses of a service:

| Status | Meaning |
| --- | --- |
| In Progress | AppDynamics Cloud machine learning has started receiving data for an entity, and model creation is in progress. |
| Ready | Model training is complete, and AppDynamics Cloud is ready to detect anomalies. |
| Unknown | The current status of the model is unknown. This happens when AppDynamics Cloud machine learning has just started receiving data for an entity but the model does not exist yet, or when it does not receive any data for the given entity. |
| Not Available | AppDynamics Cloud machine learning does not receive any data. |

The models are updated continuously. If traffic to a service is interrupted long enough to prevent training on a given day, Anomaly Detection uses the models from the previous seven days.
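One possible reading of this fallback is sketched below in Python: use the model trained on a given day if one exists, otherwise the most recent model trained within the previous seven days. The `models_by_day` mapping and the `model_for` helper are hypothetical, and the product may instead combine several of the recent models.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def model_for(day: date, models_by_day: Dict[date, str],
              lookback_days: int = 7) -> Optional[str]:
    """Return the model trained on `day`, or else the most recent model
    trained within the previous `lookback_days` days; None if there is none."""
    for offset in range(lookback_days + 1):
        candidate = day - timedelta(days=offset)
        if candidate in models_by_day:
            return models_by_day[candidate]
    return None


if __name__ == "__main__":
    models = {date(2024, 6, 1): "model-06-01", date(2024, 6, 3): "model-06-03"}
    # No traffic on June 4 and 5, so no model was trained on those days.
    print(model_for(date(2024, 6, 5), models))   # model-06-03
    print(model_for(date(2024, 6, 11), models))  # None: outside the 7-day window
```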

View Anomaly Data

Once the model training is complete, you can view the detected anomalies, monitor them, and take corrective actions. See Monitor Anomalies.