Anomaly Detection

Anomaly Detection and Automated Root Cause Analysis are two features designed to reduce Mean Time To Resolution (MTTR) for application performance problems.

Anomaly Detection automatically determines whether the following entities are performing normally:

business transactions of your applications
application servers
base pages for browser applications
network requests for mobile applications

Automated Root Cause Analysis helps you quickly determine the root cause of business transaction anomalies.

How Does Anomaly Detection Work?

Anomaly Detection uses machine learning to reduce the Mean Time to Detect (MTTD) when an anomaly occurs in business transactions, base pages, or network requests. Its algorithm is designed to work without requiring you to configure anything.

The Anomaly Detection algorithm works as follows:

Entity	Monitored Metrics
Application Servers	The Anomaly Detection algorithm detects any abnormal readings reported for the CPU utilization and Memory utilization metrics.
Business transactions	The Anomaly Detection algorithm detects any abnormal readings reported for the Errors per Minute (EPM) and Average Response Time (ART) metrics. It then combines what it learned from these readings using heuristics designed to reduce alert noise.
Base pages for browser applications	The Anomaly Detection algorithm detects any abnormal readings reported for End User Response Time.
Network requests for mobile applications	The Anomaly Detection algorithm detects any abnormal readings reported for Network Request, HTTP errors per minute, and network errors per minute.

Anomaly Detection employs multiple techniques to ensure that the metric data it collects is accurate:

It disregards any temporary spikes and periods of no data.
It normalizes the metric data. For example, a spike in EPM may not indicate a real problem if there is a corresponding increase in Calls per Minute (CPM). Because EPM alone can be misleading, Anomaly Detection uses the Error Rate (EPM/CPM) instead.
It does not apply traditional seasonal baselines. Instead, it correlates the variance of EPM and ART with CPM to obtain reliable results.

Figure: Correlation of EPM and CPM Variance
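
The normalization and spike handling described above can be illustrated with a minimal Python sketch. This is not the Anomaly Detection algorithm itself, which is not documented here. The function names, the threshold, and the rule of requiring several consecutive minutes are all hypothetical, chosen only to show why Error Rate (EPM/CPM) is more informative than raw EPM and how short spikes and gaps might be ignored.

```python
from typing import List, Optional, Sequence


def error_rate(epm: float, cpm: float) -> Optional[float]:
    """Return errors per call, or None when there were no calls."""
    if cpm <= 0:
        return None  # assumption: no traffic is treated as "no data"
    return epm / cpm


def error_rates(epm: Sequence[float], cpm: Sequence[float]) -> List[Optional[float]]:
    """Normalize per-minute EPM readings by the matching CPM readings."""
    return [error_rate(e, c) for e, c in zip(epm, cpm)]


def sustained_breach(rates: Sequence[Optional[float]],
                     threshold: float,
                     min_minutes: int) -> bool:
    """True if the rate stays above threshold for min_minutes readings in a row.

    Gaps (None) are skipped: they neither count toward the run nor reset it.
    This is an illustrative choice, not documented product behavior.
    """
    run = 0
    for rate in rates:
        if rate is None:
            continue
        run = run + 1 if rate > threshold else 0
        if run >= min_minutes:
            return True
    return False


# EPM jumps tenfold in the second minute, but so does CPM,
# so every minute has the same error rate.
rates = error_rates([2, 20, 2, 2], [100, 1000, 100, 100])
```

In this example the raw EPM series contains a spike, yet the normalized series is flat, so nothing would be flagged. A single high reading such as `[0.02, 0.30, 0.02]` also fails the `sustained_breach` check when `min_minutes` is 3, which mirrors the idea of disregarding temporary spikes.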

What is Root Cause Analysis (RCA)?

When an entity in your application has an anomaly, you will want to know why. Anomaly Detection uses AI capabilities to enable Automated Root Cause Analysis, which monitors the health of all the entities in your application and shows you the suspected causes of every anomaly. You can confirm or rule out the suspected causes at a glance, and drill down into deviating metrics and snapshots as needed. This lets you quickly determine the root cause of any anomaly in the application.

How Does RCA Work?

RCA reduces the Mean Time to Identification (MTTI) by automatically pointing to the source of the problem. It uses metrics to identify the fault domain, and it uses snapshots and events from the entire application to find and surface suspected problems. This holistic approach has two parts:

Fault domain isolation: Identification of the exact location of the problem in the system
Impacted component analysis: Analysis of logs, snapshots, traces, infrastructure, and so on to determine the affected components

How is Anomaly Detection Different from Health Rules?

While both Anomaly Detection and health rules alert you to performance problems in your application, Anomaly Detection provides powerful insights that would be difficult to obtain using health rules.

Anomaly Detection	Health Rules
Anomaly Detection uses machine learning to discover the normal ranges of key metrics for business transactions, base pages, and network requests. It alerts you when these metrics deviate significantly from expected values. This enables Anomaly Detection to identify a wider range of problems than a person could capture in health rules.	Health rules are manually created to apply logical conditions that one or more metrics must satisfy. For example, you could monitor the Average Response Time (ART) to check whether this metric deviates from the configured baseline.
Anomaly Detection requires no configuration except when you want to limit anomaly alerting.	Splunk AppDynamics provides a default set of health rules, and you create additional health rules manually as required, configuring time periods, trends, and schedules.
Anomalies are associated with application servers, business transactions, base pages for browser applications, and network requests for mobile applications.	Health rules apply to any entity, for example, business transactions and service endpoints.
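
The contrast in the table can be sketched in Python. This is an illustration only: the actual health rule engine and the Anomaly Detection model are not described on this page. The function names, the 1.5x ratio, and the use of a mean plus or minus three standard deviations as a "learned" range are all assumptions made for the example.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def health_rule_violated(art_ms: float,
                         baseline_ms: float,
                         max_ratio: float = 1.5) -> bool:
    """Manually configured condition: ART exceeds a fixed multiple of the baseline."""
    return art_ms > baseline_ms * max_ratio


def learned_range(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Derive a normal range from past readings (hypothetical stand-in for ML)."""
    center = mean(history)
    spread = stdev(history)
    return (center - k * spread, center + k * spread)


def is_anomalous(value: float, history: Sequence[float], k: float = 3.0) -> bool:
    """True if the value falls outside the range learned from history."""
    low, high = learned_range(history, k)
    return value < low or value > high


# Recent ART readings in milliseconds for a very stable transaction.
history = [100, 102, 98, 101, 99]
```

With this history, a reading of 120 ms falls outside the learned range even though it stays below the manually configured limit of 150 ms (1.5 times a 100 ms baseline). That is one way a learned range can surface a deviation that a hand-written condition would miss.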