Anomaly Detection is enabled by default for services and service instances in the application. It takes 48 hours for the machine learning models to train on the services in your application. Click Configure > Anomaly Detection to view the model training status and other details of all the services. See Model Training.

You can also view the anomaly detection status for a given service as follows:

  1. From Observe > Application Performance Monitoring > Services, select the service of interest.
  2. Click Anomaly Detection in the Health and Alerting section to view the model training status of the service.

Once model training is complete for all the services, you can monitor the anomalies associated with an application service and determine the details of the violating metrics. From Observe > Application Performance Monitoring > Services, select the service of interest. The service details page lists the relationship map, the health violations (including anomalies), and the violating metrics. To view anomaly data, see Examine the Anomaly.

Examine the Anomaly

You can view data related to the violating anomaly in the Health Violation timeline and in the Health and Alerting section in the right panel. This data helps you identify the exact metric in violation and provides associated details that help you take corrective action.

The Anomaly Detection timeline in the Health Violation section displays a list of violating anomalies associated with the selected service, along with their Critical or Warning status. Red indicates Critical status and yellow indicates Warning status.

The anomaly violation timeline and the start time of the health violation appear at the bottom. If the anomaly is in an open state, the end time of the anomaly violation is the current time. If the anomaly is in a closed state, the start and end times show the historical period of the anomaly violation.

The color-coded status symbol next to Anomaly Detection in the Health and Alerting section indicates the overall status of the anomalies for a given service. A timeline graph of the violating metric and the suspected cause metric displays for each anomaly. The following image depicts the numbered details of an anomaly:

View Anomaly Details

Select an anomaly on the Anomaly Detection timeline. The following anomaly details display in the right panel:

  • Service name.
  • Violating metric(s).
  • Start date and time of the violation along with the duration.
  • Status of the anomaly (Open, Closed).
  • Suspected causes (lists up to 3 of the top suspected causes).

    Each suspected cause lists the service name, the ID, the affected entities path, the deviating metric, and an associated call path.

  • Call path.

  • Violating metric(s)/Suspected Cause metric(s) graph with timeline plotted on the X-axis.
  • Other properties like service name and service namespaces.
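As a reading aid, the anomaly details listed above can be modeled as a small data structure. The following is a minimal sketch only: the class names, field names, and methods are hypothetical and are not the product's API or data schema. It encodes three rules stated in this section: an open anomaly uses the current time as its end time, at most the top 3 suspected causes are listed, and red and yellow map to Critical and Warning.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SuspectedCause:
    # Hypothetical fields mirroring what each suspected cause lists in the UI.
    service_name: str
    entity_id: str
    affected_entities_path: str
    deviating_metric: str
    call_path: List[str]


@dataclass
class Anomaly:
    # Hypothetical fields mirroring the anomaly details in the right panel.
    service_name: str
    violating_metrics: List[str]
    start_time: datetime
    status: str    # "Open" or "Closed"
    severity: str  # "Critical" or "Warning"
    end_time: Optional[datetime] = None
    suspected_causes: List[SuspectedCause] = field(default_factory=list)

    def effective_end_time(self, now: datetime) -> datetime:
        """Open anomalies end at the current time; closed ones keep their own end time."""
        if self.status == "Open":
            return now
        if self.end_time is None:
            raise ValueError("a closed anomaly needs an end time")
        return self.end_time

    def duration(self, now: datetime) -> timedelta:
        """Length of the violation as it would appear on the timeline."""
        return self.effective_end_time(now) - self.start_time

    def top_suspected_causes(self) -> List[SuspectedCause]:
        """At most the first 3 suspected causes, most probable first."""
        return self.suspected_causes[:3]

    def status_color(self) -> str:
        """Red for Critical, yellow for Warning."""
        return {"Critical": "red", "Warning": "yellow"}[self.severity]
```

With this model, an open anomaly that started two hours ago reports a duration of two hours, and that duration keeps growing until the anomaly closes.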

The following image depicts the numbered details of anomaly data for a service.

The following image depicts the numbered details of anomaly data for a service instance. The following anomaly details display in the right panel:

  • Service instance name
  • Violating metric(s)
  • Start date and time of the violation along with the duration
  • Status of the anomaly (Open, Closed)
  • Other properties like the corresponding service name and service namespaces

Anomaly Data Analysis

The entity-centric page and the Properties pane provide the following options to view the data associated with the anomaly and quickly determine the root cause and the affected dependent services:

  • Hover over an anomaly to view the severity level of the anomaly, the start time, its transition to other severity levels (for example, critical to warning), and the end time if the anomaly is resolved.
  • Select an anomaly on the Anomaly Timeline. The top 3 suspected causes are listed in the right panel in order of probability, with the first suspected cause being the most probable root cause. The Suspected Causes section lists the call path along with the deviating metric.
  • View the Call path link to trace the propagation of the anomaly (in the service context). A call path lists all the entities that may be affected by the anomaly.
  • Click an entity on the call path. Other details, such as the dependency flow map for the entity, the endpoints of the service, and the associated metric values, also display. These details help determine the source of the anomaly and deduce the affected path.
  • Examine the violating or suspected cause metric timeline graph to correlate the deviating metric data with that of the violating metric. You can view, scroll through, and hover over the violating or suspected cause metrics to determine the deviation. The metric value is shown as a thin blue line. You can hover over a time point to view the metric value in numerical form.
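If you note down metric values at matching time points from the graphs, the visual correlation described in the last step can also be checked numerically. The sketch below uses a plain Pearson correlation coefficient as an illustration; it is not the method the product uses to rank suspected causes, and the function names and input layout are assumptions made for this example.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long series; 0.0 if either is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(
    violating: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Sort candidate metrics by absolute correlation with the violating metric."""
    scored = [(name, pearson(violating, values)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

A coefficient near 1 or -1 suggests the suspected cause metric moved together with the violating metric over the sampled window, while a value near 0 suggests it did not. Treat this as a supplement to the timelines and flow maps, not as proof of a root cause.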

Anomaly data analysis saves you the tedious process of investigating multiple metrics on each dependency to determine the root cause. Instead, you confirm or rule out the listed Suspected Causes with a quick look at the timelines, flow maps, and metrics. Determine the Root Cause of an Anomaly describes root cause analysis with an example.