Understanding Health Rules

AppDynamics collects a wide range of metric information covering your entire application. Some of those metrics are key indicators of the overall state of the system. Being able to proactively track the status of those indicators gives you a window into the health of your system and can inform you when various important entities—nodes, business transactions, databases, etc.—may be in trouble. 

What is a health rule?


Health rules allow you to define acceptable values for key metrics associated with specific entities, and to monitor those essential metrics automatically.  Should metric values exceed the ranges you have specified, the health rule is said to violate, and a health rule violation event occurs and is surfaced in the controller user interface. 

The violation event can also be used to trigger a policy, which can initiate pre-defined actions to respond to the situation, from sending alerting emails to running remedial scripts. To understand how to create policies and actions, see Policies and Actions.

The simplest way to create health rules is to use the basic health rule wizard. The wizard groups commonly used system entities and related metrics to ease the process of setting up your particular system's health rules. Should you need to create health rules that connect less commonly used entities and/or metrics, you can use one of the custom methods in the wizard, the Custom Health Rule Types or the Hybrid methods.  

For specifics on using the health rule wizard, see Configure Health Rules.

Understanding the Health Rule Wizard

To understand how the health rule wizard works, you need to understand four basic concepts:

Heath Rule Types

To simplify creating health rules, the basic health rule wizard groups entities, like nodes or business transactions, with metrics that are commonly associated with those entities, into health rule types.  Doing so allows the wizard to automatically show you relevant information during the health rule creation process.

If your needs are not covered by these types, you can use one of the custom methods in the wizard, Custom Health Rule Types or the Hybrid methods.  

The health rule types are:

If you select one of these health rule types, AppDynamics automatically presents you with a list of the commonly associated metrics, simplifying the health rule creation process by giving you a manageable number of relevant options.

If the types do not cover the entities and/or metrics you wish to use, you can use one of the custom options:

Health Rule Schedules

The metrics associated with a health rule are evaluated according to a schedule that you control. You can configure:

Health Rule Enabled Schedule

By default, health rules are always enabled. But you can also configure your own schedules during which the rule is in effect.
Built-in schedules are:

You can also create a new schedule based on UNIX cron expressions using your custom values.

Health Rule Evaluation Window

Different kinds of metrics may provide better results using different sets of data.  You can manage how much data AppDynamics uses when it evaluates a particular health rule by setting the data collection time period. The default value is 30 minutes.

For metrics based on an average calculation, such as average response time, AppDynamics averages the response time over the evaluation window. A five minute window means that the last five minutes of data is used to evaluate if the health rule is in range. For metrics based on a sum calculation, such as number of calls, AppDynamics uses the total number of calls counted during the evaluation window. And so forth. Use values that work best with the kinds of metrics you are interested in.

Health Rule Wait Time After Violation

The health rule wait time setting lets you control how often an action is taken while the conditions found to violate a health rule continue to violate that health rule.

When the Controller first detects that a health rule's critical or warning condition has been violated, a health rule violation is opened with the corresponding status of Critical or Warning. An Open Critical or Open Warning event is generated to trigger any actions registered on policies that match the health rule and event.

The Controller continues to evaluate the health rule every minute.

If the Controller continues to detect the same health rule critical or warning condition violation, the health rule violation remains open with the same violation status. A corresponding Continues Critical or Continues Warning event may be generated to trigger any actions registered on policies that match the health rule and event.

The health rule's "Wait Time after Violation" setting is used to throttle how often these Continues events are generated for continuing health rule violations. For example, every minute? Or every 60 minutes? The default is every 30 minutes.

In the event that the Controller is unable to evaluate the rule - for example, if a node simply stops reporting - the Evaluation Status of the health rule is marked as a grey question mark or "Unknown".  The current violation event remains open until the "Wait Time after Violation" period has elapsed, at which point the violation event is closed and a new event is triggered, causing the Health status itself of the rule to display as "Unknown". 

Note that the values presented in the Health Rules Violations page for violating health rule conditions are only updated when a health rule violation event is triggered.

To use Continues Critical and Continues Warning events, adjust the default Wait Time after Violation value to the desired frequency on a health rule. Then configure a policy matching that health rule with the "Health Rule Violation Continues - Warning" and/or "Health Rule Violation Continues - Critical" events selected in the "Health Rule Violation Events" section of the policy's settings.

Health Rule Entities

A health rule can evaluate metrics associated with an entire application or a very limited set of entities.  For example, you can create business transaction performance health rules that evaluate certain metrics for all business transactions in the application or node health rules that cover all the nodes in the application or all the nodes in specified tiers. The default health rules are in this category.

You can also create health rules that are very narrowly applied to a limited set of entities in the application, or even a single entity such as a node or a JMX object or an error. For example, you can create a JMX health rule that evaluates the initial pool size and number of active connections for specific connection pools in nodes that are share certain system properties.

The health rule wizard lets you specify precisely which entities the health rule affects, enabling the creation of very specific health rules. For example, for a business transaction you can limit the tiers that the health rule applies to or specific business transactions by name or by names that match certain criteria.

For node health rules, you can specify that a health rule applies only to nodes that meet certain criteria:

Entities Affected by a Health Rule

If you are using the basic health rule wizard and health rule types, AppDynamics provides a list of commonly used sets of entities on which metrics can be evaluated.

For an Overall Application Performance health rule type, the health rule applies the entire application, regardless of business transaction, tier, or node.

For a Business Transaction Performance health rule type, you can apply the health rule to:

For an Error Rates health rule type, you can apply the health rule to:

For a Node Health – Transaction Performance or Node Health – Hardware, JVM, CLR health rule types, you can apply the health rule to:

For a Node Health – JMX health rule type, you select

and apply the health rule to:

For a Databases & Remote Services health rule type, you can apply the health rule to:

For End User Experience – Pages, iframes, and Ajax Requests health rule types, you can apply the health rule to:

For Information Points health rule types, you can apply the health rule to:

Using the Custom health rule type limits you to a single entity hard-coded in the metrics themselves. If you want to use custom metrics but associate them with multiple entities, use the hybrid method.  See Using the Hybrid Method.

Health Rule Evaluation Scope

The health rule evaluation scope defines how many nodes in the affected entities must violate the condition before the health rule is considered violated.

Evaluation scope applies only to business transaction performance type health rules and node health health rules in which the affected entities are defined at the tier level.

For example, you may have a critical condition in which the condition is unacceptable for any node, or you may want to trigger the violation only if the condition is true for 50% or more of the nodes in a tier. 

Options for this evaluation scope are:

Health Rule Conditions

You define the acceptable range for a metric by establishing health rule conditions. A health rule condition defines what metric levels constitute a Warning status and what metric levels constitute a Critical status.

A condition consists of a Boolean statement that compares the current state of a metric against one or more static or dynamic thresholds based on a selected baseline. If the condition is true, the health rule violates.  The rules for evaluating a condition using multiple thresholds depend on configuration.

Static thresholds are straightforward. For example, is a business transaction's average response time greater than 200 ms?

Dynamic thresholds are based on a percentage in relation to, or a standard deviation from, a baseline built on a rolled-up baseline trend pattern.  For example, a daily trend baseline rolls up values for a particular hour of the day during the last thirty days, whereas a  weekly trend baseline rolls up values for a particular hour of the day, for a particular day of the week, for the last 90 days.  For more information about baselines, see Detect Anomalies Using Dynamic Baselines.

You can define a threshold for a health rule based on a single metric value or on a mathematical expression built from multiple metric values.

The following are typical conditions:

The last example combines two metrics in a single condition. You can use the expression builder in the health rules wizard to create conditions based on a  complex expression comprising multiple interdependent metrics

Often a condition consists of multiple statements that evaluate multiple metrics. A health rule is violated either when one of its condition evaluates to true or when all of its conditions evaluate to true, depending on how it is configured. You can correlate multiple metrics to focus the health rules for your environment.

For example, a health rule that measures response time (average response time greater than some baseline value) makes more business sense if it is correlated with the application load (for example, 50 concurrent users or 10,000 calls per minute) on the system. You may not want to use the response time condition alone to trigger a policy that initiates a remedial action if the load is low, even if the response time threshold is reached. To configure this correlation, the first part of the condition would evaluate the actual performance measurement and the second part would ensure that the health rule is violated only when there is sufficient load.

Critical and Warning Conditions

Conditions are classified as either critical or warning conditions. 

Critical conditions are evaluated before warning conditions. If you have defined a critical condition and a warning condition in the same health rule, the warning condition is evaluated only if the critical condition is not true.

The configuration procedures for critical and warning conditions are identical, but you configure these two types of conditions in separate panels.  You can copy a critical condition configuration to a warning configuration and vice-versa and then adjust  the metrics in the copy to differentiate them. For example, in the Critical Condition panel you can create a critical condition based on the rule:

Then from the Warning Condition panel, copy that condition and edit it to be:

As performance changes, a health rule violation can be upgraded from warning to critical if performance deteriorates to the higher threshold or downgraded from critical to warning if performance improves to the warning threshold.

Default Health Rules

Out of the box, AppDynamics provides a default set of health rules:

Health Rule Name

Health Rule Type

Business Transaction response time is much higher than normal

Business Transaction Performance

Business Transaction error rate is much higher than normal

Business Transaction Performance

CPU utilization is too high

Node Health – Hardware, JVM, CLR Performance

Memory utilization is too high

Node Health – Hardware, JVM, CLR Performance

JVM Memory Heap  is too high

Node Health – Hardware, JVM, CLR Performance

JVM Garbage Collection Time is too high

Node Health – Hardware, JVM, CLR Performance

CLR Garbage Collection Time is too high

Node Health – Hardware, JVM, CLR Performance

If any of these predefined health rules are violated, the affected items are marked in the UI as yellow-orange, if it is a Warning violation and red, if it is a Critical violation.

In many cases the default health rules may be the only health rules that you need. If the conditions are not configured appropriately for your application, you can edit them. You can also disable the default health rules. 

Preparing to Set Up Health Rules

AppDynamics recommends the following process to set up health rules for your application:

  1. Identify the key metrics on the key entities that you need to monitor for your application. See Suggested Metrics for Additional Health Rules for some common choices.
    These metrics should be representative of the overall health of your application.
  2. Click Alert & Respond -> Health Rules to examine the default health rules that are provided by AppDynamics.
  3. If the health rules do not cover all your needs or if you need very finely-applied health rules to cover specific use cases, create new health rules.
  4. Create schedules for health rules, if needed.
    In some situations a health rules is more useful if it runs at a particular time. See Health Rule Schedules.
  5. If desired, configure policies and actions that should come into play when health rules are violated. See Policies and Actions.

Health Rule Management

To view current health rules in an application, including the default health rules, and to access the health rule wizard, click Alert & Respond -> Health Rules.

Current health rules are listed in the left panel.  If you click one of these rules, a list appears in the right panel showing what entities this selected health rule affects and what the status of the latest evaluation is.  You can also select the Evaluation Events tab to see a detailed list of evaluation events.

In the left panel you can directly delete or duplicate a health rule. From here you can also access the health rule wizard to add a new rule or edit an existing one.

To delete an existing health rule:

  1. Select the health rule in the left panel.
  2. Click the minus icon.

The health rule is removed.

To duplicate an existing health rule:

  1. Select the health rule in the left panel.
  2. Click the copy icon.

The health rule is duplicated under the name you assign.

To edit an existing health rule:

  1. Select the health rule in the left panel.
  2. Click the pencil icon. 

The health rule wizard appears, with the rule's current values configured. You can modify these values in the wizard.

To create a new health rule:

  1. Click the plus icon.
  2. The health rule wizard appears with some default values configured. You define the health rule in the wizard.

See Configure Health Rules for details on using the health rule wizard.

Suggested Metrics for Additional Health Rules

The following are suggested metrics you might want to monitor with health rules, based on what many customers have found useful.

Additional Metrics for Business Transactions

Additional Metrics for Tiers

Additional Metrics for Nodes

(info) Some of these metrics might apply only to certain types of nodes, such as those running JVMs.

Additional Metrics for Backends

Learn More