Health Rules

This page provides an overview of health rules and the policy statements that define triggers in AppDynamics policies.

What is a Health Rule?

Health rules let you specify the parameters that represent what you consider normal or expected operations for your environment. These parameters are based on metric values, such as the average response time for a business transaction or the CPU utilization for a node.

When the performance of an entity affected by the health rule violates the rule's conditions, a health rule violation occurs. Each entity's health status is one of critical, warning, normal, or unknown.
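The following Python sketch shows how a single metric value could map to these statuses. It is a simplified, hypothetical model, not AppDynamics code: `evaluate_status` and its threshold parameters are illustrative names. It assumes a rule where higher values are worse, where the critical threshold is checked before the warning threshold, and where missing data yields unknown.

```python
from enum import Enum
from typing import Optional


class HealthStatus(Enum):
    """The four health statuses named on this page."""
    NORMAL = "normal"
    WARNING = "warning"
    CRITICAL = "critical"
    UNKNOWN = "unknown"


def evaluate_status(value: Optional[float],
                    warning_threshold: float,
                    critical_threshold: float) -> HealthStatus:
    """Map one metric value to a health status (hypothetical helper).

    Assumes a "greater than" rule: higher values are worse.
    """
    if value is None:
        # Assumption: no data for the evaluation window means the status
        # cannot be determined.
        return HealthStatus.UNKNOWN
    if value > critical_threshold:
        return HealthStatus.CRITICAL
    if value > warning_threshold:
        return HealthStatus.WARNING
    return HealthStatus.NORMAL
```

For example, with a warning threshold of 300 ms and a critical threshold of 500 ms, an average response time of 420 ms evaluates to warning.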

When the health status of an entity changes, a health rule violation event occurs. Examples of health rule violation events include:

A violation starting
A violation ending
A violation upgrading from warning to critical
A violation downgrading from critical to warning
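The following Python sketch shows one way to classify a status change into the four kinds of violation events listed above. It is a simplified, hypothetical model: `violation_event` and the `SEVERITY` table are illustrative names, and it assumes that normal and unknown both mean "no open violation".

```python
from typing import Optional

# Hypothetical severity table: "normal" and "unknown" map to 0 (no open
# violation), so only warning and critical count as violating statuses.
SEVERITY = {"normal": 0, "unknown": 0, "warning": 1, "critical": 2}


def violation_event(previous: str, current: str) -> Optional[str]:
    """Classify a change in an entity's health status as a violation event.

    Returns None when the change does not start, end, or change the
    severity of a violation.
    """
    was = SEVERITY[previous]
    now = SEVERITY[current]
    if was == now:
        return None
    if was == 0:
        return "started"     # normal/unknown -> warning or critical
    if now == 0:
        return "ended"       # warning or critical -> normal/unknown
    if now > was:
        return "upgraded"    # warning -> critical
    return "downgraded"      # critical -> warning
```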

The health statuses of entities and health rule violations are shown in the Controller UI. A health rule violation event can also trigger a policy, which can initiate automatic actions such as sending alert emails or running remedial scripts.
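To make the relationship between violation events, policies, and actions concrete, here is a hypothetical Python sketch. `ViolationEvent`, `Policy`, `dispatch`, and the two stand-in actions are illustrative names only; in AppDynamics, policies and actions are configured in the Controller UI. The sketch assumes a policy matches an event by health rule name and event type.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class ViolationEvent:
    health_rule: str    # name of the violated health rule
    entity: str         # affected entity, e.g. a node or business transaction
    event_type: str     # e.g. "started", "upgraded", "ended"


@dataclass
class Policy:
    """Hypothetical policy: which events trigger it and which actions it runs."""
    health_rules: Set[str]
    event_types: Set[str]
    actions: List[Callable[[ViolationEvent], str]] = field(default_factory=list)

    def matches(self, event: ViolationEvent) -> bool:
        return (event.health_rule in self.health_rules
                and event.event_type in self.event_types)


def dispatch(event: ViolationEvent, policies: List[Policy]) -> List[str]:
    """Run the actions of every policy the event triggers; return their results."""
    results: List[str] = []
    for policy in policies:
        if policy.matches(event):
            results.extend(action(event) for action in policy.actions)
    return results


def send_email(event: ViolationEvent) -> str:
    # Stand-in for an email action; here it only describes the notification.
    return f"email: {event.health_rule} {event.event_type} on {event.entity}"


def run_remediation(event: ViolationEvent) -> str:
    # Stand-in for running a remedial script against the affected entity.
    return f"script: restart {event.entity}"
```

For example, a policy built as `Policy({"High ART"}, {"started", "upgraded"}, [send_email, run_remediation])` runs both actions when the hypothetical "High ART" rule starts violating on a node, and does nothing when that violation ends.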

You create health rules using the health rule wizard, described in Configure Health Rules. The wizard groups commonly used system entities and related metrics to simplify setting up health rules. You can also use the default health rules provided by AppDynamics as is, or modify them.

Default Health Rules

AppDynamics provides a default set of health rules for some products, such as applications and servers. The default health rules vary by entity. To see the default rules before any other health rules have been added to your AppDynamics installation:

From the Alert & Respond tab, click Health Rules.
Select the entity.
The default health rules for the selected entity are displayed.

If any of these predefined health rules are violated, the affected entities are marked in the UI in yellow-orange for a Warning violation and in red for a Critical violation.

In many cases, the default health rules may be the only health rules that you need. You can edit and customize the health rules to suit your application. You can also disable the default health rules.

Health Rule Scopes

The health rule scope determines which set of default health rules you see. Choose a scope, such as applications, servers, or databases, to see the default health rules for it. For example, when you select a mobile application as the scope, default health rules such as crash rates and HTTP/network error rates are displayed. Similarly, when you select an application as the scope, the default health rules cover business transactions, CPU and memory utilization, and so on.

From Alert & Respond > Health Rules, you can select one of the following health rule scopes from the drop-down list:

Applications
User Experience: Browser Apps
User Experience: Mobile Apps
User Experience: API Monitoring
Databases
Servers
Analytics

You can also create new health rules to add to the default set for each scope. For example, you may want a health rule for app starts in your mobile application. Because this health rule is not part of the default set in the mobile app scope, you would add it as a new health rule.

Health Rule Types

The health rule wizard groups health rules into types that are categorized by the entity that the health rule covers. This lets the wizard display the appropriate configuration options when you create a health rule.

The health rule types are:

Transaction Performance
- Overall Application Performance: Groups metrics related to load, response time, slow calls, stalls, and so on, with applications.
- Business Transaction Performance: Groups metrics related to load, response time, slow calls, stalls, and so on, with business transactions.
Node Health
- Node Health-Hardware, JVM, CLR: Groups metrics like CPU and heap usage, disk I/O, and so on, with nodes.
- Node Health-Transaction Performance: Groups metrics related to load, response time, slow calls, stalls, and so on, with nodes.
- Node Health-JMX: Java only; groups metrics related to connection pools, thread pools, and so on, with specific JMX instances and objects in specific nodes and tiers.
User Experience-Browser Apps
- Pages: Groups metrics like DOM building time, JavaScript errors, and so on, with the performance of application pages for the end user.
- IFrames: Groups metrics like first-byte time, requests per minute, and so on, with the performance of iFrames for the end user.
- AJAX Requests: Groups metrics like Ajax callback execution time, errors per minute, and so on, with the performance of Ajax requests for the end user.
- Virtual Pages: Groups metrics like End User Response Time, Digest Cycles, HTML Download Time, DOM Building Time, etc. for virtual pages created with Angular. See AngularJS Support for information on what these metrics mean in the context of virtual pages.
User Experience-Mobile Apps
- Mobile Apps: Groups metrics related to mobile app crashes, starts, and server calls as well as network requests and errors.
- Network Requests: Groups metrics like HTTP and network errors, request time, and requests per minute with network requests.
Servers: Groups metrics related to hardware resources.
Databases & Remote Services: Groups metrics related to response time, load, or errors with databases and other backends.
Advanced Network: Groups metrics related to Network Visibility, such as PIE (performance impact events), zero window, data retransmission, and errors.
Error Rates: Groups metrics related to exceptions, return codes, and other errors with applications or tiers.
Information Points: Groups metrics like response time, load, or errors with information points.
Service Endpoints: Java and .NET only; groups metrics like average response time, calls per minute, and errors per minute with service endpoints.
Custom: Presents all the metrics collected by the agent that could affect a single business transaction, a single node or overall application performance. Use this type to create rules that evaluate custom metrics.

When you select one of these health rule types, the wizard offers you the metrics commonly associated with that type in an embedded browser.

How to Set Up Health Rules

AppDynamics recommends the following process to set up health rules for your application:

Identify the key metrics (performance indicators) on the key entities that you need to monitor.
Click Alert & Respond > Health Rules to examine any default health rules that are provided by AppDynamics.
- Compare your list of metrics with the metrics configured for the default rules.
- You can view the list of affected entities for each of the default health rules and modify them. See Configure Affected Entities.
- If the default health rules cover all the key metrics you need, determine if the pre-configured conditions are applicable to your environment. If required, modify the conditions.
  
  Define a metric expression to evaluate complex criteria for a condition.
  Define a boolean expression to evaluate multiple health rule conditions.
If the default health rules do not cover all your requirements, or if you need fine-grained health rules for specific use cases, create new health rules.
1. Identify the type of health rule that you want to create. See Health Rule Types.
2. Decide which entities are affected by the new rule. See Entities Affected by a Health Rule.
3. Define the conditions to monitor. See Create and Configure Conditions.
If you want a health rule to be evaluated only during specific time periods, create a health rule schedule. Some health rules are useful only at particular times, for example during business hours. See Health Rule Schedules.
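The following Python sketch models these evaluation steps under simplifying assumptions: named conditions (one of them a metric expression that derives an error rate from two metrics), a boolean expression that combines them, and a daily schedule outside which the rule is not evaluated. `Schedule`, `evaluate_rule`, `a_and_b_or_c`, and the metric keys are hypothetical names, not part of any AppDynamics API; real health rules are configured in the wizard.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, Dict, Optional

# Hypothetical types: a condition is a named check on the current metric
# values; the expression combines the named True/False results.
Metrics = Dict[str, float]
Condition = Callable[[Metrics], bool]
Expression = Callable[[Dict[str, bool]], bool]


@dataclass
class Schedule:
    """Daily time window during which a rule is evaluated (hypothetical)."""
    start: time
    end: time

    def is_active(self, now: datetime) -> bool:
        t = now.time()
        if self.start <= self.end:
            return self.start <= t < self.end
        # The window wraps past midnight, e.g. 22:00-06:00.
        return t >= self.start or t < self.end


def evaluate_rule(metrics: Metrics, conditions: Dict[str, Condition],
                  expression: Expression, schedule: Schedule,
                  now: datetime) -> Optional[bool]:
    """Return True if violated, False if not, None if outside the schedule."""
    if not schedule.is_active(now):
        return None
    results = {name: check(metrics) for name, check in conditions.items()}
    return expression(results)


# Example: conditions A, B, and C combined as "A AND (B OR C)".
conditions: Dict[str, Condition] = {
    "A": lambda m: m["art_ms"] > 500,
    # A metric expression: the error rate derived from two metrics.
    "B": lambda m: (m["calls_per_minute"] > 0
                    and m["errors_per_minute"] / m["calls_per_minute"] > 0.05),
    "C": lambda m: m["calls_per_minute"] > 1000,
}


def a_and_b_or_c(r: Dict[str, bool]) -> bool:
    return r["A"] and (r["B"] or r["C"])


business_hours = Schedule(start=time(9, 0), end=time(17, 0))
metrics = {"art_ms": 650.0, "calls_per_minute": 200.0, "errors_per_minute": 20.0}
```

With these sample values, the rule is violated at 10:00 (ART is 650 ms and the error rate is 10%) and is not evaluated at all at 20:00.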

After you set up health rules, you must configure the policies and actions to execute when health rules are violated. See Policies and Actions.

Additional Considerations

Your application status is based on health rules for the current time range. If you disable old health rule policies or enable new ones, your application status might still show errors in red from earlier in the time range, even if there are no current critical events based on the new policies. To verify that your new or disabled health rule policies have taken effect, change the time range in your dashboard to a smaller, more recent time frame.
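As a simplified, hypothetical model of this behavior, the Python sketch below treats the displayed status as the worst severity among violations that overlap the selected time range. `Violation` and `status_for_range` are illustrative names, and the Controller's actual status logic may differ. In the example, narrowing the range to the last 15 minutes drops a critical violation that ended 40 minutes ago.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical ordering used to pick the "worst" status.
SEVERITY = {"normal": 0, "warning": 1, "critical": 2}


@dataclass
class Violation:
    severity: str              # "warning" or "critical"
    start: datetime
    end: Optional[datetime]    # None while the violation is still open


def status_for_range(violations: List[Violation],
                     range_start: datetime, range_end: datetime) -> str:
    """Worst severity among violations overlapping [range_start, range_end]."""
    worst = "normal"
    for v in violations:
        v_end = v.end if v.end is not None else range_end
        overlaps = v.start <= range_end and v_end >= range_start
        if overlaps and SEVERITY[v.severity] > SEVERITY[worst]:
            worst = v.severity
    return worst


# Example: a critical violation from an old policy ended 40 minutes ago.
now = datetime(2024, 1, 8, 12, 0)
old = Violation("critical", now - timedelta(minutes=50), now - timedelta(minutes=40))
last_hour = status_for_range([old], now - timedelta(hours=1), now)
last_15_min = status_for_range([old], now - timedelta(minutes=15), now)
```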

When you configure a health rule for a business transaction with a very fast average response time (ART), such as 25 ms, and the condition uses a baseline with standard deviations as the criterion, the health rule can be violated too frequently, because a tiny increase in response time can represent multiple standard deviations. In this case, consider adding a second condition that sets a minimum ART threshold, and require both conditions to be met. For example, if you do not want to be notified unless ART is over 50 ms, you could set the conditions to ART > 2 Standard Deviations and ART > 50 ms.
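A minimal Python sketch of this two-condition check, assuming "ART > 2 Standard Deviations" means ART above the baseline plus two standard deviations and that both conditions must be true. `art_violated` and the sample numbers (a 25 ms baseline with a 3 ms standard deviation) are hypothetical.

```python
def art_violated(art_ms: float, baseline_ms: float, std_dev_ms: float,
                 num_std_devs: float = 2.0, min_art_ms: float = 50.0) -> bool:
    """Violation only when both conditions hold (AND).

    Condition 1: ART exceeds the baseline by more than num_std_devs
    standard deviations.
    Condition 2: ART exceeds an absolute floor, min_art_ms.
    """
    above_baseline = art_ms > baseline_ms + num_std_devs * std_dev_ms
    above_floor = art_ms > min_art_ms
    return above_baseline and above_floor
```

With those numbers, an ART of 32 ms exceeds the baseline condition (25 + 2 × 3 = 31 ms) but not the 50 ms floor, so no violation is reported; an ART of 60 ms satisfies both conditions.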

Similarly, when you configure a health rule for a calls-per-minute (CPM) metric with a baseline condition that uses standard deviations, the health rule may never be violated if the resulting threshold is below zero, because CPM cannot drop below zero. In this case, consider adding a second condition that checks for a zero value, such as CPM < 2 Standard Deviations and CPM < 1, and trigger the rule when either condition is met.
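A minimal Python sketch of this check, assuming "CPM < 2 Standard Deviations" means CPM below the baseline minus two standard deviations and that either condition triggers the rule. `cpm_violated` and the sample values are hypothetical. With a baseline of 3 CPM and a standard deviation of 2, the lower bound is 3 - 2 × 2 = -1, so the baseline condition alone can never fire; the `cpm < min_cpm` check catches traffic dropping to zero.

```python
def cpm_violated(cpm: float, baseline_cpm: float, std_dev_cpm: float,
                 num_std_devs: float = 2.0, min_cpm: float = 1.0) -> bool:
    """Violation when either condition holds (OR)."""
    lower_bound = baseline_cpm - num_std_devs * std_dev_cpm
    below_baseline = cpm < lower_bound   # can never be true if lower_bound <= 0
    near_zero = cpm < min_cpm            # catches traffic dropping to zero
    return below_baseline or near_zero
```

For a busier transaction (baseline 100 CPM, standard deviation 10), the baseline condition does the work: a drop to 50 CPM falls below the 80 CPM lower bound.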