Health rules establish the health status of an entity by defining levels of performance based on metrics. For example, a health rule can check whether the average response time of a business transaction or the CPU utilization of a node is too high.
The health statuses are: critical, warning, normal, and unknown. When the performance of an entity affected by the rule violates the rule's conditions, a health rule violation exists.
When the health status of an entity changes, a health rule violation event occurs. Examples of health rule violation events are a health rule violation starting, ending, upgrading from warning to critical, or downgrading from critical to warning.
The health statuses of entities and health rule violations are surfaced in the Controller UI. A health rule violation event can also trigger a policy, which can initiate automatic actions such as sending alert emails or running remedial scripts.
You create health rules using the health rule wizard, described in Configure Health Rules. The wizard groups commonly used system entities and related metrics to simplify setting up health rules. You can also use the default health rules provided by AppDynamics, either as is or modified.
This topic provides some background on the terminology used in the health rule wizard.
The health rule scope determines the set of default health rule types. You can choose a scope to get a set of default health rule types for applications, servers, or databases. For example, when you choose a mobile application as the scope, you get health rules such as crash rates and HTTP/network error rates.
When you choose an application as the scope, the health rules cover business transactions, CPU and memory utilization, and so on.
From Alert & Respond > Health Rules, you can select one of the following health rule scopes from the drop-down menu:
You can also create new health rules to add to the default set for each scope. For example, you may want an app starts health rule for your mobile application. Because this health rule is not part of the default set of health rules in the mobile app scope, you add it as a new health rule.
The health rule wizard groups health rules into types that are categorized by the entity that the health rule covers. This allows the wizard to display appropriate configuration items during the health rule creation process.
The health rule types are:
When you select one of these health rule types, the wizard offers you the metrics commonly associated with that type in an embedded browser.
The metrics associated with a health rule are evaluated according to a schedule that you control. You can configure:
Time evaluation for health rule schedules is based on the time zone of the Controller, regardless of where an app agent is situated. For example, if a Controller is in San Francisco but the app agent is in Dubai, Pacific Time applies to the health rule schedule.
All SaaS Controllers use Pacific Time (PT).
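To make the time zone rule concrete, here is a minimal sketch that re-expresses an agent-side timestamp in the Controller's time zone before checking it against a schedule. The weekday business-hours schedule and both function names are hypothetical, and the fixed UTC offsets are a simplification: real Pacific Time observes daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for illustration only. Real Pacific Time observes daylight
# saving time, so a production check would use a full time zone database.
DUBAI = timezone(timedelta(hours=4), "GST")
PACIFIC_STANDARD = timezone(timedelta(hours=-8), "PST")


def to_controller_time(event_time: datetime, controller_tz: timezone) -> datetime:
    """Re-express an agent-side timestamp in the Controller's time zone."""
    return event_time.astimezone(controller_tz)


def in_weekday_business_hours(controller_time: datetime) -> bool:
    """Hypothetical schedule: Monday to Friday, 08:00 to 17:59, in Controller time."""
    return controller_time.weekday() < 5 and 8 <= controller_time.hour < 18


agent_time = datetime(2024, 1, 15, 9, 0, tzinfo=DUBAI)  # 09:00 Monday in Dubai
controller_time = to_controller_time(agent_time, PACIFIC_STANDARD)
print(controller_time.isoformat())                 # 2024-01-14T21:00:00-08:00
print(in_weekday_business_hours(controller_time))  # False: Sunday evening in PT
```

Although 09:00 on a Monday is within business hours for the agent in Dubai, the schedule is evaluated in Controller time, where it is still Sunday evening.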
By default, health rules are always enabled.
Built-in schedules are:
You can also configure your own schedules based on UNIX cron expressions using custom values.
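The exact cron dialect the Controller accepts is not specified here. The following minimal sketch assumes the classic five-field UNIX format (minute, hour, day of month, month, day of week) and supports only `*`, single values, ranges, lists, and `/` steps. `cron_matches` is a hypothetical helper that checks whether a timestamp, already expressed in Controller time, falls inside a custom schedule.

```python
from datetime import datetime


def _field_matches(field: str, value: int, lo: int, hi: int) -> bool:
    """Check one cron field such as '*', '5', '1-5', '*/15' or '0,30'."""
    for item in field.split(","):
        base, _, step_text = item.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (in Controller time) matches a 5-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    # Cron also accepts 7 for Sunday.
    dow_ok = _field_matches(dow, cron_dow, 0, 7) or (
        cron_dow == 0 and _field_matches(dow, 7, 0, 7))
    dom_ok = _field_matches(dom, when.day, 1, 31)
    if dom != "*" and dow != "*":
        day_ok = dom_ok or dow_ok  # classic cron: either day field may match
    else:
        day_ok = dom_ok and dow_ok
    return (_field_matches(minute, when.minute, 0, 59)
            and _field_matches(hour, when.hour, 0, 23)
            and _field_matches(month, when.month, 1, 12)
            and day_ok)


business_hours = "* 8-17 * * 1-5"  # every minute 08:00 to 17:59, Monday to Friday
print(cron_matches(business_hours, datetime(2024, 1, 3, 10, 30)))  # True (Wednesday)
print(cron_matches(business_hours, datetime(2024, 1, 6, 10, 30)))  # False (Saturday)
```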
The health rule evaluation window is the period of time over which the data used to evaluate the health rule is collected.
Different kinds of metrics may provide better results using different sets of data. You can manage how much data AppDynamics uses when it evaluates a particular health rule by setting the data collection time period. The default value is 30 minutes.
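A minimal sketch of how the evaluation window changes the value a rule sees, assuming the metric arrives once per minute and is rolled up as a simple average. Some metrics, such as counts, may be rolled up differently, and `window_average` is a hypothetical name.

```python
from statistics import mean


def window_average(per_minute_values: list[float], window_minutes: int = 30) -> float:
    """Average the newest `window_minutes` values; the list is ordered oldest first."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    recent = per_minute_values[-window_minutes:]
    if not recent:
        raise ValueError("no data in the evaluation window")
    return mean(recent)


# One hour of per-minute average response times: steady, then a 10-minute spike.
response_ms = [100.0] * 50 + [400.0] * 10
print(window_average(response_ms))                    # 200.0 (default 30 minutes)
print(window_average(response_ms, window_minutes=5))  # 400.0
```

A short window reacts quickly to a spike but is noisier; the default 30-minute window dilutes the same spike to half its height.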
The health rule wait time setting lets you control how often an event is generated while the conditions that violate a health rule continue. If the Controller determines that a health rule has been violated, with a status of either Critical or Warning, an Open Critical or Open Warning event is generated. That event can trigger any policies that match the health rule and thus initiate any actions that those policies require.
Once an Open event has occurred, the Controller continues to evaluate the status of the health rule every minute. If the Controller continues to detect the same violation, the violation remains open with the same status. A corresponding Continues Critical or Continues Warning event may be generated to link to any related policies.
But a Continues event every minute might be too noisy for your situation. The health rule's Wait Time after Violation setting is used to throttle how often these Continues events are generated for continuing health rule violations. The default is every 30 minutes.
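A minimal sketch of how Wait Time after Violation throttles Continues events, assuming one evaluation per minute. The function name, the status strings, and the Upgraded, Downgraded, and Close event labels are illustrative rather than Controller API names. The sketch also assumes that the wait period is counted from the most recent event for the open violation.

```python
def violation_events(statuses: list[str], wait_minutes: int = 30) -> list[tuple[int, str]]:
    """Turn per-minute statuses ('normal', 'warning', 'critical') into (minute, event) pairs.

    While an open violation keeps the same status, a Continues event is emitted
    only when `wait_minutes` have passed since the last event for that violation.
    """
    events: list[tuple[int, str]] = []
    current = "normal"
    last_event_minute = 0
    for minute, status in enumerate(statuses):
        if status == current:
            if current != "normal" and minute - last_event_minute >= wait_minutes:
                events.append((minute, f"Continues {current.title()}"))
                last_event_minute = minute
            continue
        if current == "normal":
            label = f"Open {status.title()}"
        elif status == "normal":
            label = f"Close {current.title()}"
        elif status == "critical":
            label = "Upgraded to Critical"
        else:
            label = "Downgraded to Warning"
        events.append((minute, label))
        current, last_event_minute = status, minute
    return events


statuses = ["normal"] * 2 + ["warning"] * 3 + ["critical"] * 65 + ["normal"]
for minute, event in violation_events(statuses):
    print(minute, event)
```

With the default 30-minute wait, 65 minutes of critical status produce two Continues Critical events (at minutes 35 and 65) instead of one per minute.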
To use Continues Critical and Continues Warning events, adjust the default Wait Time after Violation value to the desired frequency. Then configure a policy matching that health rule with the "Health Rule Violation Continues - Warning" and/or "Health Rule Violation Continues - Critical" events selected in the "Health Rule Violation Events" section of the policy's settings.
Note that the violations displayed in the Health Rules Violations page (under Troubleshoot) are updated only when a health rule violation event is triggered.
If the Controller is unable to evaluate the rule, for example because a node stops reporting, the Evaluation Status of the health rule is shown as a grey question mark ("Unknown") in the Current Evaluation Status tab in the right panel of the health rules list. The current violation event remains open until the Wait Time after Violation period has elapsed. At that point the violation event is closed and a new event is triggered, and the health status of the rule itself is displayed as "Unknown".
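The sequence above can be sketched as a small state model. This is a simplified illustration, not Controller code. It assumes the wait period is counted from the last violation event, and the `RuleView` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleView:
    evaluation_status: str  # what the Current Evaluation Status tab shows
    violation_open: bool
    health_status: str      # health status shown for the rule itself


def view_while_not_reporting(open_status: str, minutes_since_last_event: int,
                             wait_minutes: int = 30) -> RuleView:
    """Model a rule whose entity stopped reporting while a violation was open."""
    if minutes_since_last_event < wait_minutes:
        # The rule cannot be evaluated, but the existing violation stays open.
        return RuleView("unknown", True, open_status)
    # Wait time has elapsed: the violation closes and the rule's health becomes Unknown.
    return RuleView("unknown", False, "unknown")


print(view_while_not_reporting("critical", 10))
print(view_while_not_reporting("critical", 30))
```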
A health rule can evaluate metrics associated with an entire application or a limited set of entities. For example, you can create business transaction performance health rules that evaluate certain metrics for all business transactions in the application or node health rules that cover all the nodes in the application or all the nodes in specified tiers. The default health rules are in this category.
You can also create health rules that are narrowly applied to a limited set of entities in the application, or even to a single entity such as a node, a JMX object, or an error. For example, you can create a JMX health rule that evaluates the initial pool size and number of active connections for specific connection pools in nodes that share certain system properties.
The health rule wizard lets you specify precisely which entities the health rule affects, so you can create very specific health rules. For example, for a business transaction you can limit the health rule to certain tiers, to specific business transactions by name, or to business transactions whose names match certain criteria.
For node health rules, you can specify the type of node (Java, .NET, PHP, and so on).
You can specify that a health rule applies only to nodes that meet certain criteria:
Note that the Type of Node pulldown menu does not let you specify Node.js, Python, or Web Service nodes. To restrict a health rule to these types of nodes, specify the affected entity as a tier and then select only the Node.js, Python, or Web Service tiers as needed. To tune the affected nodes more finely, use the Nodes matching the following criteria menu item to specify node names, matching environment variables, or meta-info that restrict the health rule to the nodes you want.
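The following minimal sketch shows how tier, node-name, environment-variable, and meta-info criteria might narrow the set of affected nodes. It assumes that all supplied criteria must hold at once. The `Node` type, the `node_matches` helper, the node names, and the `APP_RUNTIME` variable are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    tier: str
    env: dict[str, str] = field(default_factory=dict)        # environment variables
    meta_info: dict[str, str] = field(default_factory=dict)  # agent meta-info


def node_matches(node: Node,
                 tiers: Optional[set[str]] = None,
                 name_regex: Optional[str] = None,
                 env_equals: Optional[dict[str, str]] = None,
                 meta_equals: Optional[dict[str, str]] = None) -> bool:
    """Return True only if the node satisfies every criterion that is given."""
    if tiers is not None and node.tier not in tiers:
        return False
    if name_regex is not None and re.search(name_regex, node.name) is None:
        return False
    for key, value in (env_equals or {}).items():
        if node.env.get(key) != value:
            return False
    for key, value in (meta_equals or {}).items():
        if node.meta_info.get(key) != value:
            return False
    return True


nodes = [
    Node("checkout-py-01", "checkout", env={"APP_RUNTIME": "python"}),
    Node("checkout-java-01", "checkout", env={"APP_RUNTIME": "java"}),
    Node("search-py-01", "search", env={"APP_RUNTIME": "python"}),
]
affected = [n.name for n in nodes
            if node_matches(n, tiers={"checkout"}, env_equals={"APP_RUNTIME": "python"})]
print(affected)  # ['checkout-py-01']
```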
For an Overall Application Performance health rule type, the health rule applies to the entire application, regardless of business transaction, tier, or node.
If you configure a health rule to work with tiers, you must also configure the corresponding policy to work with tiers. If the health rule is configured with tiers but the policy is configured with nodes first, no actions or notifications are triggered. The inverse is also true. The following screenshots show an example of a policy and a health rule created in the correct order.
For a Business Transaction Performance health rule type, you can apply the health rule to:
For the Node Health – Transaction Performance and Node Health – Hardware, JVM, CLR health rule types, you can apply the health rule to:
For a Node Health – JMX health rule type, you can apply the health rule to:
For User Experience - Browser Apps (Pages, iframes, Ajax Requests, Virtual Pages) health rule types, you can apply the health rule to:
For User Experience - Mobile Apps, you can apply the health rule to:
All mobile apps with the specified app key
The specified mobile apps
Mobile apps matching the given criteria
For User Experience - Mobile Network Requests, you can apply the health rule to:
For Server health rule types, you can apply the health rule to:
For a Databases & Remote Services health rule type, you can apply the health rule to:
For an Error Rates health rule type, you can apply the health rule to:
For Information Points health rule types, you can apply the health rule to:
For Service Endpoint health rule types, you can apply the health rule to:
For a Custom health rule type, you can apply the health rule to:
You define the acceptable range for a metric by establishing health rule conditions. A health rule condition sets the metric levels that constitute a Warning status and a Critical status.
A condition consists of a Boolean statement that compares the current value of a metric against one or more static or dynamic thresholds based on a selected baseline. If the condition is true, the health rule is violated. How a condition with multiple thresholds is evaluated depends on its configuration.
Static thresholds are straightforward. For example, is a business transaction's average response time greater than 200 ms?
Dynamic thresholds are based on a percentage in relation to, or a standard deviation from, a baseline built on a rolled-up baseline trend pattern. A daily trend baseline rolls up values for a particular hour of the day during the last thirty days, whereas a weekly trend baseline rolls up values for a particular hour of the day, for a particular day of the week, for the last 90 days. For more information about baselines, see Dynamic Baselines.
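A minimal sketch comparing a static threshold with the two kinds of dynamic threshold. How AppDynamics actually computes its baselines is not described here. This sketch assumes the baseline is the mean of the same hour's values over the trend period and uses their population standard deviation. The history values and function names are hypothetical.

```python
from statistics import mean, pstdev


def static_violation(value: float, threshold: float) -> bool:
    """Static threshold, e.g. average response time > 200 ms."""
    return value > threshold


def percent_above_baseline(value: float, baseline: float, percent: float) -> bool:
    """Dynamic threshold: value exceeds the baseline by more than `percent` %."""
    return value > baseline * (1 + percent / 100)


def stddevs_above_baseline(value: float, baseline: float, stddev: float, n: float) -> bool:
    """Dynamic threshold: value is more than `n` standard deviations above the baseline."""
    return value > baseline + n * stddev


# Hypothetical daily-trend history: the 10:00 hourly average for each of the last 30 days.
history = [190.0, 200.0, 210.0] * 10
baseline, stddev = mean(history), pstdev(history)  # 200.0 and about 8.16

current = 230.0
print(static_violation(current, 200.0))                      # True
print(percent_above_baseline(current, baseline, 10))         # True  (limit 220)
print(stddevs_above_baseline(current, baseline, stddev, 2))  # True  (limit ~216.3)
print(stddevs_above_baseline(current, baseline, stddev, 4))  # False (limit ~232.7)
```

Only "greater than" comparisons are shown; a rule that watches for values dropping too low would use the mirror-image checks.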
You can define a threshold for a health rule based on a single metric value or on a mathematical expression built from multiple metric values.
The following are typical conditions:
The last example combines two metrics in a single condition. You can use the expression builder embedded in the health rules wizard to create conditions based on a complex expression comprising multiple interdependent metrics.
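As a rough illustration of a condition built from an expression rather than a single metric, this sketch divides one metric by another and compares the result with a threshold. The metric names, the 5% threshold, and the zero-traffic handling are assumptions chosen for the example.

```python
def error_percentage(metrics: dict[str, float]) -> float:
    """Hypothetical expression: Errors per Minute / Calls per Minute * 100."""
    calls = metrics["Calls per Minute"]
    if calls == 0:
        return 0.0  # no traffic: treat the error percentage as zero
    return metrics["Errors per Minute"] / calls * 100


def condition_true(metrics: dict[str, float], threshold_percent: float = 5.0) -> bool:
    """The condition tests the combined expression, not either metric alone."""
    return error_percentage(metrics) > threshold_percent


print(condition_true({"Errors per Minute": 12, "Calls per Minute": 200}))   # True  (6%)
print(condition_true({"Errors per Minute": 12, "Calls per Minute": 1000}))  # False (1.2%)
```

The same 12 errors per minute violate the condition under light traffic but not under heavy traffic, which neither metric alone could express.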
Often a condition consists of multiple statements that evaluate different metrics. Depending on how the condition is configured, a health rule is violated either when any one of its conditions evaluates to true or when all of its conditions evaluate to true.
For example, a health rule that measures response time (average response time greater than some baseline value) makes more business sense if it is correlated with the application load (for example, 50 concurrent users or 10,000 calls per minute) on the system. You may not want to use the response time condition alone in a policy that initiates a remedial action if the load is low, even if the response time threshold is reached. The first part of the condition would evaluate the response time performance measurement and the second part would ensure that the health rule is violated only when there is sufficient load:
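A minimal sketch of such a two-part condition, evaluated both ways. The fixed 500 ms limit stands in for "some baseline value", and the 10,000 calls per minute load limit comes from the example above. The metric names and function names are hypothetical.

```python
from typing import Callable

Metrics = dict[str, float]
Condition = Callable[[Metrics], bool]


def slow_responses(m: Metrics) -> bool:
    # Stand-in for "average response time greater than some baseline value".
    return m["Average Response Time (ms)"] > 500


def enough_load(m: Metrics) -> bool:
    return m["Calls per Minute"] > 10_000


def rule_violated(conditions: list[Condition], m: Metrics, require_all: bool) -> bool:
    """ALL: every condition must be true. ANY: one true condition is enough."""
    results = [condition(m) for condition in conditions]
    return all(results) if require_all else any(results)


conditions = [slow_responses, enough_load]
quiet = {"Average Response Time (ms)": 800, "Calls per Minute": 300}
busy = {"Average Response Time (ms)": 800, "Calls per Minute": 12_000}
print(rule_violated(conditions, quiet, require_all=True))   # False: load too low
print(rule_violated(conditions, quiet, require_all=False))  # True
print(rule_violated(conditions, busy, require_all=True))    # True
```

Requiring all conditions keeps a slow but nearly idle system from triggering a remedial action.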
The health rule evaluation scope defines how many nodes in the affected entities must violate the condition before the health rule is considered violated.
Evaluation scope applies only to business transaction performance type health rules and node health type health rules in which the affected entities are defined at the tier level.
For example, you may have a critical condition in which the condition is unacceptable for any node, or you may want to consider the condition a violation only if the condition is true for 50% or more of the nodes in a tier.
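The two scopes in that example can be sketched as follows, assuming each node's condition result is already known. The node names and function names are hypothetical.

```python
def any_node_violates(node_violations: dict[str, bool]) -> bool:
    """The condition is unacceptable on any node."""
    return any(node_violations.values())


def percent_of_nodes_violate(node_violations: dict[str, bool], min_percent: float) -> bool:
    """Violated only if at least `min_percent` % of the tier's nodes violate."""
    if not node_violations:
        return False
    violating = sum(node_violations.values())
    return violating * 100 >= min_percent * len(node_violations)


tier = {"node-1": True, "node-2": False, "node-3": False, "node-4": False}
print(any_node_violates(tier))             # True
print(percent_of_nodes_violate(tier, 50))  # False (25% of nodes)
tier["node-2"] = True
print(percent_of_nodes_violate(tier, 50))  # True (50% of nodes)
```

Comparing `violating * 100` with `min_percent * len(...)` avoids a floating-point division at the exact 50% boundary.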
Options for this evaluation scope are:
Conditions are classified as either critical or warning conditions.
Critical conditions are evaluated before warning conditions. If you have defined a critical condition and a warning condition in the same health rule, the warning condition is evaluated only if the critical condition is not true.
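A minimal sketch of that evaluation order. The 1000 ms and 500 ms thresholds and the function names are hypothetical. The `None` branch reflects the Unknown status used when no data is available.

```python
from typing import Callable, Optional


def health_status(value: Optional[float],
                  critical: Callable[[float], bool],
                  warning: Callable[[float], bool]) -> str:
    """Check critical first; check warning only if the critical condition is false."""
    if value is None:
        return "unknown"  # no data, so the rule cannot be evaluated
    if critical(value):
        return "critical"
    if warning(value):
        return "warning"
    return "normal"


def is_critical(ms: float) -> bool:
    return ms > 1000


def is_warning(ms: float) -> bool:
    return ms > 500


for art in (1200, 700, 300, None):
    print(art, health_status(art, is_critical, is_warning))
```

A value of 1200 ms satisfies both conditions but is reported as critical, because the warning condition is never reached.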
The configuration procedures for critical and warning conditions are identical, but you configure these two types of conditions in separate panels. You can copy a critical condition configuration to a warning configuration and vice-versa and then adjust the metrics in the copy to differentiate them. For example, in the Critical Condition panel you can create a critical condition based on the rule:
Then from the Warning Condition panel, copy that condition and edit it to be:
As performance changes, a health rule violation can be upgraded from warning to critical if performance deteriorates to the higher threshold or downgraded from critical to warning if performance improves to the warning threshold.
AppDynamics provides a default set of health rules for some products, such as applications and servers. These default health rules vary depending on the entity. To see the default rules, before any health rules have been added to your AppDynamics installation:
If any of these predefined health rules are violated, the affected items are marked in the UI in yellow-orange for a Warning violation and in red for a Critical violation.
In many cases the default health rules may be the only health rules that you need. If the conditions are not configured appropriately for your application, you can edit them. You can also disable the default health rules.
AppDynamics recommends the following process to set up health rules for your application:
To view current health rules, including the default health rules, and to access the health rule wizard, click Alert & Respond > Health Rules. Then choose the type of entity for which you want health rules from the pulldown menu at the top.
Current health rules are listed in the left panel. If you click one of these rules, a list appears in the right panel showing which entities this selected health rule affects and what the status of the latest evaluation is. You can also select the Evaluation Events tab to see a detailed list of evaluation events.
In the left panel you can directly delete or duplicate a health rule. From here you can also access the health rule wizard to add a new rule or edit an existing one.
You can turn off evaluation of all health rules for the selected entity by clearing the Evaluate Health Rules check box. Select the check box again when you want health rule evaluation to resume.
See Configure Health Rules for details on using the health rule wizard.
Across the UI, health rule status is color-coded:
If you see a health rule violation reported in the UI, you can click it to get more information about the violation.
Here are the health summary bars on the built-in dashboards:
A health column is displayed in various lists, such as the tier list below:
Health rule violations are also listed in the Events panel on the dashboards.
For full-screen viewing, click Health rules.