This topic describes the detailed steps for configuring health rules using the health rule wizard. For more information on these settings, see Health Rules.

Permissions

To create, edit, or delete health rules, you need the Configure Health Rules permission. For more information, refer to Application Permissions.

Structure of the Health Rule Wizard

The health rule wizard contains four panels:

  • Overview: Sets the health rule name, enabled status, the health rule schedule, evaluation period of the health rule data, and wait time post violation.
  • Affected Entities: Sets the entities evaluated by the health rule. The options presented vary according to the health rule type you have defined.
  • Critical Criteria: Sets the conditions, whether all or any of the conditions need to be true for a health rule violation to exist, and the evaluation scope, which applies only to business transaction and node health rules defined at the tier level. It also includes an expression builder to create complex expressions containing multiple metrics.
  • Warning Criteria: Settings are identical to Critical Criteria, but configured separately.

You can navigate among these panels using the Back and Next buttons at the bottom of each panel or by clicking the panels in the wizard. You should configure the panels consecutively because the configuration of the health rule type determines the available affected entities in the Affected Entities panel as well as the available metrics in the Criteria panels.

Create a Health Rule

The Health Rule wizard groups commonly used system entities and related metrics to help you set up rules. You can use the default health rules provided by AppDynamics as-is, modify them to meet your requirements, or define a custom health rule.

Access the Health Rule Wizard

  1. Click Alert & Respond in the menu bar.
  2. Click Health Rules either in the right panel or the left navigation pane.
  3. Select the context for the health rule from the pulldown menu.
  4. Do one of the following:
    • To create a new health rule, click the + icon.
    • To edit an existing health rule, select the health rule and click the Edit (pencil) icon.
    • To remove an existing health rule, select the health rule and click the Delete (-) icon.

Configure Generic Health Rule Settings

You configure generic settings in the Overview panel.

  1. Enter a name. If a name already exists, you can change it.
  2. Check Enabled to enable the rule, or clear the checkbox to disable it.
  3. The Always option is pre-selected in the When is the rule enabled? drop-down list. If the health rule is enabled only at certain times, select other predefined schedules from the When is the rule enabled? drop-down list. 

    To define a custom health rule schedule or modify the predefined time intervals, click Manage Health Rule Schedules.  See Create and Manage Health Rule Schedules.

  4. Click the drop-down list Use data from last <> min(s) and select a number between 1 and 360 minutes. The value you specify is the latest time interval during which data is collected to determine if there is a health rule violation. This value applies to both critical and warning criteria. See Health Rule Evaluation Window.

  5.  In the Wait Time after Violation field, enter the number of minutes to wait before re-evaluating the rule for the same affected entity in which the violation occurred. See Health Rule Wait Time After Violation.

  6. Save your configuration.

Create and Manage Health Rule Schedules

  1. In the Overview panel of the Create Health Rule wizard, click Manage Health Rule Schedules. The Manage Health Rule Schedules window lists all the predefined time intervals.
  2. To create a new health rule schedule:
    1. Click the + icon. The Create New Policy Schedule window is displayed.
    2. Enter a name for the schedule.
    3. Enter an optional description of the schedule.
    4. Enter the start and end times for the schedule as cron expressions. For example, a custom schedule with a start time of 0 0 13 ? * 2-6 and an end time of 0 0 15 ? * 2-6 directs the health rule to be evaluated from 1 pm to 3 pm, Monday through Friday.
    For additional examples, you can select a predefined schedule in the Manage Health Rule Schedules window and click the Edit icon to see the cron expression for the predefined schedule.
    The Controller cron expressions are evaluated in PDT for SaaS controllers, and their format is based on Quartz Scheduler cron expressions. For on-premises controllers, cron expressions are evaluated according to controller time zone. For more information, see Quartz Scheduler documentation.
  3. To edit a predefined schedule for health rule evaluation:
    1. Select the schedule and click the Edit icon.
    2. In the Edit Policy Schedule window, make necessary changes.
    3. Click OK to save your changes.
  4. Save your configuration.

To delete a health rule evaluation schedule, select the schedule in the Manage Health Rule Schedules window and click the Delete (minus) icon at the top. Click OK to confirm the deletion.
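The schedule example above (0 0 13 ? * 2-6) can be read field by field. The sketch below is only an illustration of the Quartz field layout (seconds, minutes, hours, day of month, month, day of week, with 1 meaning Sunday); the helper names are hypothetical and it handles only simple numeric lists and ranges, not the full Quartz syntax.

```python
# Quartz cron field order; an optional seventh "year" field is ignored here.
QUARTZ_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")
# Quartz numbers the days of the week 1-7 starting on Sunday.
QUARTZ_DAYS = {1: "SUN", 2: "MON", 3: "TUE", 4: "WED", 5: "THU", 6: "FRI", 7: "SAT"}


def parse_quartz(expr):
    """Split a Quartz cron expression into named fields (hypothetical helper)."""
    parts = expr.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 space-separated fields")
    return dict(zip(QUARTZ_FIELDS, parts))


def expand_days(field):
    """Expand a simple numeric day-of-week field such as '2-6' or '1,7'."""
    days = []
    for chunk in field.split(","):
        if "-" in chunk:
            low, high = (int(x) for x in chunk.split("-"))
            days.extend(range(low, high + 1))
        else:
            days.append(int(chunk))
    return [QUARTZ_DAYS[d] for d in days]


start = parse_quartz("0 0 13 ? * 2-6")
end = parse_quartz("0 0 15 ? * 2-6")
print(start["hour"], end["hour"], expand_days(start["day_of_week"]))
```

Reading the fields this way shows why 2-6 means Monday through Friday rather than Tuesday through Saturday.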

Configure Affected Entities

The Affected Entities panel lets you define which entities your health rule affects. The health rule type you select determines the metrics that are offered for configuration in subsequent panels of the wizard. To define the affected entities:

  1. Select a health rule type from the drop-down list. Depending on the type of the health rule, you can configure the corresponding entities that are affected. See Entities Affected by a Health Rule for information about the types of entities that can be affected by the various health rule types.
  2. Use the drop-down list to select the entities affected by this health rule.
  3. If you select entities based on matching criteria, specify the matching criteria.
    For example, if you select Tier/Node Health - Transaction Performance as the health rule type and the health rule affects nodes, you can restrict the health rule evaluation to certain types of nodes or to nodes that match criteria such as meta-info, environment variables, and JVM system environment properties. Meta-info includes key-value pairs for:
    • key: supportsDevMode
    • key: ProcessID
    • key: appdynamics.ip.addresses
    • any key passed to the agent in the appdynamics.agent.node.metainfo system property

Configure Health Rule Evaluation Criteria

After configuring the entities affected by the health rule, you must define the evaluation criteria. The high-level process for configuring the criteria is:

  1. Determine the number and kind of metrics the health rule should evaluate. For each performance metric you want to use, create a condition using either one of the following methods:
    • Use a single condition component or multiple condition components for a single condition state. 
    • Use values based on complex mathematical expressions.
  2. If you have defined multiple conditions, decide whether the health rule violates if all of the tests are true or if any single test is true.
  3. For business transaction performance health rules and node health rule types that specify affected entities at the tier level, decide how many of the nodes must be violating the health rule to produce a violation event. See Health Rule Evaluation Scope.
  4. To configure critical conditions, use the Critical Criteria panel. To configure warning conditions, use the Warning Criteria panel.

Though the configuration processes for critical and warning conditions are identical, critical conditions are evaluated before warning conditions. If you have defined a critical condition and a warning condition in the same health rule, the warning condition is only evaluated if the critical condition is not true.
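The evaluation order described above can be sketched as follows. This is a minimal illustration, not the Controller's implementation: the function names and the status strings are assumptions, and each condition is reduced to a boolean that has already been evaluated.

```python
def group_is_true(results, match="all"):
    """Combine condition results with the All or Any setting."""
    if not results:
        return False  # no conditions defined for this severity
    return all(results) if match == "all" else any(results)


def health_status(critical, warning, critical_match="all", warning_match="all"):
    """Critical conditions are checked first; warning only if critical is not true."""
    if group_is_true(critical, critical_match):
        return "CRITICAL"
    if group_is_true(warning, warning_match):
        return "WARNING"
    return "NORMAL"


# One of two critical conditions is true, and the single warning condition is true.
print(health_status([True, False], [True], critical_match="any"))
print(health_status([True, False], [True], critical_match="all"))
```

With Any, a single true critical condition is enough for a critical violation; with All, the same inputs fall through to the warning condition.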

You can copy the settings between Critical and Warning condition panels and edit the fields you desire. For example, if you have already defined a critical condition and you want to create a warning condition that is similar, in the Warning Condition window click Copy from Critical Condition to populate the fields with settings from the Critical condition.

Create a Condition

  1. In the Critical Condition or Warning Condition window, click + Add Condition to add a new condition component.
    The row defining the component opens. See To Configure a Condition Component. Continue to add components to the condition as needed.
  2. From the drop-down list above the components, select All if all of the components must evaluate to true to constitute a rule violation. Select Any if a single true component is enough to constitute a violation.
  3. For health rules based on the following health rule types:
    • Business transaction 
    • Node health-hardware
    • Node health-transaction performance 

    you must specify the evaluation scope:

    Evaluating Serverless Tiers

    When you monitor serverless entities comprising tiers for AWS Lambda, the health rules are evaluated as described below.

    Health rule types: Business transactions, Service end points, Error rates
    • Condition evaluation criteria: The BT Average
      Affected entities: serverless tier(s)
      Evaluation: Metrics are aggregated at the tier level.
    • Condition evaluation criteria: Any node, % of the Nodes, or Number of the Nodes
      Affected entities: serverless tier(s)
      Evaluation: Metrics for serverless tiers are aggregated at the tier level, while the metrics for other tiers are evaluated as per the defined criteria.

    Health rule type: Tier/Node Health (Transaction Performance)
    • Condition evaluation criteria: The Tier Average (average for all Nodes in the Tier), Any node, % of the Nodes, or Number of the Nodes
      Affected entities: serverless tier(s)
      Evaluation: Metrics for serverless tiers are aggregated at the tier level, regardless of the evaluation criteria defined.
    • Condition evaluation criteria: The Tier Average (average for all Nodes in the Tier), Any node, % of the Nodes, or Number of the Nodes
      Affected entities: serverless node(s)
      Evaluation: The performance of serverless tiers is not evaluated for Tier/Node Health (Hardware) health rules. AWS does not offer node-level dashboards or metrics because the serverless platform runtime instances spin up and down on demand.

    Health rule type: Tier/Node Health (Hardware)
    • Condition evaluation criteria: -
      Affected entities: serverless tier(s), serverless node(s)
      Evaluation: The performance of serverless tiers is not evaluated for Tier/Node Health (Hardware) health rules. AWS does not offer node-level dashboards or metrics because the serverless platform runtime instances spin up and down on demand.
  4.  If the Health Rule will violate if the conditions above evaluate to true section is visible, select the appropriate radio button to set the evaluation scope.

    If you select percentage of nodes, enter the percentage. If you select the number of nodes, enter the absolute number.
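To make the node-based evaluation scopes concrete, the sketch below counts how many nodes meet the conditions and compares that count with the configured scope. The function name, the scope labels, and the use of "at least" comparisons are assumptions for illustration; the wizard may treat the boundary differently.

```python
def scope_violated(node_results, scope, threshold=None):
    """node_results holds one boolean per node: True when the node meets the conditions."""
    if not node_results:
        return False
    violating = sum(node_results)  # True counts as 1
    if scope == "any":
        return violating >= 1
    if scope == "percent":
        return 100.0 * violating / len(node_results) >= threshold
    if scope == "count":
        return violating >= threshold
    raise ValueError(f"unknown scope: {scope}")


nodes = [True, True, False, False]  # two of four nodes meet the conditions
print(scope_violated(nodes, "any"))
print(scope_violated(nodes, "percent", 75))
print(scope_violated(nodes, "count", 2))
```

The same node results can produce a violation under one scope and none under another, which is why the scope is chosen separately from the conditions.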

Configure a Condition

  1. In the first field of the condition row, enter a name for the condition.
    This name is used in the generated notification text and in the AppDynamics console to identify the violation.
  2. To select the metric on which the condition is based, do one of the following:
    • To specify a simple metric, click Select a Metric to open a Metric Selection window. The metric browser in the Metric Selection window displays metrics appropriate to the health rule type. Select the metric to monitor and click Select Metric.
    • To build an expression using multiple metric values, select Metric Expression from the drop-down list and click Add Expression. The Metric Expression window opens, where you can construct a mathematical expression to use as the metric. For information on constructing mathematical expressions, see To Build an Expression.
  3. From the drop-down list after the metric, select the type of comparison to evaluate the metric.

    • To limit the effect of the health rule to conditions during which the metric is within a defined distance—standard deviations or percentages—from the baseline, select Within Baseline from the menu. To limit the effect of the health rule to when the metric is not within that defined distance, select Not Within Baseline. Then select the baseline to use, the numeric qualifier of the unit of evaluation and the unit of evaluation. For example:

      Within Baseline of the Default Baseline by 3 Baseline Standard Deviations
    • To compare the metric with a static literal value, select < Specific value or > Specific Value from the menu, then enter the specific value in the text field. For example:

      Value of Errors per Minute > 100
    • To compare the metric with a baseline, select < Baseline or > Baseline from the drop-down list, and then select the baseline to use, the numeric qualifier of the unit of evaluation and the unit of evaluation. You can use the Baseline Standard Deviation or Baseline Percentage as the unit of evaluation. For example:

      Maximum of Average Response Time is > Baseline of the Daily Trend by 3 Baseline Standard Deviations 


      See Dynamic Baselines for information about the baseline options.

      Baseline Percentages

      The baseline percentage is the percentage above or below the established baseline at which the condition will trigger. For example, if you have a baseline value of 850 and you have defined a baseline percentage of > 1%, the condition is true if the value is > [850 + (850 x 0.01)], that is, greater than 858.5, so for whole-number metric values, 859 or higher.

      To prevent health rule violations from being triggered when the sample sets are too small, these rules are not evaluated if the load—the number of times the value has been measured—is less than 1000. For example, if a very brief time slice is specified, the rule may not violate even if the conditions are met, because the load is not large enough.

  4. The Evaluate to true on no data option controls the evaluation of the condition in cases where any metric on which the condition is based returns no data. By default, the condition evaluates to unknown when no data is returned. If the health rule is based on all the conditions evaluating to true, having no data returned may affect whether the health rule triggers an action.

    If you want the condition to evaluate to true whenever a metric on which the condition depends returns no data, check the Evaluate to true on no data option. Note however that this option does not affect the evaluation of unknown in the case where there is not yet enough data for the rule to evaluate. For example, if the health rule is configured to evaluate the last 30 minutes of data and a new node is added, the condition will evaluate to unknown for the first 30 minutes even if the Evaluate to true on no data box is checked. 

    If you want all of the conditions to evaluate to true on no data, you can check Evaluate all as true on no data instead of specifying the option for each condition separately. If you check this option and then add more conditions, the new conditions are not affected automatically. To apply this option to the added conditions, uncheck and then check the Evaluate all as true on no data check box again when you are finished adding conditions.

  5. Click Save when done. 
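The baseline comparisons in step 3 reduce to simple arithmetic. The following sketch works through the baseline percentage example (baseline 850, > 1%) and the standard deviation form. The function names are hypothetical, and returning None for a skipped evaluation is an assumption made for illustration.

```python
MIN_LOAD = 1000  # from the text: rules are not evaluated below this number of measurements


def above_baseline_by_percent(value, baseline, percent, load):
    """Return True or False, or None when the load is too small to evaluate."""
    if load < MIN_LOAD:
        return None
    threshold = baseline + baseline * percent / 100.0
    return value > threshold


def above_baseline_by_stddev(value, baseline, stddev, multiplier):
    """True when the value exceeds the baseline by more than `multiplier` standard deviations."""
    return value > baseline + multiplier * stddev


print(above_baseline_by_percent(859, 850, 1, load=5000))  # threshold is 858.5
print(above_baseline_by_percent(900, 850, 1, load=200))   # skipped: load below 1000
print(above_baseline_by_stddev(131, 100, 10, 3))          # threshold is 130
```

Keeping the load check separate from the comparison mirrors the text: a condition can be met numerically and still not produce a violation when too few measurements were taken.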

 

Using Health Rule Conditions to evaluate agent availability metrics can result in false positives. For example:

  • Agents may not be connecting with controllers due to communication errors for a couple of minutes.
  • Data may be delayed for a couple of minutes due to latency issues.

You can avoid occasional one-to-two minute metric loss due to network issues or late arrival by configuring your Health Rule as follows:
  1. Select Nodes for what the Health Rule affects. Tiers can be set, but more often we recommend setting Nodes.
  2. Select Node Health - Hardware, JVM, CLR as the Type.
  3. Use the last five minutes, with a wait time of ten minutes.
  4. Set your condition to be the Sum of the agent availability metric < Specific Value of 3.

This configuration will generate a violation when the agent is down for more than two minutes during the last five minutes.
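A short sketch shows why a sum below three over five minutes tolerates brief gaps. It assumes the availability metric contributes 1 for each minute the agent reports and 0 for each minute it does not; that encoding and the function name are assumptions for illustration.

```python
def agent_down_violation(availability_last_5_min, minimum_sum=3):
    """True when fewer than `minimum_sum` of the last five minutes reported availability."""
    return sum(availability_last_5_min) < minimum_sum


print(agent_down_violation([1, 1, 1, 0, 0]))  # two missing minutes: tolerated
print(agent_down_violation([1, 1, 0, 0, 0]))  # three missing minutes: violation
```

One or two missing data points leave the sum at three or more, so short network delays do not trigger the rule.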

 

Remove a Condition

Remove a condition component by clicking the delete icon.

Build an Expression

To access the expression builder to create a complex expression as the basis of a condition, select the Metric Expression option from the drop-down list and click Add Expression. The Metric Expression window is displayed.

For example, the following steps create an expression that measures the percent of slow business transactions.

  1. In the Variable Declaration pane of the Mathematical Expression builder, click + Add variable to add a variable.
  2. In the Variable Name field enter a name for the variable.
  3. From the drop-down list, select the qualifier for the metric from the following options:

    • Minimum: The minimum value reported across the configured evaluation time length. Not all metrics have this type.
    • Maximum: The maximum value reported across the configured evaluation time length. Not all metrics have this type.
    • Value: The arithmetic average of all metric values reported across the configured evaluation time length. This value is based on the type of the metric.
    • Sum: The sum of all the metric values reported across the configured evaluation time length.
    • Count: The number of times the metric value has been measured across the configured evaluation time length.
    • Group Count: The number of nodes contributing to a metric value, generally relevant for application or tier level metrics.
    • Current: The value for the current minute.

  4. Click Select a metric to open an embedded metric browser.

    Health Rule Evaluation Condition

    A health rule is not evaluated if any metric in the expression has a null value. This is to avoid erroneous evaluations as shown in the following examples.

    • Expression a-b-c, null value a: the entire expression is evaluated as negative.
    • Expression a/b, null value b: the number 'a' is to be divided by zero, which evaluates to an error.
    • Expression a*b, null value a or b: the entire expression is evaluated as zero.
  5. Repeat steps 1 through 4 for each metric that you will use in the expression.
    You can remove a variable by clicking the delete icon.
  6. In the Expression pane, build the expression by clicking Insert Variable to insert variables created in the Variable Declaration pane along with appropriate mathematical signs.
  7. When the expression is built, click Save.
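The procedure above can be illustrated with a small sketch of a percent-of-slow-transactions expression. It is not the expression builder's syntax: the function names are hypothetical, the samples are assumed to be one value per minute of the evaluation window, and skipping the evaluation is represented by returning None.

```python
from statistics import mean

# A subset of the qualifiers from step 3, applied to per-minute samples.
QUALIFIERS = {
    "minimum": min,
    "maximum": max,
    "value": mean,                 # arithmetic average over the window
    "sum": sum,
    "current": lambda s: s[-1],    # most recent minute
}


def qualify(samples, qualifier):
    """Return the qualified metric value, or None when there is no data."""
    if not samples:
        return None
    return QUALIFIERS[qualifier](samples)


def percent_slow(slow_calls, total_calls):
    """Hypothetical expression: (slow / total) * 100, skipped on null input."""
    slow = qualify(slow_calls, "sum")
    total = qualify(total_calls, "sum")
    if slow is None or total is None or total == 0:
        return None  # not evaluated, rather than treating null as zero
    return 100.0 * slow / total


print(percent_slow([5, 5], [50, 50]))
print(percent_slow([], [50, 50]))
```

Returning None instead of substituting zero avoids the misleading results listed in the examples, such as a division by zero or a product that collapses to zero.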

Custom Metrics in Multiple Entities

To specify Hardware Resources, JVM, and CLR metrics in multiple entities using a wildcard, you can use the procedure described in Using Wildcards in Metric Definitions.

To create a health rule on a custom metric in a single business transaction, node, or overall application performance, specify the health rule type as custom. Then, when you configure the condition component, choose Specify a Metric from the Metric Tree in the Select Metric window and select the metric from the embedded metric browser.

A different use case is to create a rule that evaluates a custom metric that exists across various entities, for example across several nodes. You want to do this with one health rule; you do not want to create a separate health rule for each node. In this case, you need to specify the custom metric using the relative metric path to the metric instead of selecting the metric from the embedded metric browser.

First get the relative path to the metric and then configure the health rule using that relative path.

To get the relative metric path for a multi-entity metric:
  1. Navigate to the Metric Browser by selecting Metric Browser in the left navigation pane.
  2. Select the metric that you want to use for the condition.
  3. Right-click and select Copy Full Path.
  4. Save this value in a file from which you can copy it later.

For example, you could copy the metric path for the CPU %Busy metric of the Inventory Server tier. The CPU %Busy metric would be appropriate to use in a health rule that affects all the nodes in that tier.

To configure a health rule that evaluates the custom metric over multiple entities:
  1. In the Overview panel of the health rule wizard, choose the health rule type for the kind of entity that you are monitoring.
  2. In the Affected Entities panel, select the affected entity.
  3. When you create the condition component that uses the metric, in the Select Metric window choose Specify a Relative Path Metric.
  4. Crop the relative metric path that you saved from the metric browser by doing one of the following:
    • For all health rule types except Node Health-Hardware, JVM, CLR or Custom, crop the path to use the metric name alone, for example, Average Wait Time (ms).
    • For Node Health-Hardware, JVM, CLR and Custom health rule types, crop the path to use everything after the entity, for example, everything after the node name.
  5. Paste the cropped relative metric path in the relative metric path field of the Select Metric window.
  6. Click Select Metric.
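The cropping rule in step 4 can be sketched as a small string operation. The full path, the node name Node_8001, and the function name below are hypothetical, and the sketch assumes the copied path uses the pipe character as a separator.

```python
# Health rule types for which the path is cropped after the entity name.
ENTITY_RELATIVE_TYPES = {"Node Health-Hardware, JVM, CLR", "Custom"}


def crop_metric_path(full_path, health_rule_type, entity_name=None):
    """Crop a copied full metric path to the relative path the wizard expects."""
    parts = full_path.split("|")
    if health_rule_type in ENTITY_RELATIVE_TYPES:
        if entity_name not in parts:
            raise ValueError("entity name not found in the metric path")
        # Keep everything after the entity (for example, after the node name).
        return "|".join(parts[parts.index(entity_name) + 1:])
    # All other types: the metric name alone.
    return parts[-1]


# Hypothetical full path copied from the Metric Browser.
full = ("Application Infrastructure Performance|Inventory Server|"
        "Individual Nodes|Node_8001|Hardware Resources|CPU|%Busy")
print(crop_metric_path(full, "Node Health-Hardware, JVM, CLR", "Node_8001"))
print(crop_metric_path(full, "Business Transaction Performance"))
```

Because the cropped path no longer names a specific node, the same relative path can be evaluated for every entity the health rule affects.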

Additional Considerations

  • Your application status is based on health rules for the current time range. If you disable old health rule policies, or enable new ones, you might see errors in red in your application status, even if there are no current critical events based on the new policies. To verify that your new or disabled health rule policies have taken effect, change the time range in your dashboard to a smaller, more recent time frame.

  • When you configure baseline-based conditions for business transactions that have a very fast average response time (ART), such as 25 ms, using standard deviation as a criterion can cause the health rule to violate too frequently, because a tiny increase in response time can represent multiple standard deviations. In this case, consider adding a second condition that sets a minimum ART as a threshold. For example, if you do not want to be notified unless ART is over 50 ms, you could set your threshold as ART > 2 Standard Deviations and ART > 50 ms.
  • Similarly, when configuring health rules for calls-per-minute (CPM) metrics with baseline selected in the configured condition, the health rule may never be violated if the condition is using standard deviations, and the resulting value is below zero. In this case, consider adding a second condition that checks for a zero value, such as CPM < 2 Standard Deviations and CPM < 1.
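The effect of pairing a standard deviation condition with a fixed threshold can be seen in a short sketch. The baseline and standard deviation figures below are invented for illustration, and the function names are hypothetical.

```python
def art_violation(art_ms, baseline_ms, stddev_ms, n_stddev=2, floor_ms=50):
    """Both conditions must hold: above the baseline band and above the fixed floor."""
    above_baseline = art_ms > baseline_ms + n_stddev * stddev_ms
    above_floor = art_ms > floor_ms
    return above_baseline and above_floor


def cpm_lower_bound(baseline_cpm, stddev_cpm, n_stddev=2):
    """Lower edge of the baseline band; it can fall below zero for low-volume metrics."""
    return baseline_cpm - n_stddev * stddev_cpm


# Illustrative numbers: a 25 ms baseline with a 2 ms standard deviation.
print(art_violation(30, baseline_ms=25, stddev_ms=2))  # above 29 ms but not above 50 ms
print(art_violation(60, baseline_ms=25, stddev_ms=2))  # above both thresholds
print(cpm_lower_bound(10, 8))                          # negative, so CPM can never drop below it
```

With the floor in place, a 5 ms increase on a 25 ms transaction no longer counts as a violation, while a genuinely slow response still does.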