Alerts let you know when problems exist and help you anticipate problems that might be developing. Responses let you automate preventative actions to address those problems before they cause a severe slowdown or outage. Think of alert and respond as the automation of your runbooks.
The alert and respond system is made up of four parts:
- Health rules: define key performance metric thresholds for your application, across the stack.
- Policies: link health rule violations and other performance-based events with appropriate actions.
- Actions: automate what should be done in response to a wide variety of events, such as sending alerts and performing diagnostic and remedial tasks. See Alert and Respond API to learn how to create custom URLs for notifications.
- Email digests: send a compilation of messages to a recipient list when specified events occur.
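The way a policy links events to actions can be pictured as a small data model. The Python sketch below is illustrative only: the class names, field names, event type strings, and the `dispatch` function are hypothetical and do not correspond to any AppDynamics API. It shows a policy matching an incoming event type and returning the actions that would run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    # Hypothetical event record, for example a health rule violation.
    event_type: str
    source: str


@dataclass
class Policy:
    # A policy links a set of triggering event types to named actions.
    name: str
    trigger_types: List[str]
    actions: List[str] = field(default_factory=list)


def dispatch(event: Event, policies: List[Policy]) -> List[str]:
    """Return the actions of every policy whose triggers include the event type."""
    fired: List[str] = []
    for policy in policies:
        if event.event_type in policy.trigger_types:
            fired.extend(policy.actions)
    return fired
```

In this sketch, an event that matches no policy yields an empty list, which mirrors the idea that an event has no effect until a policy pairs it with an action.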
You can create email templates and HTTP request templates to support email and HTTP actions. These templates can be reused to create actions for various applications in the account, and they facilitate integration of the alert and respond system with third-party email and HTTP APIs.
Email digests and notification actions that use email or SMS require that the SMTP server be configured for the Controller. See Configure the Email Server.
Sample Use Cases
The AppDynamics platform recognizes some broad-based health issues commonly experienced by applications, such as "Business Transaction response time is much higher than normal" or "Memory utilization is too high". These are configured as default health rules, which define what counts as "much higher than normal" or "too high". Create policies that pair these rules with alerts (whom to notify) and responses (what to do) when these problems occur. You can use these rules "as is" or modify them for your environment. See Default Health Rules.
In addition to the broad-based rules, you can customize precise automatic alerts and responses for narrowly circumscribed situations. This lets you fine-tune your system, ensuring that the right alert goes to the right person and that the right action is taken for the right problem on the right cluster or server.
Here are just a few examples.
Apply health rules to one node.
You do not want to alert your team if performance in a few clusters is lagging. However, you do want to trigger an alert if more than 20% of the clusters are unhealthy, or if servers in particular clusters or servers that meet certain criteria are performing poorly. You can define health rules that apply to specific tiers or nodes. If these rules are violated, the system knows exactly which entity is experiencing problems and therefore whom to alert. The following rule affects only one node: the order processing server.
Start a diagnostic action for one business transaction.
Performance is deteriorating in one business transaction, so you want to collect snapshots for that one transaction.
Alert when an app agent stops reporting to the Controller.
Create a node health rule based on the value of the Availability metric reported by the agent. If Availability is less than one, the agent is not reporting.
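The Availability condition amounts to a simple comparison. The following Python sketch is a rough illustration, not AppDynamics code: the function names and the dictionary of per-node values are assumptions made for the example. It flags every node whose Availability value is below one.

```python
from typing import Dict, List


def agent_is_reporting(availability: float) -> bool:
    """Treat an Availability value below 1 as 'agent not reporting'."""
    return availability >= 1


def nodes_to_alert(availability_by_node: Dict[str, float]) -> List[str]:
    """Return the names of nodes whose agent appears to have stopped reporting."""
    return sorted(
        node
        for node, value in availability_by_node.items()
        if not agent_is_reporting(value)
    )
```

A policy would then pair this kind of violation with a notification action addressed to whoever owns the affected node.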
Alert when the 95th percentile metrics for specific business transactions reach a certain value.
You want to apply this rule only to business transactions with names beginning with "User".
And you do not want to create a separate health rule for every "User" business transaction. Instead of specifying a simple metric from the metric tree, you specify a relative metric path. The health rule is evaluated for each of the affected business transactions. Use a relative metric path when you need to evaluate a single metric for multiple entities.
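Conceptually, a relative metric path means "evaluate this one metric separately for every entity in scope". The Python sketch below illustrates that idea under stated assumptions: the nested dictionary layout, the function name, and the metric label used in the usage note are placeholders, not the Controller's internal representation.

```python
from typing import Dict, List


def violating_transactions(
    metrics_by_bt: Dict[str, Dict[str, float]],
    name_prefix: str,
    relative_metric: str,
    threshold: float,
) -> List[str]:
    """Evaluate one relative metric for each business transaction in scope.

    Only transactions whose name starts with `name_prefix` are considered.
    A transaction violates the rule when its metric value exceeds `threshold`.
    """
    violations = []
    for bt_name, metrics in metrics_by_bt.items():
        if not bt_name.startswith(name_prefix):
            continue  # outside the scope of this health rule
        value = metrics.get(relative_metric)
        if value is not None and value > threshold:
            violations.append(bt_name)
    return sorted(violations)
```

For example, calling `violating_transactions(data, "User", "95th Percentile Response Time (ms)", 1000.0)` checks the same metric for every transaction whose name begins with "User" and ignores all others.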
You have a large operation with several development teams, each responsible for a different service.
You create a health rule for one service and then copy it. Then create different policies in which you can pair each copy of the health rule to an alert addressed to the appropriate team.
Start a script to change the size of the connection pool.
You have an application that performs well under normal load. However, peak loads can cause the application to slow. During peak load, AppDynamics not only detects the connection pool contention, but also allows you to create a remediation script that can automate increasing or decreasing the size of the connection pool. You can require human approval to run this script or simply configure it to execute automatically when it is triggered. Create a runbook and associate it with a policy so that it will fire when the connection pool is exhausted.
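The sizing decision inside such a remediation script might look like the following Python sketch. Everything here is an assumption made for illustration: the function name, the 90% and 50% utilization cut-offs, the step size, and the minimum and maximum bounds are hypothetical, and the code that would actually apply the new size to your application server is deliberately left out.

```python
def next_pool_size(
    current_size: int,
    active_connections: int,
    min_size: int = 10,
    max_size: int = 100,
    step: int = 10,
) -> int:
    """Suggest a new connection pool size based on current utilization.

    Grows the pool when it is nearly exhausted, shrinks it when it is
    mostly idle, and otherwise leaves it unchanged. The result is always
    clamped to the range [min_size, max_size].
    """
    if current_size <= 0:
        raise ValueError("current_size must be positive")
    utilization = active_connections / current_size
    if utilization >= 0.9:      # assumed "nearly exhausted" cut-off
        proposed = current_size + step
    elif utilization <= 0.5:    # assumed "mostly idle" cut-off
        proposed = current_size - step
    else:
        proposed = current_size
    return max(min_size, min(max_size, proposed))
```

Clamping the result keeps an automatically executed script from growing the pool without bound, which matters most when the script runs without human approval.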
Alert when available disk volume is low.
Use an expression over two metrics - available and used disk space - to be alerted when disk volume is low.
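One plausible form of that expression is the available space as a percentage of the total, where the total is the sum of the two metrics. The Python sketch below works through the arithmetic. The function names and the 10% default threshold are assumptions for the example, not product defaults.

```python
def available_disk_percent(available: float, used: float) -> float:
    """Return available space as a percentage of total (available + used)."""
    total = available + used
    if total <= 0:
        raise ValueError("total disk space must be positive")
    return 100.0 * available / total


def disk_is_low(available: float, used: float, min_available_percent: float = 10.0) -> bool:
    """Return True when the available percentage drops below the threshold."""
    return available_disk_percent(available, used) < min_available_percent
```

Because both inputs must use the same unit, the result is independent of whether the metrics are reported in MB or GB.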
Products that Alert and Respond
Policies, health rules, actions, and email digests can be created for databases, analytics, and EUM as well as for applications. Where not otherwise qualified, this documentation describes the features as they are applied to instrumented applications because these use cases offer the richest set of features and choices for configuration. Alert and Respond features for other AppDynamics products are more limited.
The policy triggers for applications can be health rule violation events or a variety of other types of events. The policy triggers for databases and analytics must be health rule violation events.
The types of actions that you can create for an application include notifications, diagnostics, remediation, HTTP requests, custom actions and cloud auto-scaling. The types of actions that you can create for a database or analytics are limited to notifications, HTTP requests and custom actions.
The types of entities affected by a health rule are more limited for databases and analytics than for applications.
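The action-type limits described above can be summarized as a lookup table. The Python sketch below encodes only what this section states for applications, databases, and analytics. The identifier strings and the `is_action_allowed` helper are hypothetical names chosen for the example, and EUM is omitted because its action types are not listed here.

```python
# Action types available to instrumented applications, as listed in this section.
APPLICATION_ACTIONS = frozenset({
    "notification",
    "diagnostic",
    "remediation",
    "http_request",
    "custom",
    "cloud_auto_scaling",
})

# Databases and analytics support a smaller set.
LIMITED_ACTIONS = frozenset({"notification", "http_request", "custom"})

ALLOWED_ACTIONS = {
    "application": APPLICATION_ACTIONS,
    "database": LIMITED_ACTIONS,
    "analytics": LIMITED_ACTIONS,
}


def is_action_allowed(product: str, action_type: str) -> bool:
    """Return True if the action type can be created for the given product."""
    if product not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown product: {product!r}")
    return action_type in ALLOWED_ACTIONS[product]
```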
For information on using policies triggered by browser synthetic events, see Alerting and Synthetics in Browser Synthetic Monitoring.
Scope and Access
Typically, users with different roles set up and use different alert and respond features.
Email templates, HTTP request templates, and Email/SMS configuration are account-level features. The scope of these features, once set up, is the entire AppDynamics account. The items created at the account level are available to all the applications in that account. Account-level items are created and managed by users who have account-level roles that include permissions to create them.
By default these roles belong to the account owner and could be granted to an account administrator. Custom roles could also be created that include some of these permissions. For example, an account owner could create an email template manager role that could be assigned to other users to give them the ability to create and modify email templates.
Policies, health rules, actions and email digests are application-level or tier-level features. The scope of these features is the application or tier in which they were created. Only roles with application-level or tier-level permissions are required to create and manage these items.
See the Application- and Tier-Level Permissions section in Roles and Permissions for details about these permissions.
In addition to specific actions that are triggered by specific events, you can create an email digest that reports a summary of specific events to a recipient list on a schedule.
This is a sample of an email digest that is sent every hour:
To create an email digest, click Alert & Respond in the menu bar, then click Email Digests in either the right panel or the left navigation pane. Follow the steps in the wizard to create the digest.
Watch a video
Click this link to see a full-screen version of the video, Alert and Respond: Quick Tour.