Alerts let you know when problems exist and help you anticipate problems that might be developing. Responses let you automate preventative actions to address those problems before they cause a severe slowdown or outage. Think of alert and respond as the automation of your runbooks.
The alert and respond system is made up of three parts:
- Health rules: define key performance metric thresholds for your application, across the stack.
- Policies: link health rule violations, and other performance-based events, with appropriate actions.
- Actions: automate what should be done in a wide variety of situations, including sending alerts and performing diagnostic and remedial tasks.
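The relationship between the three parts can be sketched in a few lines of Python. This is only an illustrative model, not the AppDynamics API: the class names `HealthRule` and `Policy`, the metric name `memory_used_pct`, the 90% threshold, and the two action functions are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HealthRule:
    """Hypothetical health rule: a named metric with a violation threshold."""
    name: str
    metric: str
    threshold: float

    def is_violated(self, metrics: Dict[str, float]) -> bool:
        # The rule is violated when the reported value exceeds the threshold.
        return metrics.get(self.metric, 0.0) > self.threshold


@dataclass
class Policy:
    """Hypothetical policy: links one health rule to the actions to run."""
    rule: HealthRule
    actions: List[Callable[[str], str]]

    def evaluate(self, metrics: Dict[str, float]) -> List[str]:
        # Actions run only when the linked health rule is violated.
        if not self.rule.is_violated(metrics):
            return []
        return [action(self.rule.name) for action in self.actions]


def send_email(rule_name: str) -> str:
    # Stand-in for an alerting action (whom to notify).
    return f"email: {rule_name} violated"


def run_diagnostic(rule_name: str) -> str:
    # Stand-in for a diagnostic or remedial action (what to do).
    return f"diagnostic: {rule_name} violated"


# Assumed threshold of 90% for the sake of the example.
memory_rule = HealthRule("Memory utilization is too high", "memory_used_pct", 90.0)
policy = Policy(memory_rule, [send_email, run_diagnostic])
```

Calling `policy.evaluate({"memory_used_pct": 95.0})` would run both actions, while a value below the threshold runs none. The point of the separation is that the same health rule can be reused in several policies, each with its own actions.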
Sample Use Cases
The AppDynamics platform recognizes some broad-based health issues commonly experienced by applications, such as "Business Transaction response time is much higher than normal" or "Memory utilization is too high". These are configured as default health rules, which define how high is "much higher than normal" or "too high". Use policies to attach these rules to alerts (whom to notify) and responses (what to do) when these problems exist. You can use these rules "as is" or modify them for your environment. See Default Health Rules.
In addition to the broad-based rules, you can customize precise automatic alerts and responses for narrowly circumscribed situations. This lets you fine-tune your system, ensuring that the right alert goes to the right person and that the right action is taken for the right problem on the right cluster or server.
- You do not want to alert your team if performance in a few clusters is lagging, but you do want to trigger an alert if more than 20% of the clusters are unhealthy, or if servers in particular clusters or servers that meet certain criteria are performing poorly. You can define health rules that apply to specific tiers or nodes. If these rules are violated, the system knows exactly which entity is experiencing problems and therefore whom to alert. For example, a rule can be scoped to a single node, such as the order processing server.
- Performance is deteriorating in one business transaction, so you want to view snapshots for that one transaction. You create a diagnostic action.
- You want to send an alert whenever an app agent stops reporting to the Controller. Create a node health rule based on the value of the Availability metric reported by the agent. If Availability is less than 1, the agent is not reporting.
- You have a large operation with several development teams, each responsible for a different service. You create a health rule for one service and then copy it. Then you create different policies in which you can pair each copy of the health rule to an alert addressed to the appropriate team.
- You have an application that performs well under normal load. However, peak loads can cause the application to slow. During peak load, AppDynamics not only detects the connection pool contention, but also allows you to create a remediation script that automates increasing or decreasing the size of the connection pool. You can require human approval to run this script or simply configure it to execute automatically when it is triggered. Create a runbook and associate it with a policy so that it will fire when the connection pool is exhausted.
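Two of the conditions described in these use cases can be written out as simple checks. The sketch below is an assumption-laden illustration in Python, not how health rules are configured in the product: the function names and the shape of the input (a mapping from cluster name to a healthy flag) are hypothetical, while the 20% limit and the "Availability less than 1" test come from the use cases above.

```python
from typing import Dict


def fraction_unhealthy(cluster_health: Dict[str, bool]) -> float:
    """Return the share of clusters whose healthy flag is False."""
    if not cluster_health:
        return 0.0
    unhealthy = sum(1 for healthy in cluster_health.values() if not healthy)
    return unhealthy / len(cluster_health)


def should_alert_on_clusters(
    cluster_health: Dict[str, bool],
    max_unhealthy_fraction: float = 0.20,
) -> bool:
    """Alert only when more than the allowed share of clusters is unhealthy."""
    return fraction_unhealthy(cluster_health) > max_unhealthy_fraction


def agent_is_reporting(availability: float) -> bool:
    """An Availability value below 1 means the agent is not reporting."""
    return availability >= 1


# Hypothetical data: ten clusters, numbered 0 to 9.
one_lagging = {f"cluster-{i}": i != 0 for i in range(10)}   # 1 of 10 unhealthy
three_lagging = {f"cluster-{i}": i >= 3 for i in range(10)}  # 3 of 10 unhealthy
```

With this data, `should_alert_on_clusters(one_lagging)` stays quiet and `should_alert_on_clusters(three_lagging)` triggers, which matches the intent of ignoring a few lagging clusters while alerting on a wider problem.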
Scope and Access to Alert and Respond Features
Typically, different types of users with different roles set up and use different alert and respond features.
Email templates, HTTP request templates, and Email/SMS configuration are account-level features. The scope of these features, once set up, is the entire AppDynamics account. The items created at the account level are available to all the applications in that account. Account-level items are created and managed by users who have account-level roles that include permissions to create them.
By default, these roles are account owner and administrator, although custom roles could be created that include some of these permissions. For example, an account owner or administrator could create an email template manager role that could be assigned to other users to give them the ability to create and modify email templates.
In addition, custom roles could be configured with view template permission. For example, an account manager could create an HTTP request template user role that allows users to view templates without being able to modify them. This would enable power users creating HTTP actions to examine the contents of the templates that might be appropriate to their needs.
Policies, health rules, actions, and email digests are application-level features. The scope of these features is the application in which they were created. Creating and managing these items requires only application-level permissions.