Policies let you anticipate problems and take actions to address those problems before they cause a severe slowdown or outage.
Policies provide a mechanism for automating monitoring and problem remediation. Instead of continually scanning metrics and events for the many conditions that could suggest problems, you can proactively define the events that are of greatest concern in keeping your applications running smoothly and then create policies that specify actions to start automatically when those events occur.
A policy connects two things:
- evaluated event triggers
- actions to be taken in response to those triggers
Policy triggers are events that cause the policy to fire. The events can be health rule violation events or other types of events, such as hitting a slow transaction threshold or surpassing a resource pool limit. See Health Rules, Troubleshoot Health Rule Violations, and Monitor Events.
The triggering events can be broadly defined as affecting any object in the application or very narrowly defined as affecting only specific objects. You can create a policy that fires when an event involving all the tiers in the application occurs, or one involving only specific tiers. You can create a policy that fires on events affecting only certain nodes, or only certain business transactions or certain errors. You can tune policies very specifically for different entities and situations.
For example, a very broadly defined policy would fire whenever a resource pool limit (more than 80% usage of EJB pools, connection pools, or thread pools) is reached for any object in the application.
On the other hand, a narrowly defined policy such as JVMViolationInWebTier fires only when existing health rules on memory utilization or JVM garbage collection time are violated.
For this policy, the triggering events are restricted to those health rule violations, and the affected object is limited to a specific tier, the ECommerce Server.
A policy is triggered when at least one of the specified triggering events occurs on at least one of the specified objects.
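The matching rule above can be sketched in a few lines of Python. This is only an illustrative model, not the Controller's implementation: the `Event` and `Policy` classes, the event type strings, and the "Inventory Server" tier are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Event:
    event_type: str  # illustrative label, e.g. "HEALTH_RULE_VIOLATION"
    affected: str    # name of the tier, node, or other object involved


@dataclass
class Policy:
    name: str
    trigger_types: Set[str]
    # None models a broadly defined policy: any object in the application.
    objects: Optional[Set[str]] = None

    def matches(self, event: Event) -> bool:
        # Fires when at least one specified event type occurs
        # on at least one specified object.
        type_ok = event.event_type in self.trigger_types
        object_ok = self.objects is None or event.affected in self.objects
        return type_ok and object_ok


# Broad policy: any resource pool limit event, on any object.
broad = Policy("ResourcePoolLimit", {"RESOURCE_POOL_LIMIT"})

# Narrow policy: health rule violations on one tier only.
narrow = Policy(
    "JVMViolationInWebTier",
    {"HEALTH_RULE_VIOLATION"},
    objects={"ECommerce Server"},
)

print(broad.matches(Event("RESOURCE_POOL_LIMIT", "Inventory Server")))
print(narrow.matches(Event("HEALTH_RULE_VIOLATION", "Inventory Server")))
```

In this model the broad policy matches the event on any tier, while the narrow policy ignores the same kind of violation when it occurs outside the ECommerce Server tier.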
The second part of creating a policy is assigning one or more actions to be automatically taken in response to the policy trigger.
For example, for the resource pool violation, you might want to take a thread dump and then run a script to increase the pool size.
Other common actions include restarting an application server if it crashes, purging a message queue that is blocked, or triggering the collection of transaction snapshots. You can also trigger a custom action to invoke third-party systems. See Build an Alerting Extension for information about custom actions.
See Actions for more information about the different types of actions.
See Actions Limits for information about limits on the number of actions that the Controller will process.
Because the definition of health rules is separate from the definition of actions, and both health rules and actions can be very precisely defined, you can take different actions for breaching the same thresholds based on context, for example, which tier or node the violation occurred in.
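As a rough illustration of this separation, the sketch below maps the same health rule to different actions depending on the tier where the violation occurred. The `PolicyRule` type, the health rule name, the action names, and the "Inventory Server" tier are all hypothetical; in the product this mapping is configured in the policy, not written as code.

```python
from typing import List, NamedTuple, Optional, Tuple


class PolicyRule(NamedTuple):
    health_rule: str          # health rule whose violation triggers the policy
    tier: Optional[str]       # None means the policy applies to any tier
    actions: Tuple[str, ...]  # hypothetical action names


POLICIES = [
    # Same health rule, different actions depending on where it is violated.
    PolicyRule("JVM Heap Utilization", "ECommerce Server",
               ("thread_dump", "restart_app_server")),
    PolicyRule("JVM Heap Utilization", None, ("email_ops",)),
]


def actions_for(health_rule: str, tier: str) -> List[str]:
    """Collect the actions of every policy matching this violation and tier."""
    actions: List[str] = []
    for policy in POLICIES:
        if policy.health_rule == health_rule and policy.tier in (None, tier):
            actions.extend(policy.actions)
    return actions


print(actions_for("JVM Heap Utilization", "ECommerce Server"))
print(actions_for("JVM Heap Utilization", "Inventory Server"))
```

Here a violation in the ECommerce Server tier starts the diagnostic and remediation actions as well as the notification, while the same violation in another tier only sends the notification.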
Policy Actions in Batch
You can configure a policy:
- To execute its actions immediately for every triggering event (the default)
For example, if a policy matched 100 events in a two-second period, it would start its actions 100 times, once for each event as it occurred.
- To execute its actions once a minute for all the events that triggered over the past minute (batch option)
For example, if a policy matched 100 events in a two-second period and no further triggering events occurred for the next 58 seconds, the policy would start each action just once. The context for the actions would be all 100 events.
Whether to batch the actions depends primarily on the type of action. For a notification action, it probably doesn't make sense to send 100 emails or SMS messages in two seconds. In this case, it makes sense to batch the actions and send a summary of the events from the past minute.
However, if the actions are thread dumps, there is no reason to expect that all 100 events are on the same node. They might be on different nodes. For that kind of action, you would probably want a thread dump to be taken for each event, without waiting up to 58 more seconds.
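The difference between the two modes can be modeled with a short Python sketch. It assumes batch windows aligned to fixed one-minute boundaries, which is a simplification made for the example; the function name `plan_action_runs` is hypothetical and the Controller's actual scheduling is not described here.

```python
from collections import defaultdict
from typing import Dict, List

BATCH_WINDOW_SECONDS = 60  # the batch option runs actions once a minute


def plan_action_runs(event_times: List[float], batch: bool) -> List[List[float]]:
    """Return one list per action run, holding the event times it gets as context."""
    ordered = sorted(event_times)
    if not batch:
        # Default mode: one action run per triggering event, started immediately.
        return [[t] for t in ordered]
    # Batch mode: group events by the one-minute window they fall into
    # (assumption: windows are aligned to multiples of 60 seconds).
    windows: Dict[int, List[float]] = defaultdict(list)
    for t in ordered:
        windows[int(t // BATCH_WINDOW_SECONDS)].append(t)
    return [windows[w] for w in sorted(windows)]


# 100 events within a two-second period, then nothing for the rest of the minute.
events = [i * 0.02 for i in range(100)]

immediate_runs = plan_action_runs(events, batch=False)
batched_runs = plan_action_runs(events, batch=True)
print(len(immediate_runs), len(batched_runs), len(batched_runs[0]))
```

In immediate mode the model produces 100 action runs with one event each; in batch mode it produces a single run whose context is all 100 events.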
To access the list of policies in an application, select Alert & Respond -> Policies.
The policy list shows all the policies created for your application, each with its triggers and actions. You can view and edit an action assigned to a specific policy by clicking the action in the policy list.