If a node has been out of contact with the Controller for a certain amount of time, the Controller marks the node as a historical node. The Controller suspends certain types of processing activities for the node, such as rule evaluation.
If the node resumes contact with the Controller before the node deletion period expires, the Controller restores it to an active state. Otherwise, the node is permanently removed from the Controller and its node-level data is no longer accessible in the UI. However, tier-level and application-level historical metric data for the node remains available after the node is deleted.
By default, the Controller considers a node historical after 500 hours (about 20 days) of inactivity and deletes the node after 720 hours (30 days). For a highly dynamic application environment in which nodes are created and destroyed frequently, it usually makes sense to shorten the node activity timeout period. This allows recycled nodes to be treated as such in the Controller.
The node activity timeout period is determined by the node retention period or activity settings.
The names of historical nodes can be assigned to new nodes. Node name reuse is a Java Agent option that, when enabled, directs the Controller to reuse node names, so that data generated by multiple, short-lived nodes in a given tier is associated with a single logical node.
Node Activity and Agent Licensing
For licensing purposes, the Controller releases the license for the agent if the Controller has not received data from the agent in the previous 5 minutes. This license availability behavior is not affected by the historical node status or node deletion timeout settings.
Configuring Node Activity Settings
The node activity settings are account-level settings that the root AppDynamics administrator can modify from the administration console:
node.permanent.deletion.period: Time (in hours) after which a node that has lost contact with the Controller is deleted permanently from the system, along with its data. If the agent starts reporting again after this period, it registers as a new node, so no historical data is available at the node level. Historical data remains visible at the tier and application level, and cluster rollup takes place as normal.
The default is 720 hours, the minimum value is 6 hours, and the maximum value for this setting is unlimited.
node.retention.period: Time (in hours) after which a node that has lost contact with the Controller is marked as historical and removed from the UI. The AppDynamics UI no longer displays the node, but the system continues to retain it and its data is persisted. If the agent starts reporting again before the node is permanently deleted, the node reappears in the UI and the counter resets.
The default is 500 hours, the minimum value is 1 hour, and the maximum value for this setting is unlimited.
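To illustrate how the two settings interact, here is a minimal sketch that classifies a node by the number of hours since it last contacted the Controller. It uses the default values quoted above. The function name `classify_node`, the `NodeState` enum, and the treatment of the exact boundary hour (inclusive) are assumptions made for illustration, not Controller behavior taken from the product.

```python
from enum import Enum

# Defaults quoted in the text above; both values are in hours.
NODE_RETENTION_PERIOD_HOURS = 500
NODE_PERMANENT_DELETION_PERIOD_HOURS = 720


class NodeState(Enum):
    ACTIVE = "active"
    HISTORICAL = "historical"
    DELETED = "deleted"


def classify_node(hours_since_last_contact,
                  retention_hours=NODE_RETENTION_PERIOD_HOURS,
                  deletion_hours=NODE_PERMANENT_DELETION_PERIOD_HOURS):
    """Hypothetical helper: state of a node after a given period of silence.

    Assumption: a node changes state as soon as the period is reached
    (inclusive boundary). The real Controller may evaluate this differently.
    """
    if hours_since_last_contact < 0:
        raise ValueError("hours_since_last_contact must be non-negative")
    if hours_since_last_contact >= deletion_hours:
        return NodeState.DELETED
    if hours_since_last_contact >= retention_hours:
        return NodeState.HISTORICAL
    return NodeState.ACTIVE


if __name__ == "__main__":
    for hours in (24, 600, 800):
        print(hours, classify_node(hours).value)
```

With the defaults, a node that has been silent for 600 hours (25 days) falls between the two thresholds and would be historical but still recoverable. Shortening `retention_hours` in a highly dynamic environment moves that first threshold earlier without changing when data is removed.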
Additional notes about the node retention period:
- A node will not be impacted by the node retention period if the Machine Agent is associated with that node.
- If you need a node to be considered for the node retention period, it should be marked as historical on shutdown.
Agent Behavior When Disconnected from the Controller
The Controller may become unreachable because of network problems, agent errors, or an outage of the Controller server itself.
If the Controller is unreachable for one minute:
- The agent goes into standby mode, during which it does not detect any transactions.
- Any collected snapshots and events are dropped and lost. Snapshots and events are dropped because they consume too much memory to cache.
- All metrics that have not been posted to the Controller are stored in memory. The memory impact of retaining metrics is minimal.
- New business transaction registrations that have not been posted to the Controller are stored in memory.
- The agent attempts to connect to the Controller every minute and resumes normal activity when it can download its full configuration.
If the Controller becomes reachable in the following minute or two:
- All metrics that have been stored in memory are posted to the Controller.
- New business transaction registrations that have been stored in memory are posted to the Controller.
- Snapshots and events collected in the 20 seconds prior to the reconnection are posted to the Controller.
If the Controller is not reachable after three failed attempts that are one minute apart:
- The agent is muted and all business transaction interceptors are disabled. The interceptors are still called when monitored application entry point methods are executed, but they are unproductive. No new business transactions are discovered or registered. Correlation exit points will set a header such as “notxdetect=true”, which tells downstream tiers to also ignore the transaction.
- JMX metrics are stored in the application server memory and transmitted to the Controller after reconnection, so there are no gaps in the metric history.
- Periodic metrics for the last three minutes are stored in memory. Metrics older than three minutes are purged from memory.
- The agent configuration channel and the metric channel continue to attempt to connect to the Controller once each minute.
If the Controller is not reachable after five minutes, the license is freed for another agent to use.
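The thresholds described so far can be summarized as a simple timeline. The sketch below is a simplified model only: it assumes that "three failed attempts that are one minute apart" corresponds to roughly three minutes of unreachability, and the names `AgentStatus` and `agent_status` are hypothetical, not part of any agent API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentStatus:
    mode: str            # "normal", "standby", or "muted"
    holds_license: bool  # False once the Controller has freed the license


def agent_status(minutes_unreachable):
    """Hypothetical model of agent state after the Controller has been
    unreachable for the given number of minutes.

    Assumption: three failed one-minute retries are approximated as
    three minutes without contact.
    """
    if minutes_unreachable < 0:
        raise ValueError("minutes_unreachable must be non-negative")
    if minutes_unreachable < 1:
        mode = "normal"
    elif minutes_unreachable < 3:
        mode = "standby"   # no transaction detection; metrics held in memory
    else:
        mode = "muted"     # business transaction interceptors disabled
    # The license is freed once no data has arrived for five minutes.
    holds_license = minutes_unreachable < 5
    return AgentStatus(mode=mode, holds_license=holds_license)


if __name__ == "__main__":
    for minutes in (0, 1, 3, 5):
        print(minutes, agent_status(minutes))
```

Note that the model keeps the muted state and the license state separate: between minutes three and five the agent is muted but, under these assumptions, still holds its license.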
If the connection is later successful and the agent is able to download its full configuration and a license:
- All periodic metrics for the last three minutes, such as JMX metrics and Windows performance counters, are posted to the Controller. The Controller drops metrics that are too old to be accepted, for example when rollups for that time range have already completed.
- The agent is reactivated, business transaction interceptors are re-enabled, business transactions are monitored and possibly snapshotted, new business transactions will be discovered and registered, and downstream correlation is re-enabled.
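The three-minute retention of periodic metrics during a disconnection can be pictured as a time-bounded buffer that is flushed on reconnection. The following is an illustrative sketch, not the agent's actual implementation: the `MetricBuffer` class, its method names, and the use of timestamps in seconds are all assumptions, and samples are assumed to arrive in non-decreasing timestamp order.

```python
from collections import deque

RETENTION_SECONDS = 180  # three minutes, as described in the text


class MetricBuffer:
    """Illustrative buffer that keeps only samples from the retention window."""

    def __init__(self, retention_seconds=RETENTION_SECONDS):
        self.retention_seconds = retention_seconds
        self._samples = deque()  # (timestamp_seconds, name, value), oldest first

    def record(self, timestamp, name, value):
        """Store a sample, then purge anything older than the window."""
        self._samples.append((timestamp, name, value))
        self._purge(timestamp)

    def _purge(self, now):
        cutoff = now - self.retention_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def flush(self, now):
        """Return the retained samples for posting and empty the buffer."""
        self._purge(now)
        retained = list(self._samples)
        self._samples.clear()
        return retained


if __name__ == "__main__":
    buf = MetricBuffer()
    buf.record(0, "heap_used", 1.0)
    buf.record(100, "heap_used", 2.0)
    buf.record(200, "heap_used", 3.0)  # the sample at t=0 is now out of the window
    print(buf.flush(now=250))
```

Because the buffer is bounded by time rather than by count, its memory footprint stays small however long the Controller remains unreachable, which matches the behavior described for periodic metrics.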