Controller High Availability

A High Availability (HA) Controller deployment helps you minimize the disruption caused by a server or network failure, administrative downtime, or other interruptions. An HA deployment is made up of two Controllers, one in the role of the primary and the other as the secondary.

The Enterprise Console automates the configuration and administration tasks associated with a highly available deployment on Linux systems. Controller HA pairs are not available on Windows Enterprise Console machines.

Setting up high availability for Controllers essentially means configuring master-master replication between the MySQL instances on the primary and secondary Controllers.

An important operational point to note is that while the databases for both Controllers should be running, both Controller application servers should never be active (i.e., running and accessible by the network) at the same time. Similarly, the traffic distribution policy you configure at the load balancer for the Controller pair should only send traffic to one of the Controllers at a time (i.e., do not use round-robin or similar routing distribution policy at the load balancer).

The Controller supports encrypted database replication.

Overview of High Availability

Deploying Controllers in an HA arrangement provides significant benefits. It allows you to minimize downtime in the event of a server failure and to take the primary Controller down for maintenance with minimal disruption. It fulfills requirements for backing up the Controller data, since the secondary maintains an updated copy of that data. The secondary can also be used for certain resource-intensive operations that are not advisable on a live Controller, such as performing a cold backup of the data or running long-running database queries for troubleshooting or custom reporting.

In HA mode, each Controller has its own MySQL database with a full set of the data generated by the Controller. The primary Controller has the master MySQL database, which replicates data to the secondary Controller's replica MySQL database. HA mode uses a MySQL Master-Master replication type of configuration. The individual machines in the Controller HA pair need to have an equivalent amount of disk space.

The following figure shows the deployment of an HA pair at a high level. In this scenario, the agents connect to the primary Controller through a proxy load balancer. The Controllers in an HA pair must run equivalent versions and be in the same data center.

In the diagram, the MySQL instances are connected via a dedicated link for purposes of data replication. This is an optional but recommended measure for high volume environments. It should be a high capacity link and ideally a direct connection, without an intervening reverse proxy or firewall. See Load Balancer Requirements and Considerations on Set Up a High Availability Deployment for more information on the deployment environment.

Operating Considerations

In a high availability deployment, it is important that only one Controller is the active Controller at any one time. Only the database processes should be running on the secondary, so that it can maintain a replicated copy of the primary database.

The Controller app server process on the HA secondary can remain off until needed. Having two active primary Controllers is likely to lead to data inconsistency between the HA pair.

When a failover occurs, the secondary app server must be started, or restarted if it is already running. Restarting clears the cache.

To benefit from increased replication setup speeds, your server needs access to network resources capable of several hundred MB per second. By specifying replication setup parallelism, you can radically reduce setup times.

For example, if a single rsync is using only one-fifth of the available network capacity, you can achieve maximum throughput for setup by appending -P r5 to the end of the replicate.sh command. If this level of network traffic interferes with ongoing Controller operation, monitor the traffic and adjust this setting.

If you are using HA Toolkit version 3.54 or later, append -P r5 to the end of the replicate.sh command.
If you are using the HA module with Enterprise Console (version 4.5.17 or later), add --args numberThreadForRsync=5 to the CLI command.
From the Enterprise Console UI, select Number of parallel rsync threads for incremental or finalize (depending on which stage you are performing).
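
The one-fifth example above amounts to dividing the available link capacity by the throughput of a single rsync. The following is a minimal sketch of that arithmetic, not part of the HA Toolkit or Enterprise Console. The function name, the MB/s inputs, and the max_threads cap are assumptions made for illustration; measure real throughput in your own environment before choosing a value.

```python
import math


def rsync_parallelism(link_mb_per_s: float,
                      single_rsync_mb_per_s: float,
                      max_threads: int = 16) -> int:
    """Estimate how many parallel rsync threads would fill the link.

    Hypothetical helper: the cap (max_threads) is an assumed safety
    limit, not a documented limit of the replication tooling.
    """
    if link_mb_per_s <= 0 or single_rsync_mb_per_s <= 0:
        raise ValueError("throughput values must be positive")
    # Number of whole rsync streams that fit into the available capacity.
    threads = math.floor(link_mb_per_s / single_rsync_mb_per_s)
    # Always use at least one thread and never exceed the assumed cap.
    return max(1, min(threads, max_threads))


if __name__ == "__main__":
    # A single rsync using one-fifth of a 500 MB/s link suggests 5 threads,
    # which corresponds to the -P r5 setting described above.
    print(rsync_parallelism(500, 100))
```

If the resulting traffic interferes with Controller operation, lower the value rather than treating the estimate as a target.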

Connecting Agents to Controllers in an HA Scenario

Under normal conditions, the App Agents and Machine Agents communicate with the primary Controller. If the primary Controller becomes unavailable, the agents need to communicate with the secondary Controller instead.

AppDynamics recommends that traffic routing be handled by a reverse proxy between the agents and Controllers, as shown in the figure above. This avoids both the need to change agent configurations in the event of a failover and the delay imposed by using DNS mechanisms to switch the traffic at the agent.

If using a proxy, set the value of the Controller host connection in the agent configuration to the virtual IP or virtual hostname for the Controller at the proxy, as in the following example of the setting for the Java Agent in the controller-info.xml file:

<controller-host>controller.company.com</controller-host>

For the .NET Agent, set the Controller high availability attribute to true in config.xml. See .NET Agent Configuration Properties.

If you set up automation for the routing rules at the proxy, the proxy can monitor the Controller at the following address:

http://<controller>:<port>/controller/rest/serverstatus

An active node returns an HTTP 200 response to GET requests to this URL, with <available>true</available> in the response body. A passive node returns 503, Service Unavailable, with a body of <available>false</available>.
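
The status check described above can be sketched as a small probe that a proxy automation script might run. This is an illustrative sketch using only the Python standard library, not an AppDynamics-provided tool. The function names and the rule of routing only when exactly one node reports active are assumptions, and the host names and port in the usage example are placeholders.

```python
import urllib.error
import urllib.request


def is_active(status: int, body: str) -> bool:
    """Apply the documented rule: HTTP 200 with <available>true</available>."""
    return status == 200 and "<available>true</available>" in body


def probe(url: str, timeout: float = 5.0) -> bool:
    """GET the serverstatus URL and report whether the node is active."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return is_active(resp.status, body)
    except urllib.error.HTTPError as err:
        # A passive node answers 503; urllib raises that as HTTPError.
        body = err.read().decode("utf-8", errors="replace")
        return is_active(err.code, body)
    except OSError:
        # Connection refused, DNS failure, timeout: treat as not active.
        return False


def pick_backend(states: dict) -> "str | None":
    """Return the only active host, or None if zero or several are active.

    Assumed policy: never route when more than one node claims to be
    active, since traffic must go to one Controller at a time.
    """
    active = [host for host, ok in states.items() if ok]
    return active[0] if len(active) == 1 else None


if __name__ == "__main__":
    # Placeholder host names and port; substitute your own values.
    hosts = ["controller1.example.com", "controller2.example.com"]
    states = {
        h: probe(f"http://{h}:8090/controller/rest/serverstatus")
        for h in hosts
    }
    print(pick_backend(states))
```

Keeping the response rule in is_active, separate from the network call, lets you exercise the routing decision without a running Controller.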

For more information, see Use a Reverse Proxy.