    Comment: Published by Scroll Versions from this space and version 20.5

    This topic page describes how to manage and troubleshoot Controllers as a high availability pair.

    Set Up Monitoring for the HA Pair

    You can set up monitoring for your HA pair by installing another Controller to act as the monitoring Controller. This provides the same administrative functionality as the HA toolkit used in 4.3 and earlier versions.

    1. If you do not already have an HA pair, set one up.
    2. Install the monitoring Controller on the Enterprise Console host in a new platform by selecting Custom Install:
      1. Create a platform (for example, Controller Monitor Platform).

        Note

        This platform should not be used for installing any other services.

      2. Install a Controller.
      3. Make sure to deselect the Install Events Service option before clicking Install.
    3. Complete the monitoring setup by installing and configuring the App Agents and Machine Agents on your HA pair:

    Set Up App Agents for Monitoring

    The Enterprise Console automatically installs App Agents on the Controller hosts. You can configure these agents on both Controllers of an HA pair to report to the monitoring Controller by updating the JVM options of your HA pair platform. To set up your App Agents using the Enterprise Console, perform the following steps:

    1. SSH into the primary Controller box and update the primary Controller App Agent's controller-info.xml by running the following commands:

      No Format
      cd <controller-install-dir>/appserver/glassfish/domains/domain1/appagent
      cp conf/controller-info.xml ver<version#>/conf/
    2. Repeat step 1 for the secondary Controller.
    3. In the Enterprise Console UI, select your HA pair platform, and navigate to the JVM Options section by clicking Configurations > Controller Settings > Appserver Configurations.

    4. Make the following updates to JVM Options:

      1. Update the appdynamics.controller.hostName to the monitoring Controller's IP.

      2. Add the following required JVM options for monitoring:

        No Format
        -Dappdynamics.agent.applicationName=<app_name>, -Dappdynamics.agent.tierName=<tier_name>, 
        -Dappdynamics.agent.nodeName=<node_name>, -Dappdynamics.agent.accountName=<account_name>, 
        -Dappdynamics.agent.accountAccessKey=<access_key>
        Info

        You can get your access key from the Controller UI: navigate to Settings > License > Account, then click to show your access key. When you log in to the Controller, use the account specified in appdynamics.agent.accountName.

    5. Scroll down the page and click Save. The job will apply these properties and restart both the primary and secondary Controllers.
    6. In the Enterprise Console UI, select your Controller Monitor Platform, and navigate to the Controller page.
    7. Click on External URL on the widget to open the monitoring Controller's UI.
    8. Log in to the Controller. You should be able to see the monitoring application for both the primary and secondary Controllers.

    Install and Set Up Machine Agents for Monitoring

    You must install Machine Agents on both Controllers of an HA pair to report to the monitoring Controller. These agents are Java programs that collect hardware metrics. To install and set up your machine agents, perform the following steps:

    1. Install the Standalone Machine Agent on the primary Controller box. Do not start the agent.

    2. Repeat step 1 for the secondary Controller.
    3. Configure the Machine Agent properties for both Machine Agents by editing the controller-info.xml file located in the <machine_agent_home>/conf directory.
      1. Update the <controller-host> to the monitoring Controller's IP.
      2. Model the rest of your controller-info.xml file after the Example Configuration.

    4. Start both Machine Agents.
    5. In the Enterprise Console UI, select your Controller Monitor Platform, and navigate to the Controller page.
    6. Click on External URL on the widget to open the monitoring Controller's UI.
    7. Log in to the Controller. You should be able to see the monitoring application for both the primary and secondary Controllers.

    Bouncing the Primary Controller Without Triggering Failover

    The Enterprise Console does not allow you to stop and start the primary Controller without initiating failover. To work around this, perform the following steps:

    1. Log in to the Enterprise Console and navigate to the Appserver Configurations page by clicking through Configurations, followed by Controller Settings.
    2. Deselect Enable Auto Failover and click Save.
    3. SSH to the Controller machine where the Controller is installed.
    4. Run the following commands on the Enterprise Console host:

      No Format
      bin/platform-admin.sh stop-controller-appserver
      bin/platform-admin.sh start-controller-appserver

      This will bounce the primary Controller in HA mode.

    5. Re-enable auto failover on the Enterprise Console Appserver Configurations page.
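
    The steps above can be sketched as a script built from the CLI commands in this topic. This is a sketch under assumptions, not a supported tool: it assumes it runs from the directory that contains bin/platform-admin.sh, that each submit-job call finishes before the next command starts (not confirmed here), and that toggling enableAutoFailover through the update-configs job is equivalent to the check box in the UI. PLATFORM_NAME and DRY_RUN are hypothetical variables introduced for this example. With DRY_RUN=1, the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch: bounce the primary Controller with auto failover temporarily disabled.
# PLATFORM_NAME and DRY_RUN are hypothetical variables for this example.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Toggle auto failover with the update-configs job ($1 is true or false).
set_auto_failover() {
    run bin/platform-admin.sh submit-job --service=controller --job update-configs \
        --platform-name "$PLATFORM_NAME" --args "enableAutoFailover=$1"
}

bounce_primary() {
    # Do not touch the Appserver unless auto failover was disabled first.
    set_auto_failover false || return 1
    run bin/platform-admin.sh stop-controller-appserver
    run bin/platform-admin.sh start-controller-appserver
    status=$?
    # Re-enable auto failover even if the restart failed.
    set_auto_failover true
    return "$status"
}
```

    For example, DRY_RUN=1 PLATFORM_NAME=<name_of_the_platform> lets you review the four commands before running bounce_primary for real.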

    Starting and Stopping the Controller

    The Enterprise Console does not allow you to shut down the primary Controller. However, you can restart the secondary Controller via the start and stop Controller commands.

    To start or stop the Controller manually, use the following commands: 

    • To start: 

      No Format
      bin/platform-admin.sh start-controller-appserver --with-db
    • To stop: 

      No Format
      bin/platform-admin.sh stop-controller-appserver --with-db

    Automatic Failover

    The Enterprise Console monitors the health of the primary Appserver and database. If the Appserver or database is unresponsive, the Enterprise Console will by default wait for five minutes before initiating a failover. This interval can be configured by updating the default value in the Domain Protocol text field on the Appserver Configurations page under Controller settings.

    You can also disable or enable automatic failover through the CLI.

    Info

    Auto failover is enabled by default when you manage HA deployments via the Enterprise Console. For auto failover to work seamlessly, the Controller database MySQL root password must be stored securely in the Enterprise Console. If you do not want the Enterprise Console to store the database root password, you must disable auto failover, which is not recommended.

    To disable automatic failover, run the following command on the Enterprise Console host:

    No Format
    bin/platform-admin.sh submit-job --service=controller --job update-configs --platform-name <name_of_the_platform> --args "enableAutoFailover=false"

    To enable automatic failover, run the following command on the Enterprise Console host:

    No Format
    bin/platform-admin.sh submit-job --service=controller --job update-configs --platform-name <name_of_the_platform> --args "enableAutoFailover=true"

    Performing a Manual Failover and Failback

    To fail over from the primary to the secondary manually, click the HA Failover option on the Controller page of the Enterprise Console or run the following command on the Enterprise Console host:

    No Format
    bin/platform-admin.sh submit-job --service controller --job ha-failover --platform-name <name_of_the_platform>

    This makes the Appserver on the secondary the primary and the database on the secondary the replication master. The old primary becomes the secondary.

    The process for performing a failback to the old primary is the same as failing over to the secondary. Simply run the following command on the Enterprise Console host:

    No Format
    bin/platform-admin.sh submit-job --service controller --job ha-failover --platform-name <name_of_the_platform>

    Note that if the old primary has been down for more than seven days, you need to revive its database, as described in the following section.


    Initiate Controller Database Incremental Replication

    Re-enable Broken Replication

    Incremental replication (replication via rsync while the primary database is up) is required when database replication on the secondary Controller lags behind the primary Controller by more than three days. This type of replication allows the primary Controller to keep operating while the disk contents are copied to the secondary node.

    To initiate incremental replication:

    1. Run the following command on the Enterprise Console host:

      No Format
      bin/platform-admin.sh submit-job --service controller --job incremental-replication

      This launches a continuously running background job.

    2. Make sure replication occurs four or more times, by checking mysqlDir/incremental_sync.status on the primary database host.
      Sample rsync status file output:

      No Format
      rsync started at Mon Mar  5 11:49:56 PST 2018
      rsync completed at Mon Mar  5 11:50:56 PST 2018
      rsync started at Mon Mar  5 11:51:01 PST 2018
      rsync completed at Mon Mar  5 11:51:11 PST 2018
      Info

      If replication fails, go to the secondary host and stop all rsync and ha-replicate.sh processes. Then try running the incremental-replication job again.

    3. Finalize the job by running the following command on the Enterprise Console host:

      No Format
      bin/platform-admin.sh submit-job --service controller --job finalize-replication

      This stops the incremental replication loop. The command will restart the primary Controller, resulting in downtime.

    4. Make sure replication is working by checking that there is no significant gap between the primary and secondary Controllers. You can run the following command on the Enterprise Console host to check the replication status:

      No Format
      bin/platform-admin.sh show-service-status --platform-name <platform_name> --service controller

      It may take a few minutes for the secondary status to catch up.
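
    The check in step 2 can be scripted. The following is a minimal sketch, assuming that each finished cycle writes one line beginning with "rsync completed at" to incremental_sync.status, as in the sample output above. The function names are hypothetical and are not part of the Enterprise Console.

```shell
# Count completed rsync cycles recorded in an incremental_sync.status file.
# Other lines (for example, per-file details) are ignored.
count_completed_syncs() {
    grep -c '^[ 	]*rsync completed at' "$1" 2>/dev/null || true
}

# Succeed only when at least MIN cycles have completed.
# $1 = path to the status file, $2 = MIN (defaults to 4, per step 2).
enough_syncs() {
    completed=$(count_completed_syncs "$1")
    [ "${completed:-0}" -ge "${2:-4}" ]
}
```

    On the primary database host you might call enough_syncs with the path to mysqlDir/incremental_sync.status and only run the finalize-replication job when it succeeds.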

    Add a Secondary Controller Using Incremental Replication

    You can convert a single Controller with a large amount of data to an HA pair by using incremental replication. This way, you can rsync most of the Controller's data while the Controller is still running, limiting the downtime of adding a secondary Controller.

    To add a secondary Controller using incremental replication:

    1. Start the incremental replication, giving host and rsync parameters:

      No Format
      bin/platform-admin.sh submit-job --service controller --job incremental-replication --args controllerSecondaryHost=1.1.1.1 rsyncThrottle=40000 rsyncCompress=true

      This launches a continuously running background job.

    2. Make sure replication occurs four or more times, by checking mysqlDir/incremental_sync.status on the primary database host.
      Sample rsync status file output: 

      No Format
      rsync started at Mon Mar  5 11:49:56 PST 2018
      rsync completed at Mon Mar  5 11:50:56 PST 2018
      rsync started at Mon Mar  5 11:51:01 PST 2018
      rsync completed at Mon Mar  5 11:51:11 PST 2018
      Info

      If replication fails, go to the secondary host and stop all rsync and ha-replicate.sh processes. Then try running the incremental-replication job again.

    3. Run the add secondary job. The Enterprise Console will perform a final rsync and add the secondary.

      No Format
      bin/platform-admin.sh submit-job --service controller --job add-secondary --args controllerSecondaryHost=secondary mysqlRootPassword='password'

      The command will restart the primary Controller, resulting in downtime.

      Info

      Until you trigger the add-secondary command, the secondary Controller is not added to the Enterprise Console platform. Therefore, the Enterprise Console will not be able to perform any other operations on the secondary Controller.

    If you need to stop replication, you can run the following command:

    No Format
    bin/platform-admin.sh submit-job --service controller --job stop-incremental-replication

    Set Replication Factors for Rsync Threads

    Using the Enterprise Console UI or the CLI, you can set the number of parallel rsync threads as a job parameter when you perform incremental or finalize replication.

    • From the Enterprise Console UI:
      1. Log in to the Enterprise Console and access the Controller page.

      2. From the More menu, select either Incremental Replication or Finalize Replication, depending on which replication you are performing.

      3. Enter a number in the Number of parallel rsync threads field and click Submit. The default value is 1.

    • From the CLI, based on which replication you are performing, run either of the following commands from the Enterprise Console host and set the numberThreadForRsync argument.

      Code Block
      bin/platform-admin.sh submit-job --job incremental-replication --args numberThreadForRsync=<number>
      bin/platform-admin.sh submit-job --job finalize-replication --args numberThreadForRsync=<number>

    Enable MySQL 5.7 Parallel Replication

    Using the Enterprise Console UI or the CLI, you can enable MySQL 5.7 parallel replication when you perform finalize replication.

    • From the Enterprise Console UI:
      1. Log in to the Enterprise Console and access the Controller page.

      2. From the More menu, select Finalize Replication.

      3. Select the Database parallel replication check box to enable parallel replication with the MySQL 5.7 database.
      4. Click Submit.

    • From the CLI, run the following command from the Enterprise Console host to enable MySQL 5.7 parallel replication. The default value is true.

      Code Block
      bin/platform-admin.sh submit-job --job finalize-replication --args dbParallelReplication=true

    Troubleshooting the Incremental Replication Status

    If your first incremental replication run is taking longer than usual, you can refer to the status file, incremental_sync.status, to review a detailed list of files that are being rsynced. You can find the file on the primary Controller host under the Platform folder: mysqlDir/incremental_sync.status.

    Re-enable Controller Database Replication

    The Controller databases can be synchronized using the replicate script if they have been out of sync for more than seven days. Synchronizing a database that is more than seven days behind the master is considered reviving a Controller database. Reviving a database involves essentially the same procedure as adding a new secondary Controller to an existing production Controller, as described in Set Up the Secondary Controller and Initiate Replication. You can also follow these steps if an HA failover failed at replication.

    To re-enable replication or revive a Controller database:

    1. On the Controller page, click Remove Controller, or run the following command on the Enterprise Console host:

      No Format
      bin/platform-admin.sh submit-job --job remove --service controller
    2. Enter the database root credentials.
    3. Check Remove Binaries, or run the following command on the Enterprise Console host:

      No Format
      bin/platform-admin.sh submit-job --job remove --service controller --args removeBinaries=true
    4. Uncheck Remove Controller Cluster. If it is already unchecked, remove the secondary server.
    5. Click Submit.
    6. Add a secondary controller from the Controller page, or run the following command on the Enterprise Console host:

      No Format
      bin/platform-admin.sh submit-job --service controller --job add-secondary --args controllerSecondaryHost=secondary mysqlRootPassword='password'

      The command will restart the primary Controller, resulting in downtime.

    The Enterprise Console will onboard the secondary Controller and re-enable replication.

    Backing Up and Restoring Controller Data in an HA Pair 

    An HA deployment makes backing up Controller data relatively straightforward since the secondary Controller offers a complete set of production data on which you can perform a cold backup without disrupting the primary Controller service. 

    After setting up HA, perform a backup by stopping the Controller from the Enterprise Console and making a file-level copy of the AppDynamics home directory (that is, a cold backup). When finished, restart the Controller from the Enterprise Console. The secondary will then catch up with the primary.

    When restoring the database from a backup in an HA or standalone environment, check that the primary and secondary servers' ha.type and ha.mode are set properly to active and passive, respectively.

    Updating the Configuration in an HA Pair

    The Enterprise Console will copy any file-level configuration customizations made on the primary controller to the secondary controller, such as changes in domain.xml and db.cnf.

    Over time, if you need to make modifications to the Controller configuration, always do those changes in the Enterprise Console on the Controller Settings page under Configurations. These changes will be preserved during upgrades. Any changes made outside the Enterprise Console will not be preserved after upgrade. 

    Troubleshooting HA

    Controller Diagnostic Data

    The Enterprise Console writes log messages pertaining to HA to the platform-admin-server.log on the Enterprise Console host.

    To diagnose the Controller, run the following command:

    No Format
    bin/platform-admin.sh submit-job --platform-name <name_of_the_platform> --job diagnosis --service controller 

    Refer to the Controller diagnostic data in the platform-admin-server.log.

    Sample Controller diagnostic data

    Linux

    No Format
    Controller diagnostic data:
    123.45.0.1:
    controller_database: running
    controller_appserver: running
    reports_service: running
    operating_system: Linux
    controller_version: 004-004-001-000
    controller_performance_profile: small
    controller_ha_type: primary
    controller_appserver_mode: active
    controller_metric_data_per_min: N/A
    slave_io_state: Waiting for master to send event
    seconds_behind_master: 0
    master_server_id: 567.
    master_host: controller-secondary
    master_ssl_allowed: No
    
    123.45.0.2:
    controller_database: running
    controller_appserver: not running
    reports_service: running
    operating_system: Linux
    controller_version: 004-004-001-000
    controller_performance_profile: small
    controller_ha_type: secondary
    controller_appserver_mode: passive

    Invalid HA Controller Roles

    If your HA Controller roles in the Controller databases are incorrect, the Enterprise Console will prevent discover and upgrade jobs. An invalid HA Controller state is when both of your Controller role types are identical, such as in a primary/primary or secondary/secondary case.
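
    One way to spot this condition is to scan the diagnostic data for the controller_ha_type values. The following is a minimal sketch, assuming the diagnostic data has the shape of the sample output above (one "controller_ha_type: <role>" entry per Controller). The function name ha_roles_valid is hypothetical.

```shell
# Read Controller diagnostic data on stdin and succeed only when the HA roles
# are valid: exactly one primary and exactly one secondary.
ha_roles_valid() {
    awk '
        {
            # Find the key anywhere on the line, in case the log adds a prefix.
            for (i = 1; i < NF; i++)
                if ($i == "controller_ha_type:") count[$(i + 1)]++
        }
        END { exit !(count["primary"] == 1 && count["secondary"] == 1) }
    '
}
```

    For example, you could save the diagnostic section of platform-admin-server.log to a file and run ha_roles_valid with that file on standard input. A non-zero exit status indicates a case such as primary/primary or secondary/secondary.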

    To fix this issue:

    1. Identify which server is the primary.
      1. Log in to one of the Controller databases by running the following command in the Controller installation directory:

        No Format
        bin/controller.sh login-db
      2. Run the following command:

        No Format
        select * from global_configuration_local where name='ha.controller.type';
    2. Ensure that ha.controller.type is set correctly in the database.
      1. Log in to the Controller database you would like to change by running the following command in the Controller installation directory:

        No Format
        bin/controller.sh login-db
      2. Run the following commands to set the database to the primary or secondary:

        Primary

        Code Block
        use controller;
        update global_configuration_local set value='primary' where name='ha.controller.type';
        update global_configuration_local set value='active' where name='appserver.mode';

        Secondary

        Code Block
        use controller;
        update global_configuration_local set value='secondary' where name='ha.controller.type';
        update global_configuration_local set value='passive' where name='appserver.mode';
    3. Restart the database for the change to take effect on the Appserver:

      No Format
      bin/platform-admin.sh stop-controller-appserver --with-db
      bin/platform-admin.sh start-controller-appserver --with-db

      If the secondary Appserver is already in a shutdown state, then there is no need to restart the database.

    4. Verify the replication is healthy:

      No Format
      show slave status\G

      Slave_IO_Running and Slave_SQL_Running should show Yes.

    You may now retry the discover and upgrade job.
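
    The replication check in step 4 can also be scripted. The following is a minimal sketch, assuming you have captured the output of show slave status\G as text. The function name replication_healthy is hypothetical.

```shell
# Read `show slave status\G` output on stdin and succeed only when both
# Slave_IO_Running and Slave_SQL_Running report Yes.
replication_healthy() {
    awk -F': *' '
        # Match the exact field names so Slave_SQL_Running_State is ignored.
        $1 ~ /^[ \t]*Slave_IO_Running$/  { io = $2;  sub(/[ \t\r]+$/, "", io) }
        $1 ~ /^[ \t]*Slave_SQL_Running$/ { sql = $2; sub(/[ \t\r]+$/, "", sql) }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

    For example, save the status output to a file, run replication_healthy with that file on standard input, and retry the discover or upgrade job only when it succeeds.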

    Failover Prevention

    If failover is prevented on your Controller HA configuration, it may be due to one of two scenarios:

    • The secondary database is down. Failover cannot occur when the secondary database is not running.
      To fix this issue:
      1. Restart the secondary database by running the following command on the secondary host:

        No Format
        bin/controller.sh start-db
      If this does not enable failover, then it may be due to the second scenario.
    • Database replication is not healthy. Failover is not allowed when the database replication is not healthy.
      There are various reasons why this may be the case. Please work closely with your AppDynamics account representative to correct the issue.

    ...