A machine that hosts a large Controller deployment, as defined in Controller Performance Profiles, requires some additional tuning adjustments, as described here. 

The recommendations on this page are provided as general guidelines only. For specific advice for your deployment, you should work closely with your AppDynamics account representative. It is especially important that a machine that hosts a Controller in a high workload environment meets the Controller System Requirements for its profile.

Linux Settings

Use the following settings for the operating system on which the Controller runs:

  • Use the Deadline scheduler.
  • Set the open file limit (ulimit) to 819200 or greater.
  • Set the per-process open file limit for soft and hard limits to 819200 or greater.
  • Allow the web server to retry longer during stalls by setting higher TCP timeouts.
  • If using the Splunk extension, increase the maximum number of user processes to 2048.
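
One way to confirm the open file limits on a running host is a short check like the one below. It is a minimal sketch, not part of the AppDynamics tooling: the names nofile_limit_ok and REQUIRED_NOFILE are illustrative, and the script reports the limits of the process that runs it, so it should be run as the user that starts the Controller.

```python
import resource

REQUIRED_NOFILE = 819200  # minimum taken from the settings list above


def nofile_limit_ok(soft: int, hard: int, required: int = REQUIRED_NOFILE) -> bool:
    """Return True when both the soft and hard limits are at least `required`.

    resource.RLIM_INFINITY (unlimited) always satisfies the requirement.
    """
    def meets(limit: int) -> bool:
        return limit == resource.RLIM_INFINITY or limit >= required

    return meets(soft) and meets(hard)


if __name__ == "__main__":
    # Read the per-process open file limits of the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    status = "OK" if nofile_limit_ok(soft, hard) else "TOO LOW"
    print(f"open files: soft={soft} hard={hard} -> {status}")
```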

File System and RAID Recommendations

  • AppDynamics software depends heavily on low disk latency for proper operation. Strict latency limits have to be met: the 90th percentile response time measured over any 5 s period must remain under 3 ms.
  • We have found that the best guaranteed performance is with direct attached (local) disks. 
  • Recommended file system is ext4 with journaling.
  • For all Controller systems, use the following file system mount options for your database storage mountpoint: "noatime,nodiratime,nobarrier,data=writeback"
  • The following points highlight considerations surrounding the use of RAID for the Controller host. For further discussion of using RAID storage with the Controller, see RAID Overview and Configuration on the AppDynamics Community. 
    • There are many options for RAID storage. The specific RAID configuration you choose involves a careful balance between availability and redundancy of data versus performance, and really depends on your application. See RAID Overview and Configuration for more information.  
    • A mirrored SSD/NVMe-based device is recommended as the Controller log location for all configurations of 1M metrics/min and larger. The stripe size should be 128K, to match the write size for the log. When running the installer, specify the NVMe device as the log directory location.
    • Use a RAID Controller with a Battery Backup Unit (BBU) to allow the database to use O_DIRECT sync mode as the flush method for faster disk writes. Never set O_DIRECT without a BBU. You can specify this setting using the innodb_flush_method variable in <controller_home>/db/db.cnf. When not specified in the file (as is the case by default), MySQL uses the default flush method, fdatasync. See the MySQL documentation for more information.
    • If using HDDs, disable the onboard cache on each individual hard drive. All caching should be done by the RAID Controller.
  • LVM partitions can be useful especially for doing quick hot backups with LVM snapshots.
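
The disk latency requirement above (90th percentile under 3 ms in any 5 s period) can be checked against collected measurements with a sketch like the following. It makes several assumptions: samples are (timestamp in seconds, latency in milliseconds) pairs gathered by a tool of your choice, the percentile uses the nearest-rank method, and fixed 5 s buckets stand in for "any 5 s period", which is a simplification of a true sliding window. The function names are illustrative.

```python
import math
from collections import defaultdict

WINDOW_SECONDS = 5.0   # measurement period from the requirement above
P90_LIMIT_MS = 3.0     # the 90th percentile must stay under this value


def percentile_90(values):
    """Nearest-rank 90th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer arithmetic
    return ordered[rank - 1]


def slow_windows(samples):
    """Return {window_start_seconds: p90_ms} for windows at or over the limit.

    `samples` is an iterable of (timestamp_seconds, latency_ms) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, latency_ms in samples:
        # Assign each sample to the fixed 5 s bucket that contains it.
        start = math.floor(timestamp / WINDOW_SECONDS) * WINDOW_SECONDS
        buckets[start].append(latency_ms)

    result = {}
    for start in sorted(buckets):
        p90 = percentile_90(buckets[start])
        if p90 >= P90_LIMIT_MS:
            result[start] = p90
    return result
```

An empty result means every bucket met the limit for the samples provided. It does not prove that a sliding 5 s window would also pass.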

Glassfish Configuration

You can edit most Glassfish settings on the Enterprise Console AppServer Configurations page or in the following configuration file: <controller_home>/appserver/glassfish/domains/domain1/config/domain.xml 

After modifying the file, you will need to restart the Controller to have your changes take effect.

The following section provides more settings relevant to the Controller. For complete information about configuring Glassfish, see the Glassfish documentation.

To update the Glassfish settings
  1. Set the thread count (XX) to 12 times the number of CPU cores. 
    In the tag <thread-pools> change the line containing "http-thread-pool" to:

    <thread-pool name="http-thread-pool" max-thread-pool-size="XX" min-thread-pool-size="16" max-queue-size="32768"></thread-pool>
    

    If your Controller instance was upgraded from an earlier version, delete the following element if present, as it applies to Glassfish v2 only:

    <request-processing header-buffer-length-in-bytes="8192" initial-thread-count="16" request-timeout-in-seconds="300" thread-count="XX" thread-increment="1"/>
    
  2. Replace the TCP transport element under server-config with the following. Replace the XX value in the acceptor-threads attribute with the number of cores in your system.

    <transport buffer-size-bytes="32768" max-connections-count="32768" name="tcp" keep-alive="true" acceptor-threads="XX"></transport>
    

    This sets the depth of the connection pool queue to 32K to allow for connections to queue up instead of being dropped during peak load bursts. 
    Note that there are two transport elements with the name value of tcp in the file. Be sure to modify the first one, which appears below the server-config element.

  3. If your Controller instance was upgraded from an earlier version, delete the following element if present, as it applies to Glassfish v2 only:

    <connection-pool max-pending-count="32768" queue-size-in-bytes="32768" receive-buffer-size-in-bytes="32768" send-buffer-size-in-bytes="32768"/>
  4. Increase the buffer size. In the <protocols> section under <protocol name="http-listener-1">, change the next line to:

    <http request-timeout-seconds="300" max-connections="-1" request-body-buffer-size-bytes="32768" header-buffer-length-bytes="32768" timeout-seconds="300" default-virtual-server="server" send-buffer-size-bytes="32768" compressable-mime-type="text/html, text/javascript, text/css" compression="on">
  5. Set the -Xmn value to about a third of the -Xmx setting. (Note that the -Xmx value is set by the installer based on the available memory.)

    <jvm-options>-Xmn10g</jvm-options>
  6. Replace the garbage collection settings with these specially tuned settings:

    <jvm-options>-XX:+UseConcMarkSweepGC</jvm-options>
    <jvm-options>-XX:+UseParNewGC</jvm-options>
    <jvm-options>-XX:+ScavengeBeforeFullGC</jvm-options>
    <jvm-options>-XX:TargetSurvivorRatio=80</jvm-options>
    <jvm-options>-XX:SurvivorRatio=6</jvm-options>
    <jvm-options>-XX:+UseBiasedLocking</jvm-options>
    <jvm-options>-XX:MaxTenuringThreshold=15</jvm-options>
    <jvm-options>-XX:ParallelGCThreads=16</jvm-options>
    <jvm-options>-XX:+OptimizeStringConcat</jvm-options>
    <jvm-options>-XX:+UseCompressedOops</jvm-options>
    <jvm-options>-XX:+UseCMSInitiatingOccupancyOnly</jvm-options>
    <jvm-options>-XX:CMSInitiatingOccupancyFraction=70</jvm-options>
    
  7. Optionally, add garbage collection (GC)-related output options. For example: 

    <jvm-options>-verbose:gc</jvm-options>
    <jvm-options>-XX:+PrintGCDateStamps</jvm-options>
    <jvm-options>-XX:+PrintGCDetails</jvm-options>
    <jvm-options>-XX:+PrintClassHistogramBeforeFullGC</jvm-options>
    <jvm-options>-XX:PrintFLSStatistics=1</jvm-options>
    <jvm-options>-XX:+PrintPromotionFailure</jvm-options>
    <jvm-options>-Xloggc:/var/log/controller/gc.log</jvm-options>
    <jvm-options>-XX:+UseGCLogFileRotation</jvm-options>
    <jvm-options>-XX:NumberOfGCLogFiles=2</jvm-options>
    <jvm-options>-XX:GCLogFileSize=512m</jvm-options>
    <jvm-options>-XX:+PrintTenuringDistribution</jvm-options>
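
The values derived in steps 1, 2, and 5 above follow simple arithmetic, sketched here in Python. The helper names are illustrative, the -Xmx value is supplied by hand rather than read from domain.xml, and "about a third" is implemented as integer division expressed in megabytes, which is one reasonable reading of the guidance.

```python
import re

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_heap_size(text: str) -> int:
    """Convert a JVM size such as '30g' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text.strip())
    if not match:
        raise ValueError(f"unrecognized JVM size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


def glassfish_sizing(cpu_cores: int, xmx: str) -> dict:
    """Derive the values used in steps 1, 2, and 5."""
    xmn_mb = parse_heap_size(xmx) // 3 // 1024**2
    return {
        "max-thread-pool-size": 12 * cpu_cores,  # step 1: 12 times the core count
        "acceptor-threads": cpu_cores,           # step 2: one per core
        "jvm-option": f"-Xmn{xmn_mb}m",          # step 5: about a third of -Xmx
    }
```

For a 16-core host with -Xmx30g, glassfish_sizing(16, "30g") gives a thread pool size of 192, 16 acceptor threads, and -Xmn10240m, which is the same size as the -Xmn10g shown in step 5.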

MySQL Configuration

The Controller uses MySQL as its database server. You should tune MySQL for scalability and better performance.

These database settings are automated by the Enterprise Console for large and extra-large profiles. You, therefore, do not need to modify them manually.

You can edit most MySQL settings on the Enterprise Console Database Configurations page or in the following configuration file: <Controller_Installation_Directory>/db/my.cnf

Manually updated settings are not preserved upon upgrade.

It is recommended that you update the MySQL configuration settings using the Enterprise Console Controller Settings page.

To configure MySQL
  1. Shut down the Controller.
  2. Configure the settings in the database configuration file, <Controller_Installation_Directory>/db/my.cnf, with these values. Since some of these name/value pairs are already in the my.cnf file, search in the file for each of the variables by name. If a variable already exists, simply replace the assigned value. Otherwise, add the name/value pair to the file.

    open-files-limit=40960
    thread_cache_size=120
    table_definition_cache=12000
    table_open_cache=4000
    innodb_open_files=8192
    lock_wait_timeout=180
    max_connections=2000
    innodb_io_capacity=9600
    innodb_read_io_threads=20
    innodb_write_io_threads=20
    innodb_buffer_pool_size=<1/3 of the total RAM>
    innodb_log_file_size=10G
    innodb_log_buffer_size=2048M
    sync_binlog=0
    gtid-mode=OFF
    enforce-gtid-consistency=OFF
    innodb_max_dirty_pages_pct=20
    tmp_table_size=256M 

    Note that the installer sets innodb_buffer_pool_size to a third of the available memory on the machine.  

    The combined amount of RAM specified in this step and in your GlassFish configuration should not exceed 80% of the total RAM for the system.

  3. Restart the Controller.
  4. Your database should now be running with the new settings, but we suggest that you verify your database settings. To do so:
    1. Log in to the database:

      <Controller_installation_directory>/bin/controller.sh login-db
      
    2. Enter the "SHOW VARIABLES;" command and verify that the variables are assigned the values you expect as reported in the command output.
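
Comparing the SHOW VARIABLES output with the intended settings by eye is error-prone, because the configuration file may spell names with hyphens and sizes with suffixes while the server reports underscores and bytes. The sketch below shows one way to normalize both sides before comparing. It assumes you have already collected the live values into a dictionary by whatever means you prefer. The function names are illustrative, and MySQL may legitimately adjust some values at startup, so a reported mismatch is a prompt to investigate rather than proof of an error.

```python
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def normalize_name(name: str) -> str:
    """Option files accept hyphens or underscores; SHOW VARIABLES uses underscores."""
    return name.strip().lower().replace("-", "_")


def normalize_value(value: str) -> str:
    """Expand size suffixes (10G becomes bytes) and upper-case everything else."""
    text = value.strip().upper()
    if text[:-1].isdigit() and text[-1:] in _SUFFIXES:
        return str(int(text[:-1]) * _SUFFIXES[text[-1]])
    return text


def find_mismatches(expected: dict, actual: dict) -> dict:
    """Return {name: (expected, actual)} for settings that differ or are missing."""
    live = {normalize_name(k): normalize_value(v) for k, v in actual.items()}
    mismatches = {}
    for name, value in expected.items():
        key, want = normalize_name(name), normalize_value(value)
        if live.get(key) != want:
            mismatches[key] = (want, live.get(key))
    return mismatches
```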

If your large Controller deployment consists of HA Controllers, the Enterprise Console will apply changes on both databases and restart services following the proper sequence.
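
The memory guidance in the MySQL steps above (a buffer pool of one third of RAM, and a combined buffer pool and GlassFish allocation of no more than 80% of RAM) can be checked with simple arithmetic. The following is a minimal sketch with an illustrative function name. It works in whole megabytes and treats the JVM heap (-Xmx) as the GlassFish allocation, which ignores non-heap JVM memory and is therefore optimistic.

```python
def memory_budget(total_ram_mb: int, heap_mb: int) -> dict:
    """Check the one-third and 80% guidelines for a given host and heap size."""
    buffer_pool_mb = total_ram_mb // 3       # one third of total RAM
    combined_mb = heap_mb + buffer_pool_mb   # heap plus buffer pool
    limit_mb = total_ram_mb * 8 // 10        # 80% of total RAM
    return {
        "buffer_pool_mb": buffer_pool_mb,
        "combined_mb": combined_mb,
        "limit_mb": limit_mb,
        "within_limit": combined_mb <= limit_mb,
    }
```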

Iterative Tuning

After you perform the configuration for the Glassfish server and MySQL, use the following recommendations to perform iterative tuning for your large-scale deployment:

  • Run the Controller for at least 3 days.
  • Examine Controller heap utilization and garbage collection metrics in the Controller system account.
    • If the Controller is performing fewer than three major garbage collections per day, decrease the size of the Java heap and increase the innodb_buffer_pool_size by the same amount.
    • If the Controller is performing more than six major garbage collections per day, decrease the size of the innodb_buffer_pool_size and increase the Java heap by the same amount.
    • If the garbage collection rate is too high, you may need to adjust the heap size and change the garbage collector settings. Because these settings are sensitive and, if adjusted incorrectly, could exacerbate performance issues, contact your AppDynamics account representative for guidance.
  • If the Java heap ends up being more than twice the size of the innodb_buffer_pool_size, consider adding RAM to the Controller host to handle the workload.
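
The decision rules above can be written down as a small function, which may help when reviewing several Controllers at once. This is a sketch only: the function name is illustrative, the inputs are values you read from the Controller's own metrics and configuration, and it does not decide how much memory to move in each iteration.

```python
def tuning_advice(major_gcs_per_day: float, heap_gb: float, buffer_pool_gb: float) -> list:
    """Translate the iterative tuning rules into a list of suggested actions."""
    advice = []
    if major_gcs_per_day < 3:
        advice.append("decrease the Java heap and increase innodb_buffer_pool_size by the same amount")
    elif major_gcs_per_day > 6:
        advice.append("decrease innodb_buffer_pool_size and increase the Java heap by the same amount")
    if heap_gb > 2 * buffer_pool_gb:
        advice.append("consider adding RAM to the Controller host")
    return advice
```

Between three and six major collections per day, with a heap no larger than twice the buffer pool, the function returns an empty list, meaning no change is suggested by these rules.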

Controller Configuration

You need to adjust some Controller settings for configurations that handle large amounts of traffic. This includes increasing the events, snapshots, and buffer sizes, increasing the read and write thread counts, decreasing the node retention and node permanent deletion periods, etc.

To change the Controller settings
  1. Log in to the Controller Administration console at <host>:<port>/controller/admin.jsp using the root password. See Access the Administration Console.
  2. In the Administration Console, select Controller Settings.
  3. Modify each of the following settings, clicking Save to save each update.

    application.custom.metric.registration.limit=40000000
    application.metric.registration.limit=40000000
    async.thread.tracking.registration.limit=1000
    backend.registration.limit=500000
    collections.ADD.registration.limit=400000
    controller.metric.registration.limit=40000000
    error.registration.limit=4000
    events.upload.limit.per.min=500
    memory.ADD.registration.limit=400000
    metric.registration.limit=40000000
    metrics.buffer.size=300 MB
       For 1000 agents that ingest 100,000 metrics per minute, use 300 MB for the buffer size. Use this value as a guideline for tuning your Controller.
    msds.upload.limit.per.min=500
    node.permanent.deletion.period=168
    node.retention.period=6
    read.thread.count=3 
       Number of threads to use simultaneously to do READS from the database. Set this to 20% of the number of CPU cores but not greater than 4.
    sep.ADD.registration.limit=400000
    sim.docker.machine.container.limit=15
    sim.machines.deleteStaleMachines.maxLimit=100
    sim.processes.query.maxResultLimit=5000
    stacktrace.ADD.registration.limit=4000
    tracked.object.ADD.registration.limit=4000
    events.buffer.size=250
    snapshot.buffer.size=500
    sep.ADD.registration.limit=10000
    write.thread.count=4 
       Number of threads to use simultaneously to load data into the database. Set this to the same value as read.thread.count but not greater than 4.
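
The sizing rule stated for read.thread.count and write.thread.count can be sketched as follows. Rounding down and enforcing a minimum of one thread are assumptions, since the text only gives the 20% figure and the cap of 4. The function names are illustrative. Note that the listing above shows write.thread.count=4 alongside read.thread.count=3, so treat the computed value as a starting point.

```python
MAX_THREADS = 4  # upper bound stated for both settings


def read_thread_count(cpu_cores: int) -> int:
    """20% of the CPU cores, rounded down, at least 1 and at most 4."""
    return max(1, min(MAX_THREADS, cpu_cores * 20 // 100))


def write_thread_count(cpu_cores: int) -> int:
    """Same value as read.thread.count, which already respects the cap of 4."""
    return read_thread_count(cpu_cores)
```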

Tuning for Environments With More Than 500 Nodes

Additional recommendations apply for deployments with more than 500 nodes. For these environments, AppDynamics recommends that you split traffic from the UI and App Agents to different ports on the Controller. You can also split traffic into multiple thread pools, as described below, which is the more common practice.

In addition, AppDynamics recommends terminating SSL at a load balancer or HTTP proxy in front of the Controller. This relieves the Controller of the SSL processing workload. For information on configuring a reverse proxy, see Use a Reverse Proxy.

Split Traffic into Multiple Thread Pools

You can configure multiple domain protocols, network listeners, transports, and thread pools in Glassfish. This splits traffic in a much more granular way, similar to how SaaS Controllers are configured.

Follow the steps below to split traffic:

Terminate the SSL processing at a Proxy

  1. Make a backup copy of services-config.xml before editing it. This file is located at <controller-home>/appserver/glassfish/domains/domain1/applications/controller/controller-web_war/WEB-INF/flex/.
  2. Edit the services-config.xml file of the Glassfish server:
    a. Find the channel-definition element with an id value of my-secure-amf. 
    b. In the endpoint element, replace the default value of the class attribute, flex.messaging.endpoints.SecureAMFEndpoint, with flex.messaging.endpoints.AMFEndpoint. 
    c. The resulting element should look like this:

    <channel-definition id="my-secure-amf" class="mx.messaging.channels.SecureAMFChannel">
        <endpoint url="https://<hostname>:<port>/controller/messagebroker/amfsecure" class="flex.messaging.endpoints.AMFEndpoint"/>
        <properties>
            <add-no-cache-headers>false</add-no-cache-headers>
            <connect-timeout-seconds>10</connect-timeout-seconds>
        </properties>
    </channel-definition>
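
If you prefer to script the class change rather than edit the file by hand, the sketch below shows the idea using Python's standard xml.etree.ElementTree module. The SAMPLE document is a trimmed, hypothetical stand-in for services-config.xml with a made-up host name, and the function name is illustrative. ElementTree discards comments and may change formatting when it writes a file back, so always work from the backup copy made in step 1 and review the result.

```python
import xml.etree.ElementTree as ET

SECURE_ENDPOINT = "flex.messaging.endpoints.SecureAMFEndpoint"
PLAIN_ENDPOINT = "flex.messaging.endpoints.AMFEndpoint"


def switch_endpoint_class(root: ET.Element) -> int:
    """Switch the my-secure-amf endpoint to AMFEndpoint; return the number of changes."""
    changed = 0
    for channel in root.iter("channel-definition"):
        if channel.get("id") != "my-secure-amf":
            continue  # leave every other channel untouched
        for endpoint in channel.findall("endpoint"):
            if endpoint.get("class") == SECURE_ENDPOINT:
                endpoint.set("class", PLAIN_ENDPOINT)
                changed += 1
    return changed


# Hypothetical, trimmed sample; the real file contains more elements.
SAMPLE = """
<services-config>
  <channels>
    <channel-definition id="my-secure-amf" class="mx.messaging.channels.SecureAMFChannel">
      <endpoint url="https://controller.example.com:8181/controller/messagebroker/amfsecure"
                class="flex.messaging.endpoints.SecureAMFEndpoint"/>
    </channel-definition>
  </channels>
</services-config>
"""

root = ET.fromstring(SAMPLE.strip())
changed = switch_endpoint_class(root)
```

Running the function a second time on the same tree makes no further changes, so the edit is safe to repeat.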

Configure Thread Pools

The domain protocols, network listeners, transports, and thread pools should also be configured using the Enterprise Console UI. You can edit them on the AppServer Configurations page by choosing the platform and navigating to Configurations, Controller Settings, and Appserver Configurations.

Sample content for each of these are included in the following files: 

The Enterprise Console restarts the Controller after you submit your configurations.

Create and Configure a Load Balancer

  1. If using HTTPS, an SSL certificate should be available. For testing, a self-signed certificate may be generated using OpenSSL. You can then import the certificate using AWS Certificate Manager (ACM).
  2. Set the external load balancer URL. The Controller configuration should be updated using the Enterprise Console GUI to specify the external load balancer URL.
    a. Open a browser and navigate to the GUI (9191 is the default port): 

    http://<hostname>:<port>


    b. Navigate to AppServer Configurations by choosing the platform, Configurations, Controller Settings, and Appserver Configurations.
    c. Enter the external load balancer URL in the appropriate field, and click Save.

