The root cause of an application issue is often easiest to find in the application, network, server, and machine metrics that measure infrastructure utilization. For example, the following infrastructure issues can slow down your application:

  • Too much time spent in garbage collection of temporary objects (application metric)
  • Packet loss between two nodes that results in retransmissions and slow calls (network metric)
  • Inefficient processes that result in high CPU utilization (server metric)
  • Excessively high rates of reads/writes on a specific disk or partition (hardware metric)

Infrastructure Visibility enables you to isolate, identify, and troubleshoot these types of issues. Infrastructure Visibility is based on a machine agent that runs with an app server agent on the same machine. Together with the Network Agent, these agents provide multi-layer monitoring as follows:

  1. The app server agent collects metrics about applications and identifies applications, tiers, and nodes with slow transactions, stalled transactions, and other application-performance issues.
  2. The Network Agent monitors the network packets sent and received on each node and identifies lost/retransmitted packets, TCP bottlenecks, high round trip times, and other network issues.
  3. The machine agent collects metrics at two levels:
    1. Server Visibility metrics about local processes, services, and resource utilization.
    2. Basic machine metrics about disks, memory, CPU, and network interfaces. 

This multi-layer monitoring enables you to find possible correlations between application issues and service, process, hardware, network, or other issues on the machine.

Network Visibility

Network Visibility monitors traffic flows, network packets, TCP connections, and TCP ports. Network Agents leverage the APM intelligence of App Server Agents to identify the TCP connections used by each application. Network Visibility includes the following functionality:

  • Detailed metrics about dropped/retransmitted packets, TCP window sizes (Limited / Zero), connection setup/teardown issues, high round trip times, and other performance-impacting issues  
  • Network Dashboard that highlights network KPIs (Key Performance Indicators) for tiers, nodes, and network links
  • Right-click dashboards for tiers, nodes, and network links that enable quick drill-downs from transaction outliers to network root causes
  • Automatic mapping of TCP connections with application flows
  • Automatic detection of intermediate load balancers that split TCP connections
  • Diagnostic mode for collecting advanced diagnostic information for individual connections

Server Visibility

Server Visibility monitors local processes, services, and resource utilization. You can use these metrics to identify time windows when problematic application performance correlates with problematic server performance on one or more nodes.
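The idea of finding time windows where application and server metrics move together can be illustrated with a short sketch. This is not how the product computes correlations; it is a minimal illustration that assumes you have exported two equal-length series of samples. The function names, the window size, the threshold, and the sample values are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlated_windows(app_series, server_series, window=5, threshold=0.8):
    """Return start indexes of sliding windows where both series move together."""
    hits = []
    for start in range(len(app_series) - window + 1):
        r = pearson(app_series[start:start + window],
                    server_series[start:start + window])
        if r >= threshold:
            hits.append(start)
    return hits


# Hypothetical per-minute samples: response time (ms) and CPU busy (%).
response_ms = [120, 118, 121, 119, 300, 450, 600, 580, 130, 125]
cpu_busy = [30, 35, 31, 29, 70, 85, 97, 95, 33, 34]
print(correlated_windows(response_ms, cpu_busy))
```

A high coefficient only shows that the two metrics rise and fall together in that window. It is a starting point for investigation, not proof that the server metric caused the slowdown.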

Server Visibility is an add-on module to the Standalone Machine Agent. With Server Visibility enabled, the machine agent provides the following additional functionality:

  • Extended hardware metrics such as machine availability, disk/CPU/virtual-memory utilization, and process page faults 
  • Monitoring of application nodes that run inside Docker containers, so you can identify container issues that impact application performance
  • The Tier Metric Correlator, which enables you to identify load and performance anomalies across all nodes in a tier
  • Imported and user-defined server tags that make it easy to query, filter, and compare related servers using custom metadata
  • Monitoring of internal or external HTTP and HTTPS services
  • Support for grouping servers so you can apply health rules to specific server groups
  • Support for defining alerts that trigger when certain conditions are met or exceeded based on monitored server hardware metrics
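To make the idea of spotting anomalies across the nodes of a tier more concrete, here is a minimal sketch. It does not reproduce the Tier Metric Correlator; it assumes a simple rule (flag any node whose value exceeds a multiple of the tier median), and the function name, the factor, and the node names are hypothetical.

```python
from statistics import median


def find_outlier_nodes(node_values, factor=2.0):
    """Return names of nodes whose value exceeds `factor` times the tier median."""
    if not node_values:
        return []
    tier_median = median(node_values.values())
    return sorted(name for name, value in node_values.items()
                  if value > factor * tier_median)


# Hypothetical CPU utilization (%) for the nodes of one tier.
cpu_by_node = {"node-1": 22.0, "node-2": 25.0, "node-3": 24.0, "node-4": 91.0}
print(find_outlier_nodes(cpu_by_node))
```

The median is used instead of the mean so that a single overloaded node does not shift the baseline it is compared against.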

Basic Machine Metrics

The Standalone Machine Agent collects basic hardware metrics from the server's OS. This agent provides the following functionality:

  • Basic hardware metrics from the server's OS such as CPU and memory utilization, throughput on network interfaces, and disk and network I/O
  • Support for creating extensions to generate custom metrics
  • Support for running remediation scripts to automate your runbook procedures. You can optionally configure the remediation action to require human approval before the script is started.
  • JVM Crash Guard for monitoring JVM crashes and optionally running remediation scripts


Java and .NET Infrastructure Monitoring

Infrastructure Visibility uses different agents to monitor Java and .NET environments:

  • The Java Agent collects metrics for business applications and JVMs. The Standalone Machine Agent collects Server Visibility and hardware/OS metrics.
  • The .NET Agent collects metrics for business applications and instrumented CLRs. The .NET Agent includes a .NET Machine Agent that collects IIS and hardware/OS metrics (see Monitor Windows Hardware Resources). The Standalone Machine Agent collects Server Visibility metrics. 

Infrastructure Visibility Strategies

You can use the following strategies to find infrastructure issues that affect application performance:

  • Transaction snapshots for slow or stalled transactions – You can use snapshots to correlate infrastructure metrics for the specific node so that you can identify the root cause of slow or stalled transactions.
  • Metric correlation – 
    • One example workflow is to open the Node Dashboard for a mission-critical server with a machine agent installed and then to cross-compare data in the following tabs:
      • JVM (application performance)
      • JMX (server performance)
      • Server (hardware resource consumption)
    • The Network Dashboard includes right-click dashboards for tiers, nodes, and network links. These dashboards make it easy to find correlations between application issues and network root causes. 
    • The Tier Metric Correlator enables you to identify load and performance anomalies in a tier composed of a cluster of nodes running on containers or servers. 
  • Infrastructure rules, policies, and alerts – You can:
    • Create health rules on metrics such as garbage collection time, connection pool contention, or CPU usage to catch issues early, before they affect your business transactions.
    • Define policies that trigger actions (such as sending an email, starting diagnostics, or performing a thread dump) when infrastructure metrics reach a critical level.
    • Configure alerts for JVM and CLR crashes using JVM Crash Guard and the .NET Machine Agent, respectively.
    • Configure the agent to run scripts in response to critical events (for example, restart an application or JVM in response to a crash).
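The relationship between a health rule and a policy action can be sketched as follows. This is an illustrative model only, not the product's rule engine or configuration format: the `HealthRule` class, its fields, and the "N consecutive samples above a threshold" condition are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HealthRule:
    metric: str                # hypothetical metric name, e.g. "gc_time_ms"
    critical_threshold: float  # value above which a sample counts as critical
    consecutive_samples: int   # how many recent samples must all be critical


def is_violated(rule, samples):
    """True if the newest `consecutive_samples` values all exceed the threshold."""
    if rule.consecutive_samples < 1:
        raise ValueError("consecutive_samples must be at least 1")
    if len(samples) < rule.consecutive_samples:
        return False
    recent = samples[-rule.consecutive_samples:]
    return all(value > rule.critical_threshold for value in recent)


def run_policy(rule, samples, action):
    """Call `action(rule)` when the rule is violated; report whether it fired."""
    if is_violated(rule, samples):
        action(rule)
        return True
    return False


# Hypothetical usage: record a notification when GC time stays critical.
notifications = []
gc_rule = HealthRule("gc_time_ms", critical_threshold=500.0, consecutive_samples=3)
run_policy(gc_rule, [120, 610, 720, 830], notifications.append)
print(len(notifications))
```

Requiring several consecutive critical samples is one way to avoid firing an action on a single spike. The right condition depends on the metric and on how quickly you need to react.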

With the right monitoring strategy in place, you can be alerted to problems and fix them before user transactions are affected.
