This page describes how to configure log sources using job files. If you are configuring new log sources, we recommend you use the Centralized Log Management UI to define source rules.

To configure log analytics using job files: 

  1. Describe the Log Source in a Job File
  2. Map the Log File Data to Analytics Fields
  3. Verify Analytics Agent Properties

Describe the Log Source in a Job File 

Each log source is represented by a job file. A job file is a configuration file that specifies the following:

  • Location of the source log file
  • Pattern for capturing records from the log file
  • Pattern for structuring the data from the captured log records 
  • Other options for capturing records from the log source

To define a source, you create a job file (or modify one of the samples) in the Analytics Agent configuration directory. The Analytics Agent includes sample job files for Glassfish, the OS X log, and others. The Analytics Agent can also collect and parse GZIP files (log files ending in .gz).

The job files are located in the following directory:

  • <Analytics_Agent_Home>/conf/job/ 

The agent reads the job files in the directory dynamically, so you can add job files in the directory without restarting the agent.   

To configure a job file, use these configurable settings in the file (a complete minimal example appears after this list): 

  • enabled: Determines whether this log source is active. To capture analytics data from this log source, set the value to true.
  • source: Specifies the source of the logs. 
    • type: Specifies the type of log source. There are two types: file and syslog. Additional parameters depend on the value of type.
      • file: The location and name of the log file to serve as a log source. The location must be on the same machine as the Analytics Agent. The file type has the following parameters:
        • path: Path to the directory where the log files reside. On Windows, specify the path using forward slashes, as in Unix environments:
        • Example: demo/logs
        • Example: C:/app/logs
        • nameGlob: A string used to match the log file name. You can use wildcards, and you can specify whether to match files one level deep or all log files in the path directory structure. If the value of nameGlob starts with a wildcard, you must enclose the value in quotes. The supported matching patterns are described under "glob" at http://java.boot.by/ocpjp7-upgrade/ch06s05.html.
        • Example for multi-level matching: 

          path: /var/log
          nameGlob: '**/*.log'
          This matches both /var/log/apache2/logs/error.log and /var/log/cassandra/system.log

        • Example of one level matching:
          path: /var/log
          nameGlob: '*/*.log'
          This searches for .log files one level deep in the /var/log directory (matches on /var/log/cassandra/system.log but not on /var/log/apache2/logs/error.log).

        • startAtEnd: If set to true, the agent tails the file starting from the end.
      • syslog: See Collect Log Analytics Data from Syslog Messages.
      • multiline: For log files with records that span multiple lines (records that include line breaks), configure the multiline property to indicate how the individual records in the log file should be identified. A typical example of a multiline log record is one that includes a Java exception. You can use one of two options with the multiline property to identify the lines of a multiline log record: 
      • startsWith: A simple prefix that matches the start of the multiline log record. 
        Example 1: To capture the following multiline log as one record:

        [#|2015-09-24T06:33:31.574-0700|INFO|glassfish3.1.2|com.appdynamics.METRICS.WRITE|_ThreadID=206;_ThreadName=Thread-2;|NODE PURGER Completed in 14 ms|#]

        You could use this:

        multiline:
           startsWith: "[#|"

        Example 2: To capture the following multiline log as one record:

        May 5, 2016 6:07:02 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit requested amount:245.43000
        May 5, 2016 6:07:02 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit currency:USD
        May 5, 2016 6:07:02 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit payment method:Master Card
        May 5, 2016 6:07:03 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit decision:ACCEPT
        May 5, 2016 6:07:03 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit reason code:200
        May 5, 2016 6:07:03 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit orderId:7654
        May 5, 2016 6:07:03 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit piId:1234B
        May 5, 2016 6:07:03 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit referenceNumber:4620346703956702001
        May 5, 2016 6:07:03 AM com.appdynamics.test.payments.plugin.testsource.testPaymentClient deposit(pluginContext, financialTransaction, retry)
        INFO: Source deposit extData requestToken:alf/7wSR9PBh3zSMQ+++IYTRlk3Ys2DOxOmM5jWLRTyFJaSggCTyFJaSgjSB1/2v0wyTWRV0ekb0X

        You could use this:

        multiline:
           startsWith: "INFO: Source deposit requested amount"
      • regex: A regular expression that matches the multiline log record.  
        Example: To capture this multiline log as one record:

        2016-06-01 16:28:21.035 WebContainer : 8 TRANSTART> =======================================
        2016-06-01 16:28:21.036 WebContainer : 8 MERCHCFG > merchantID=appD, sendToProduction=true, targetAPIVersion=1.00, keyFilename=(null), serverURL=(null), namespaceURI=(null), enableLog=true, logDirectory=logs, logFilename=(null), logMaximumSize=10, useHttpClient=false, timeout=130
        2016-06-01 16:28:21.037 WebContainer : 8 REQUEST  > 
        merchantID=appD
        davService_run=true
        clientLibraryVersion=2.0.1
        clientEnvironment=OS/400/V7R1M0/LINUX 
        application_id=ff22
        exportService_addressWeight=medium
        merchantReferenceCode=appD
        clientLibrary=Java Basic
        billTo_customer=xxxxxxxxx
        billTo_finance=true
        billTo_postalCode=94105
        billTo_country=US

        You can use the following regex configuration to identify the start of each "record": 

        multiline:
           regex: "^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} WebContainer : \d+ TRANSTART\u003E =======================================$"

        The regex requires the line to match four digits (2016), a hyphen, two digits (06), a hyphen, two digits (01), a space, two digits (16), ':', two digits (28), ':', two digits (21), '.', three digits (035), a space, the term 'WebContainer', a space, ':', a space, one or more digits (8), a space, and the term 'TRANSTART' followed by '>' and the row of '=' characters.

        Whenever the log tailer sees the matching pattern at the start of a line, it starts a new record and passes the previously accumulated lines as one log record. In other words, the pattern marks the beginning of each record: log lines accumulate in a buffer until the pattern is found again at the start of the next record.


      If the format of a particular multiline log file does not permit reliable matching of continuation lines by regular expression, you may choose to use a single-line format instead. For most types of logs, this still captures the majority of log records. 
  • fields: The fields are used to specify the context of the log data in the Controller UI, by application name, tier name, and so on. Specify the fields as free-form key-value pairs. 
  • grok: The grok parameter specifies the patterns by which the data in the unstructured log record is mapped to structured analytics fields. It associates a named grok expression (as defined in a .grok file that is bundled inside lib/analytics-shared-pipeline-core.jar) to a field in the data as structured by the agent.  For example:

    grok:
      patterns: 
           - "\\[%{LOGLEVEL:logLevel}%{SPACE}\\]  \\[%{DATA:threadName}\\]  \\[%{JAVACLASS:class}\\]  %{GREEDYDATA}"
           - "pattern 2"
           ... 

    In this case, the grok-pattern name LOGLEVEL is matched to an analytics data field named logLevel. The regular expression specified by the name LOGLEVEL is defined in the file grok-patterns.grok in the grok directory. See Specify Grok Expressions.
    Previous versions of Log Analytics used a single "pattern" rather than a pattern list. This mode is still supported for backward compatibility.
    The Log Analytics grok processor does not allow underscores in field names. For example, a field name of "log_fields" will not work. Instead, use something like "logfields". 

  • keyValue: The keyValue parameter specifies how to parse the logs to identify key-value pairs using a user-defined delimiter. This enables you to configure the parsing for a message of the type "Key1=Value1 Key2=Value2". See Specifying Key-Value Pairs.
  • transform: This parameter specifies how to change the type or alias name of any field extracted from the logs by grok or keyValue parameters.
  • eventTimestamp: This setting defines the pattern for the timestamp associated with captured data.
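
The following is a minimal sketch of how these settings fit together in a single job file. The path, nameGlob, field names, grok pattern, and the eventTimestamp sub-key shown here are illustrative assumptions, not values from a shipped sample; compare against the sample job files in <Analytics_Agent_Home>/conf/job/ before using them.

enabled: true
source:
  type: file
  path: /var/log/myapp              # assumed log directory
  nameGlob: '*.log'                 # match all .log files in that directory
  startAtEnd: false
multiline:
  startsWith: "[#|"                 # each record starts with this prefix
fields:                             # free-form key-value context fields (names are assumptions)
  appName: myApp
  tierName: myTier
grok:
  patterns:
    - "\\[%{LOGLEVEL:logLevel}%{SPACE}\\]  \\[%{DATA:threadName}\\]  \\[%{JAVACLASS:class}\\]  %{GREEDYDATA}"
eventTimestamp:
  pattern: "yyyy-MM-dd'T'HH:mm:ss,SSSZ"   # sub-key name and format are assumptions; check the samples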

Map the Log File Data to Analytics Fields

To specify how data in the unstructured log records should be mapped to structured analytics fields for log analytics, you provide the configuration in the job file. You can map unstructured log data in the following ways:

  • grok patterns
  • key value pairs
  • transforms

Specify Grok Expressions 

Grok is a way to define and use complex, nested regular expressions in an easy to read and use format. Regular expressions defining discrete elements in a log file are mapped to grok-pattern names, which can also be used to create more complex patterns.

Grok-pattern names for many of the common types of data found in logs are provided with the Analytics Agent. A list of basic grok-pattern names and their underlying structures is bundled inside lib/analytics-shared-pipeline-core.jar. You can list the grok files with a command such as the following:

unzip -l ./lib/analytics-shared-pipeline-core.jar|grep "\.grok"

To view the definition of a grok file, use a command such as the following:

unzip -p ./lib/analytics-shared-pipeline-core.jar grok/grok-patterns.grok

The grok directory also contains samples of more complex definitions customized for various common log types - java.grok, mongodb.grok, and so on. Additional grok patterns can be found here: https://grokdebug.herokuapp.com/patterns#.

Once the grok-pattern names are created, they are associated in the job file with field identifiers that become the analytics keys.

The basic building block is %{grok-pattern name:identifier}, where grok-pattern name is the grok pattern that knows about the type of data in the log you want to fetch (based on a regex definition) and identifier is your identifier for the kind of data, which becomes the analytics key.  So %{IP:client} would select an IP address in the log record and map it to the key client.
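
For example, reusing the %{IP:client} building block together with the standard WORD and URIPATHPARAM patterns from grok-patterns.grok, a job file could extract three keys from a line such as "10.0.0.1 GET /checkout/cart" (the log format here is an assumed illustration, not a shipped sample):

grok:
  patterns:
    # yields client => 10.0.0.1, method => GET, request => /checkout/cart
    - "%{IP:client} %{WORD:method} %{URIPATHPARAM:request}"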

Custom Grok Patterns

Complex grok patterns can be created using nested basic patterns.  For example, from the mongodb.grok file:

MONGO_LOG %{SYSLOGTIMESTAMP:timestamp} \[%{WORD:component}\] %{GREEDYDATA}

It is also possible to create entirely new patterns using regular expressions.  For example, the following line from java.grok defines a grok pattern named JAVACLASS.

JAVACLASS (?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*

Because JAVACLASS is defined in a .grok file in the grok directory, it can be used as if it were a basic grok pattern. In a job file, you can use the JAVACLASS pattern match as follows:

grok:
  pattern: ".... \[%{JAVACLASS:class}\\]

In this case, the field name as it appears in the Analytics UI would be "class". For a full example, see the following files: 

  • Job file: <Analytics_Agent_Home>/conf/job/sample-analytics-log.job
  • Grok file: java.grok bundled inside lib/analytics-shared-pipeline-core.jar

Special Considerations for Backslashes

The job file is in YAML format, which treats the backslash as an escape character in double-quoted strings. Therefore, to include a literal backslash in a double-quoted pattern string, you need to escape the backslash with a second backslash. You can avoid the need to escape backslashes in the job file grok pattern by enclosing the pattern in single quotes instead of double quotes, as in the following:

grok:
  patterns:
    - '\[%{DATESTAMP:TIME}%{SPACE}CET\]%{SPACE}%{NOTSPACE:appId}%{SPACE}%{NOTSPACE:appName}%{SPACE}%{NOTSPACE:Severity}%{SPACE}%{NOTSPACE:messageId}:%{SPACE}%{GREEDYDATA}'

Numeric Fields

In Release 4.1.3, the grok definition syntax was enhanced to support three basic data types. When defining a pattern in the .grok file, you can specify the data type as number, boolean, or string. If a grok alias in a .job file uses such a type-aware grok definition, the extracted field is stored as a number or boolean; strings are the default. If the number or boolean conversion fails, a log message appears in the agent's log file. No validations are performed, because it is not possible to reliably reverse engineer a regex; these are pure runtime extractions and conversions.

Upgrade pre-4.1.3 Job Files

If 4.1.2 (or older) .job files in use have fields that are unspecified or specified as NUMBER, and you switch to the "type aware" grok files, the existing data in the Events Service will break because of the change in type mapping. To avoid this, modify the grok aliases in your job files. 
Examples:

Was:

grok:
  patterns:
    - "%{DATE:happenedAt},%{NUMBER:quantity}"

Update job to:

grok:
  patterns:
    - "%{DATE:happenedAt},%{NUMBER:quantity_new}"

Was:

grok:
  patterns:
    - "%{DATE:happenedAt},%{DATA:howMany}"

Update job to:

grok:
  patterns:
    - "%{DATE:happenedAt},%{POSINT:howManyInt}"

To Upgrade (Migrate) < 4.1.3 Job Files: 

  1. Stop analytics-agent.
  2. Change .job files that use any of the grok patterns that are now type aware (pattern: type):

    BOOL:boolean
    INT:number
    BASE10NUM:number
    NUMBER:number
    POSINT:number
    NONNEGINT:number


    Change the grok alias so as not to conflict with the older aliases:

    grok:
      patterns:
    (Old)  - "%{DATE:quoteDate},%{NUMBER:open},%{NUMBER:high},%{NUMBER:low},%{NUMBER:close},%{NUMBER:volume},%{NUMBER:adjClose}"
    
    (New aliases)  - "%{DATE:quoteDate},%{NUMBER:openNum},%{NUMBER:highNum},%{NUMBER:lowNum},%{NUMBER:closeNum},%{NUMBER:volumeNum},%{NUMBER:adjCloseNum}"
  3. Start analytics-agent.

Specifying Key-Value Pairs

This section of the mapping configuration captures key-value pairs from fields specified by the source parameter. The values listed under source should refer to fields that were defined and captured by a grok pattern. For example, if you have a grok parameter that defines the pattern "%{DATA:keyValuePairs}", you can list the field "keyValuePairs" under the source parameter to capture any key-value pairs contained in the "keyValuePairs" string. If the source parameter is not specified, the agent attempts to extract key-value pairs from the entire log message. The result can be different from what you expect if the message contains more information than just key-value pairs. 

The Key Value mapping contains these fields:

  • source: A list of strings on which the keyValue filter should be applied. This field is optional. If it is not provided, key-value pairs are parsed from the original log "message" string.
  • split: The delimiter that separates a key from its value. In the example key=value, the split delimiter is the equal sign (=).
  • separator: The delimiter that separates two key-value pairs. In the example key1=value1;key2=value2, the separator is the semicolon (;).
  • include: A list of key names to capture from the "source". If nothing is provided in "include", all key-value pairs are captured.
  • trim: A list of characters to remove from the start and/or the end of the key or value before storing them.

The sample-glassfish-log.job file, located in <Analytics_Agent_Home>/conf/job/, includes a key-value pairs configuration.

Key-Value Pairs Example 

For a log file with the following entries:

[#|2015-09-24T06:33:31.574-0700|INFO|glassfish3.1.2|com.appdynamics.METRICS.WRITE|_ThreadID=200;_ThreadName=Thread-2;|NODE PURGER Complete in 14 ms|#]

[#|2015-09-24T06:33:46.541-0700|INFO|glassfish3.1.2|com.singularity.ee.controller.beans.license.LicenseUsageManagerBeanWRITE|_ThreadID=202;_ThreadName=Thread-2;|about to start persisting license usage data |#]

And the following grok pattern:

grok:
  patterns:
    - "\\[\\#\\|%{DATA}\\|%{LOGLEVEL:logLevel}\\|%{DATA:serverVersion}\\|%{JAVACLASS:class}\\|%{DATA:keyValuePairs}\\|%{GREEDYDATA}"

The key-value parameter to extract the ThreadID and the ThreadName should look similar to the following: 

keyValue:
  source:
    - "keyValuePairs"
  split: "="
  separator: ";"
  include:
    - "ThreadID"
    - "ThreadName"
  trim:
    - "_"

Specify Transform Parameters

This section of the mapping configuration enables you to change the type or alias name of any field previously extracted from the logs by your grok or key value configuration. The transform is applied after all fields have been captured from the log message. You can specify a list of field names, where you want to cast the value to a specific type or rename the field with an "alias".


The Transform mapping contains these fields (a configuration sketch follows the list):

  • field: Specifies the name of the field to transform; it cannot be empty or null. If field is defined, either type or alias must also be specified. If neither is specified, an error is written to the analytics-agent.log file.
  • alias: The new name for the field.
  • type: The value type for the field. Valid values are:
    • NUMBER
    • BOOLEAN
    • STRING (the default)
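
As a sketch only (the field names are assumptions, and the list-of-entries layout is inferred from the field descriptions above; compare with the sample job files), a transform configuration might look like the following:

transform:
  - field: "quantity"
    type: NUMBER             # cast the captured value to a number
  - field: "class"
    alias: "className"       # rename the field as it appears in the Analytics UI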

Verify Analytics Agent Properties

In addition to configuring the log source in the job file, you should verify the values in the analytics-agent.properties file found in the conf directory. Confirm these property values:

  • http.event.endpoint should be the location of the Events Service. 
    • For SaaS controllers the URL is one of the following:
      • https://analytics.api.appdynamics.com:443 (North America)
      • https://fra-ana-api.saas.appdynamics.com:443 (Europe)
      • https://syd-ana-api.saas.appdynamics.com:443 (APAC)
    • For on-premises installations use whatever host and port you have configured. In clustered environments, this is often a load balancer. 
  • The http.event.accountName and http.event.accessKey settings should be set to the name and the key of the account in the Controller UI with which the logs should be associated. By default, they are set to the built-in account for a single tenancy Controller.   
  • The pipeline.poll.dir setting specifies where the log configuration files are located. This would not normally be changed unless you want to keep your files in a different location.
  • ad.controller.url should match your AppDynamics controller URL and port.

Troubleshoot Logs

If log capture is working correctly, logs should start appearing in the Log tab in the Analytics UI. It can take some time for logs to start accumulating. Note the following troubleshooting points:  

  • If nothing appears in the log view, try searching over the past 24 hours. 
  • Timezone discrepancies between the logs and the local machine can cause log entries to be incorrectly excluded based on the selected timeframe in the Controller UI. To remediate, try setting the log files and system time to UTC or logging the timezone with the log message to verify. 
  • An inherent delay in indexing may result in the "last minute" view in the UI consistently yielding no logs. Increase the time range if you encounter this issue.

Troubleshoot Patterns

To help you troubleshoot the data extraction patterns in your job file, you can use two debug REST endpoints in the Analytics Agent: the grok endpoint and the timestamp endpoint.

In the following examples, the Analytics Agent host is assumed to be localhost and the Analytics Agent port is assumed to be 9090. To configure the port on your Agent, use the property ad.dw.http.port in <Analytics_Agent_Home>/conf/analytics-agent.properties.

In Analytics Agent versions 21.7 and later, the debug/grok endpoint is disabled by default. To enable the endpoint, start the Analytics Agent with the property ad.debug.grok.endpoint.enabled=true set in the agent.properties file.

The Grok Endpoint

The Grok tool works in two modes: extraction from a single line log and extraction from a multi-line log.  To get a description of usage options:

curl -X GET http://localhost:9090/debug/grok 

Single Line

In this mode, you pass in (as a POST request) a sample line from your log and the grok pattern you are testing, and you receive back the data you passed in organized as key/value pairs, where the keys are your identifiers.

curl -X POST http://localhost:9090/debug/grok --data-urlencode "logLine=LOG_LINE" --data-urlencode "pattern=PATTERN"

 For example, the input:

curl -X POST http://localhost:9090/debug/grok --data-urlencode "logLine=[2014-09-04T15:22:41,594Z]  [INFO ]  [main]  [o.e.j.server.handler.ContextHandler]  Started i.d.j.MutableServletContextHandler@2b3b527{/,null,AVAILABLE}" --data-urlencode "pattern=\\[%{LOGLEVEL:logLevel}%{SPACE}\\]  \\[%{DATA:threadName}\\]  \\[%{JAVACLASS:class}\\]  %{GREEDYDATA}"

would produce this output:

{
 threadName => main
 logLevel => INFO
 class => o.e.j.server.handler.ContextHandler
}

The input:

curl -X POST http://localhost:9090/debug/grok --data-urlencode "logLine=2010-05-05,500.98,515.72,500.47,509.76,4566900,509.76" --data-urlencode "pattern=%{DATE:quoteDate},%{DATA:open},%{DATA:high},%{DATA:low},%{DATA:close},%{DATA:volume},%{GREEDYDATA:adjClose}"

would produce this output:

{
 open => 500.98
 adjClose => 509.76
 volume => 4566900
 quoteDate => 10-05-05
 high => 515.72
 low => 500.47
 close => 509.76
}

Multi-line

The multi-line version uses a file stored on the local filesystem as the source input.

curl -X POST http://localhost:9090/debug/grok --data-urlencode "logLine=`cat FILE_NAME`" --data-urlencode "pattern=PATTERN"

where FILE_NAME is the full path filename of the file that contains the multi-line log.

The Timestamp Endpoint

The timestamp tool extracts the timestamp from a log line in Unix epoch time.

To get a description of usage options: 

curl -X GET http://localhost:9090/debug/timestamp

In this mode, you pass in (as a POST request) a sample line from your log and the timestamp pattern you are testing, and you receive back the timestamp contained within the log line.

curl -X POST http://localhost:9090/debug/timestamp --data-urlencode "logLine=LOG_LINE" --data-urlencode "pattern=PATTERN"

For example, the input:

curl -X POST http://localhost:9090/debug/timestamp --data-urlencode "logLine=[2014-09-04T15:22:41,237Z]  [INFO ]  [main]  [io.dropwizard.server.ServerFactory]  Starting DemoMain" --data-urlencode "pattern=yyyy-MM-dd'T'HH:mm:ss,SSSZ"

would produce this output, in Unix epoch time:

{
 eventTimestamp => 1409844161237
}

The input:

curl -X POST http://localhost:9090/debug/timestamp --data-urlencode "logLine=Nov 17, 2014 8:21:51 AM com.foo.blitz.processor.core.hbase.coprocessor.endpoint.TimeRollupProcessEndpoint$HBaseDataFetcher callFoo1" --data-urlencode "pattern=MMM d, yyyy h:mm:ss aa"

would produce this output, in Unix epoch time:

{
 eventTimestamp => 1416212511000
}