This page describes how to configure log sources using job files. If you are configuring new log sources, we recommend you use the Centralized Log Management UI to define source rules.

To configure log analytics using job files: 

  1. Describe the Log Source in a Job File
  2. Map the Log File Data to Analytics Fields
  3. Verify Analytics Agent Properties

Describe the Log Source in a Job File 

Each log source is represented by a job file: a configuration file that describes where the log files are located and how their data is mapped to analytics fields.

To define a source, you create a job file (or modify one of the samples) in the Analytics Agent configuration directory. The Analytics Agent includes sample job files for Glassfish, the OSX log, and others. The Analytics Agent can also collect and parse GZIP files (log files ending in .gz).

The job files are located in the following directory: <analytics_agent_home>/conf/job/

The agent reads the job files in the directory dynamically, so you can add job files to the directory without restarting the agent.

To configure a job file, use these configurable settings in the file: 

Map the Log File Data to Analytics Fields

The job file specifies how data in the unstructured log records is mapped to structured analytics fields for log analytics. You can map unstructured log data in the following ways:

Specify Grok Expressions 

Grok is a way to define and use complex, nested regular expressions in a format that is easy to read and use. Regular expressions defining discrete elements in a log file are mapped to grok-pattern names, which can also be used to create more complex patterns.

Grok-pattern names for many of the common types of data found in logs are provided with the Analytics Agent. A list of basic grok-pattern names and their underlying structures is bundled inside lib/analytics-shared-pipeline-core.jar. You can list the grok files with a command such as the following:

unzip -l ./lib/analytics-shared-pipeline-core.jar|grep "\.grok"

To view the definition of a grok file, use a command such as the following:

unzip -p ./lib/analytics-shared-pipeline-core.jar grok/grok-patterns.grok

The grok directory also contains samples of more complex definitions customized for various common log types, such as java.grok and mongodb.grok.

Once the grok-pattern names are defined, they are associated in the job file with field identifiers that become the analytics keys.

The basic building block is %{grok-pattern name:identifier}, where grok-pattern name is the name of the grok pattern that matches the type of data you want to extract (based on its regex definition), and identifier is your name for that data, which becomes the analytics key. So %{IP:client} selects an IP address in the log record and maps it to the key client.
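For example, a minimal grok section in a job file that uses this building block might look like the following sketch. It assumes log records that begin with an IP address; the key name client follows the example above:

grok:
  patterns:
    # map the leading IP address to the analytics key "client"
    - "%{IP:client} %{GREEDYDATA}"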

Custom Grok Patterns

Complex grok patterns can be created using nested basic patterns.  For example, from the mongodb.grok file:

MONGO_LOG %{SYSLOGTIMESTAMP:timestamp} \[%{WORD:component}\] %{GREEDYDATA}

It is also possible to create entirely new patterns using regular expressions.  For example, the following line from java.grok defines a grok pattern named JAVACLASS.

JAVACLASS (?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*

Because JAVACLASS is defined in a .grok file in the grok directory, it can be used as if it were a basic grok pattern. In a job file, you can use the JAVACLASS pattern match as follows:

grok:
  patterns:
    - ".... \\[%{JAVACLASS:class}\\]"

In this case, the field name as it appears in the Analytics UI would be "class". For a full example, see the following files: 

Special Considerations for Backslashes

The job file is in YAML format, which treats the backslash as an escape character. Therefore, to include a literal backslash in a double-quoted string pattern, you need to escape the backslash with a second backslash. You can avoid the need to escape backslashes in the .job file grok pattern by enclosing the grok pattern in single quotes instead of double quotes, as in the following:

grok:
  patterns:
    - '\[%{DATESTAMP:TIME}%{SPACE}CET\]%{SPACE}%{NOTSPACE:appId}%{SPACE}%{NOTSPACE:appName}%{SPACE}%{NOTSPACE:Severity}%{SPACE}%{NOTSPACE:messageId}:%{SPACE}%{GREEDYDATA}'
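For comparison, the same pattern enclosed in double quotes would require each literal backslash to be doubled:

grok:
  patterns:
    - "\\[%{DATESTAMP:TIME}%{SPACE}CET\\]%{SPACE}%{NOTSPACE:appId}%{SPACE}%{NOTSPACE:appName}%{SPACE}%{NOTSPACE:Severity}%{SPACE}%{NOTSPACE:messageId}:%{SPACE}%{GREEDYDATA}"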

Numeric Fields

In Release 4.1.3, the grok definition syntax was enhanced to support three basic data types. When defining a pattern in the .grok file, you can specify the data type as number, boolean, or string; string is the default. If a grok alias in a .job file uses a typed grok definition, the extracted field is stored as a number or boolean. If the number or boolean conversion fails, a message is written to the agent's log file. No validations are performed, because a regex cannot be reliably reverse engineered; these are purely runtime extractions and conversions.
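For example, assuming the bundled POSINT pattern is typed as number (see the list of type-aware patterns in the upgrade steps below), a job-file pattern such as the following sketch stores the extracted field as a number rather than a string. The field names operation and responseCount are hypothetical:

grok:
  patterns:
    # responseCount is extracted by POSINT, which is typed as number
    - "%{DATA:operation}:%{SPACE}%{POSINT:responseCount}"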

Upgrade pre-4.1.3 Job Files

If 4.1.2 (or older) .job files in use have fields that are untyped or specified as NUMBER, switching to the type-aware grok files breaks the existing data in the Events Service because of the change in type mapping. To avoid this, modify the grok aliases in your job files.
Examples:

Was:
grok:
  patterns:
    - "%{DATE:happenedAt},%{NUMBER:quantity}

Update job to:
grok:
  patterns:
    - "%{DATE:happenedAt},%{NUMBER:quantity_new}
Was:
grok:
  patterns:
    - "%{DATE:happenedAt},%{DATA:howMany}

Update job to:
grok:
  patterns:
    - "%{DATE:happenedAt},%{POSINT:howManyInt}

To upgrade (migrate) pre-4.1.3 job files:

  1. Stop analytics-agent.
  2. Change .job files that use any of the following enhanced, type-aware grok patterns (listed as pattern:type):

    BOOL:boolean
    INT:number
    BASE10NUM:number
    NUMBER:number
    POSINT:number
    NONNEGINT:number


    Change the grok aliases so that they do not conflict with the older aliases:

    Old:
      grok:
        patterns:
          - "%{DATE:quoteDate},%{NUMBER:open},%{NUMBER:high},%{NUMBER:low},%{NUMBER:close},%{NUMBER:volume},%{NUMBER:adjClose}"

    New aliases:
      grok:
        patterns:
          - "%{DATE:quoteDate},%{NUMBER:openNum},%{NUMBER:highNum},%{NUMBER:lowNum},%{NUMBER:closeNum},%{NUMBER:volumeNum},%{NUMBER:adjCloseNum}"
  3. Start analytics-agent.

Specify Key-Value Pairs

This section of the mapping configuration captures key-value pairs from fields specified by the source parameter. The values listed under source should refer to fields that were defined and captured by a grok pattern. For example, if you have a grok pattern that defines "%{DATA:keyValuePairs}", you can list the field "keyValuePairs" under the source parameter to capture any key-value pairs contained in the "keyValuePairs" string. If the source parameter is not specified, the agent attempts to extract key-value pairs from the entire log message. The result can differ from what you expect if the message contains more than just key-value pairs.

The Key Value mapping contains these fields:

The sample-glassfish-log.job file includes key-value pairs configuration. This file is found here: <analytics_agent_home>/conf/job/.

Key-Value Pairs Example 

For a log file with the following entries:

[#|2015-09-24T06:33:31:574-0700|INFO|glassfish3.1.2|com.appdynamics,METRICS.WRITE|_ThreadID=200;_ThreadName=Thread-2;|NODE PURGER Complete in 14 ms|#]

[#|2015-09-24T06:33:46:541-0700|INFO|glassfish3.1.2|com.singularity.ee.controller.beans.license.LicenseUsageManagerBeanWRITE|_ThreadID=202;_ThreadName=Thread-2;|about to start persisting license usage data |#]

And the following grok pattern:

grok:
  patterns:
    - "\\[\\#\\|%{DATA}\\|%{LOGLEVEL:logLevel}\\|%{DATA:serverVersion}\\|%{JAVACLASS:class}\\|%{DATA:keyValuePairs}\\|%{GREEDYDATA}"

The key-value parameter to extract the ThreadID and the ThreadName should look similar to the following: 

keyValue:
  source:
    - "keyValuePairs"
  split: "="
  separator: ";"
  include:
    - "ThreadID"
    - "ThreadName"
  trim:
    - "_"

Specify Transform Parameters

This section of the mapping configuration enables you to change the type or alias name of any field previously extracted from the logs by your grok or key-value configuration. The transform is applied after all fields have been captured from the log message. You can specify a list of field names for which you want to cast the value to a specific type or rename the field with an alias.
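A transform section might look like the following sketch; the field, alias, and type keys are illustrative assumptions based on the description above rather than a definitive schema:

transform:
  - field: "quantity"      # a field previously captured by grok or keyValue
    alias: "quantityNum"   # rename the field as it appears in Analytics
    type: "NUMBER"         # cast the value to a numeric type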


The Transform mapping contains these fields:

Verify Analytics Agent Properties

In addition to configuring the log source in the job file, you should verify the values in the analytics-agent.properties file found in the conf directory. Confirm these property values:

Troubleshoot Logs

If log capture is working correctly, logs should start appearing in the Log tab in the Analytics UI. It can take some time for logs to start accumulating. Note the following troubleshooting points:  

Troubleshoot Patterns

To help you troubleshoot the data extraction patterns in your job file, you can use the two debug REST endpoints in the Analytics Agent:

In the following examples, the Analytics Agent host is assumed to be localhost and the Analytics Agent port is assumed to be 9090. To configure the port on your Agent, use the property ad.dw.http.port in <Analytics_Agent_Home>/conf/analytics-agent.properties.

In Analytics Agent versions 21.7 and later, the debug/grok endpoint is disabled by default. To enable the endpoint, start the Analytics Agent with the property ad.debug.grok.endpoint.enabled=true set in the analytics-agent.properties file.
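For example, the relevant entries in <Analytics_Agent_Home>/conf/analytics-agent.properties would look similar to the following (9090 is the port assumed in the examples below; the second property enables the debug/grok endpoint on 21.7 and later):

ad.dw.http.port=9090
ad.debug.grok.endpoint.enabled=true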

The Grok Endpoint

The Grok tool works in two modes: extraction from a single-line log and extraction from a multi-line log. To get a description of usage options:

curl -X GET http://localhost:9090/debug/grok 

Single Line

In this mode, you pass in (as a POST request) a sample line from your log and the grok pattern you are testing, and you receive back the data you passed in organized as key/value pairs, where the keys are your identifiers.

curl -X POST http://localhost:9090/debug/grok --data-urlencode "logLine=LOG_LINE" --data-urlencode "pattern=PATTERN"

 For example, the input:

curl -X POST http://localhost:9090/debug/grok --data-urlencode "logLine=[2014-09-04T15:22:41,594Z]  [INFO ]  [main]  [o.e.j.server.handler.ContextHandler]  Started i.d.j.MutableServletContextHandler@2b3b527{/,null,AVAILABLE}" --data-urlencode "pattern=\\[%{LOGLEVEL:logLevel}%{SPACE}\\]  \\[%{DATA:threadName}\\]  \\[%{JAVACLASS:class}\\]  %{GREEDYDATA}"

would produce this output:

{
 threadName => main
 logLevel => INFO
 class => o.e.j.server.handler.ContextHandler
}

The input:

curl -X POST http://localhost:9090/debug/grok --data-urlencode "logLine=2010-05-05,500.98,515.72,500.47,509.76,4566900,509.76" --data-urlencode "pattern=%{DATE:quoteDate},%{DATA:open},%{DATA:high},%{DATA:low},%{DATA:close},%{DATA:volume},%{GREEDYDATA:adjClose}"

would produce this output:

{
 open => 500.98
 adjClose => 509.76
 volume => 4566900
 quoteDate => 10-05-05
 high => 515.72
 low => 500.47
 close => 509.76
}

Multi-line

The multi-line version uses a file stored on the local filesystem as the source input.

curl -X POST http://localhost:9090/debug/grok --data-urlencode "logLine=`cat FILE_NAME`" --data-urlencode "pattern=PATTERN"

where FILE_NAME is the full path filename of the file that contains the multi-line log.

The Timestamp Endpoint

The timestamp tool extracts the timestamp from a log line in Unix epoch time.

To get a description of usage options: 

curl -X GET http://localhost:9090/debug/timestamp

In this mode, you pass in (as a POST request) a sample line from your log and the timestamp pattern you are testing, and you receive back the timestamp contained within the log line.

curl -X POST http://localhost:9090/debug/timestamp --data-urlencode "logLine=LOG_LINE" --data-urlencode "pattern=PATTERN"

For example, the input:

curl -X POST http://localhost:9090/debug/timestamp --data-urlencode "logLine=[2014-09-04T15:22:41,237Z]  [INFO ]  [main]  [io.dropwizard.server.ServerFactory]  Starting DemoMain" --data-urlencode "pattern=yyyy-MM-dd'T'HH:mm:ss,SSSZ"

would produce this output Unix epoch time:

{
 eventTimestamp => 1409844161237
}

The input:

curl -X POST http://localhost:9090/debug/timestamp --data-urlencode "logLine=Nov 17, 2014 8:21:51 AM com.foo.blitz.processor.core.hbase.coprocessor.endpoint.TimeRollupProcessEndpoint$HBaseDataFetcher callFoo1" --data-urlencode "pattern=MMM d, yyyy h:mm:ss aa"

would produce this output Unix epoch time:

{
 eventTimestamp => 1416212511000
}