Overview

Note: This is an advanced feature and requires knowledge of regular expression patterns.

The regular expression operator, REGEXP, can be used in the WHERE clause to handle complex matching queries. It tests whether a string matches the regular expression pattern passed as an argument. ADQL uses the Lucene regular expression engine to evaluate the REGEXP expression. The results of queries that use the REGEXP operator depend not only on regular expression syntax and rules but also on whether you are searching analyzed or non-analyzed fields. Before using REGEXP on analyzed fields, read about data indexing and searching across tokens in this topic: Analyzed Fields.

The REGEXP operator can match only the lowercased tokens in analyzed fields. To search for an uppercase string in an analyzed field using a wildcard or REGEXP, enter the string in lowercase: for example, info, not INFO.
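The lowercase rule can be illustrated with a small Python sketch. It is not the ADQL or Lucene implementation: the tokenizer below (split on non-alphanumeric characters, then lowercase) is an assumption made for illustration, and the helper names analyzed_tokens and any_token_matches are hypothetical. Python's re.fullmatch stands in for a whole-token pattern match.

```python
import re

def analyzed_tokens(text):
    # Assumption: the analyzer splits on non-alphanumeric characters
    # and lowercases every token. The real analyzer may differ.
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def any_token_matches(pattern, text):
    # A pattern must match a whole token, so fullmatch is used per token.
    return any(re.fullmatch(pattern, tok) for tok in analyzed_tokens(text))

log_line = "2024-01-01 INFO Service started"
print(any_token_matches("info", log_line))  # True
print(any_token_matches("INFO", log_line))  # False
```

Because every stored token is lowercase in this sketch, the uppercase pattern never finds a match, which mirrors the behavior described above.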

Allowed Characters

Any Unicode character may be used in the regular expression pattern, but certain characters are reserved and must be escaped. Any reserved character can be escaped with a backslash, for example "\*". A literal backslash character is written as "\\".

The standard reserved characters are the following:

. ? + * | { } [ ] ( ) " \
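As a minimal sketch of the escaping rule, the following Python helper prefixes each reserved character in a literal string with a backslash. The function name escape_reserved is hypothetical and not part of ADQL. Any additional quoting that an ADQL string literal may require is not covered here.

```python
# The reserved characters listed above, as a Python set.
RESERVED = set('.?+*|{}[]()"\\')

def escape_reserved(literal):
    # Prefix every reserved character with a backslash; copy the rest as-is.
    return "".join("\\" + ch if ch in RESERVED else ch for ch in literal)

print(escape_reserved("a.b"))  # a\.b
print(escape_reserved("1+1"))  # 1\+1
```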

For query performance reasons, a REGEXP pattern must specify the first three characters of the string explicitly before any reserved character is used. In the following example, 123 is specified before the brackets [ 0-9 ]:

SELECT * FROM logs WHERE sourceType='yourLogFile' AND id REGEXP '123 [ 0-9 ]'

A query such as the following is invalid because the pattern begins with a reserved character:

SELECT * FROM logs WHERE sourceType='yourLogFile' AND id REGEXP '[ 0-9 ]'
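The three-character rule can be checked before a query is sent. The sketch below is only an approximation of that check, not ADQL's own validation: the function name has_explicit_prefix is hypothetical, and it assumes that any reserved character, including a backslash used for escaping, counts as not explicit.

```python
RESERVED = set('.?+*|{}[]()"\\')

def has_explicit_prefix(pattern, required=3):
    # Find the first reserved character; it must not appear before
    # `required` explicit characters have been given.
    for index, ch in enumerate(pattern):
        if ch in RESERVED:
            return index >= required
    # No reserved character at all: nothing to restrict.
    return True

print(has_explicit_prefix("123 [ 0-9 ]"))  # True
print(has_explicit_prefix("[ 0-9 ]"))      # False
```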

Example Operations, Strings, and Associated Patterns

Each entry below lists a supported operation, its description, an example string, and a pattern that matches the string.

Operation: Match any character
Description: A period "." can be used to represent any character.
String: abcde
Pattern: abc..

Operation: One or more
Description: A plus sign "+" can be used to repeat the preceding shortest pattern one or more times.
String: aaabb
Pattern: aaab+

Operation: Zero or more
Description: An asterisk "*" can be used to match the preceding shortest pattern zero or more times.
String: aaaabbbcc
Pattern: aaaab*c*

Operation: Zero or one
Description: A question mark "?" makes the preceding shortest pattern optional. It matches zero or one times.
String: aaabbbc
Pattern: aaa?b+c?

Operation: Min-to-max
Description: Curly brackets "{}" can be used to specify a minimum and (optionally) a maximum number of times the preceding shortest pattern can repeat. Allowed forms:
{5} # repeat exactly 5 times
{2,5} # repeat at least twice and at most 5 times
String: aaaabbbcc
Pattern: aaaab{3}c{2}
Pattern: aaaab{2,4}c{2,4}

Operation: Grouping
Description: Parentheses "()" can be used to form sub-patterns.
String: abababab
Pattern: abab(ab)*

Operation: Alternation
Description: The pipe symbol "|" acts as an OR operator. The match will succeed if the pattern on either the left-hand side OR the right-hand side matches.
String: aaabbb
Pattern: aaa(ccc|bbb)
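
The string and pattern pairs listed above can be tried out locally. The following sketch uses Python's re module rather than the Lucene engine that ADQL uses, so it is only an approximation. The operators shown here behave the same way in both, and re.fullmatch is used on the assumption that a pattern must match the whole string. The helper name matches_whole_string is hypothetical.

```python
import re

# (string, pattern) pairs taken from the examples above.
EXAMPLES = [
    ("abcde", "abc.."),                 # match any character
    ("aaabb", "aaab+"),                 # one or more
    ("aaaabbbcc", "aaaab*c*"),          # zero or more
    ("aaabbbc", "aaa?b+c?"),            # zero or one
    ("aaaabbbcc", "aaaab{3}c{2}"),      # exact repeat counts
    ("aaaabbbcc", "aaaab{2,4}c{2,4}"),  # min-to-max
    ("abababab", "abab(ab)*"),          # grouping
    ("aaabbb", "aaa(ccc|bbb)"),         # alternation
]

def matches_whole_string(pattern, text):
    # fullmatch succeeds only if the pattern covers the entire string.
    return re.fullmatch(pattern, text) is not None

for text, pattern in EXAMPLES:
    print(pattern, text, matches_whole_string(pattern, text))
```

Every pair prints True. Shortening a string, for example testing aaab+ against aaa, makes the match fail because the pattern no longer covers the full string.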