This is an advanced feature and requires knowledge of regular expression patterns.
The regular expression operator, REGEXP, can be used in the WHERE clause to handle complex matching queries. The operator matches a string against the regular expression pattern passed as an argument. ADQL uses the Lucene regular expression engine to evaluate the REGEXP expression. Search results for queries that use REGEXP depend not only on regular expression syntax and rules but also on whether you are searching analyzed or non-analyzed fields. To use REGEXP on analyzed fields, see Analyzed Fields.
The REGEXP operator can match only the lowercased tokens in analyzed fields. To search for an uppercase string in an analyzed field using a wildcard or REGEXP, enter the string in lowercase. For example, use info, not INFO.
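The lowercase rule can be illustrated with a small Python simulation. This is only a sketch: `analyze` is a hypothetical stand-in for an analyzer (it splits on whitespace and lowercases each token), and Python's `re.fullmatch` is used as an approximation of an anchored pattern match. Neither is the actual ADQL or Lucene implementation.

```python
import re

def analyze(text):
    """Toy stand-in for an analyzer: split on whitespace, lowercase each token."""
    return [token.lower() for token in text.split()]

def regexp_matches(tokens, pattern):
    """Return True if any token matches the whole pattern (anchored match)."""
    return any(re.fullmatch(pattern, token) for token in tokens)

# The stored tokens are lowercase, so only a lowercase pattern can match them.
tokens = analyze("INFO Server started")
print(regexp_matches(tokens, "info"))  # True
print(regexp_matches(tokens, "INFO"))  # False
```

Because the comparison runs against the stored lowercase tokens, the uppercase pattern finds nothing even though the original text contained INFO.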
Allowed Characters
Any Unicode character may be used in the regular expression pattern, but certain characters are reserved and must be escaped. Any reserved character can be escaped with a backslash, for example "\*", including a literal backslash character: "\\".
The standard reserved characters in the Lucene regular expression syntax are: . ? + * | { } [ ] ( ) " \
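When a search string comes from user input, escaping can be automated. The following Python sketch shows one way to do it. The `RESERVED` set is an assumption based on Lucene's standard regular expression syntax, and `escape_literal` is a hypothetical helper, not part of ADQL.

```python
# Assumed reserved set, based on Lucene's standard regular expression syntax.
RESERVED = set('.?+*|{}[]()"\\')

def escape_literal(text):
    """Prefix every reserved character with a backslash so it matches literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)

print(escape_literal("a*b"))       # a\*b
print(escape_literal("C:\\logs"))  # C:\\logs
```

The second call receives a string containing a single backslash, and the helper doubles it so that the pattern matches a literal backslash.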
For query performance reasons, a REGEXP pattern must begin with at least three explicitly specified characters before any reserved character appears. In the following example, 123 is specified before the brackets [0-9]:
SELECT * FROM logs WHERE sourceType='yourLogFile' AND id REGEXP '123[0-9]'
A query such as the following is invalid because the pattern begins with a reserved character:
SELECT * FROM logs WHERE sourceType='yourLogFile' AND id REGEXP '[0-9]'
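A client-side check can catch such patterns before a query is submitted. The Python sketch below is a literal reading of the rule stated above: the first three characters must not be reserved. `has_literal_prefix` and the `RESERVED` set are hypothetical and assumed for illustration; the actual validation that ADQL performs may differ.

```python
# Assumed reserved set, based on Lucene's standard regular expression syntax.
RESERVED = set('.?+*|{}[]()"\\')

def has_literal_prefix(pattern, required=3):
    """Return True if the pattern starts with `required` unreserved characters."""
    prefix = pattern[:required]
    return len(prefix) == required and not any(ch in RESERVED for ch in prefix)

print(has_literal_prefix("123[0-9]"))  # True
print(has_literal_prefix("[0-9]"))     # False
print(has_literal_prefix("12[0-9]"))   # False
```

The check only inspects the leading characters, so it does not judge whether the rest of the pattern is valid.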
Example Operations, Strings, and Associated Patterns
Supported Operation | Description | String | Pattern |
---|---|---|---|
Match any character | A period "." can be used to represent any character. | abcde | abc.. |
One or more | A plus sign "+" can be used to repeat the preceding shortest pattern one or more times. | aaabb | aaab+ |
Zero or more | An asterisk "*" can be used to match the preceding shortest pattern zero or more times. | aaaabbbcc | aaaab*c* |
Zero or one | A question mark "?" makes the preceding shortest pattern optional. It matches zero or one times. | aaabbbc | aaa?b+c? |
Min-to-max | Curly brackets "{}" can be used to specify a minimum and (optionally) a maximum number of times the preceding shortest pattern can repeat. Allowed forms: {5} (repeat exactly 5 times); {2,5} (repeat at least twice and at most 5 times). | aaaabbbcc | aaaab{3}c{2} or aaaab{2,4}c{2,4} |
Grouping | Parentheses "()" can be used to form sub-patterns. | abababab | abab(ab)* |
Alternation | The pipe symbol "\|" acts as an OR operator. The match succeeds if the pattern on either the left-hand side or the right-hand side matches. | aaabbb | aaa(ccc\|bbb) |
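
The string and pattern pairs in the table can be checked with a short Python script. This is a sketch under one assumption: Python's `re.fullmatch` is used to approximate a pattern that must match the entire string. The operators shown in the table behave the same way in Python's `re` module, but this is not the Lucene engine itself.

```python
import re

# (string, pattern) pairs taken from the table above.
EXAMPLES = [
    ("abcde", "abc.."),                  # match any character
    ("aaabb", "aaab+"),                  # one or more
    ("aaaabbbcc", "aaaab*c*"),           # zero or more
    ("aaabbbc", "aaa?b+c?"),             # zero or one
    ("aaaabbbcc", "aaaab{3}c{2}"),       # exact repeat counts
    ("aaaabbbcc", "aaaab{2,4}c{2,4}"),   # min-to-max repeat counts
    ("abababab", "abab(ab)*"),           # grouping
    ("aaabbb", "aaa(ccc|bbb)"),          # alternation
]

for text, pattern in EXAMPLES:
    matched = re.fullmatch(pattern, text) is not None
    print(f"{pattern!r} matches {text!r}: {matched}")
```

Every pair reports a match. A string with extra trailing characters, such as abcdef against abc.., does not, because the whole string must be consumed by the pattern.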