Regex Syntax

Regular expressions in KQL are used by operators and functions such as matches regex, parse, and replace_regex().

Regular expressions must be encoded as string literals and follow the string quoting rules. For example, the regular expression \A is represented in KQL as "\\A". The first backslash escapes the second inside the string literal, so the regex engine receives \A.
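
To illustrate the quoting rule, the sketch below writes the same pattern twice: once as a regular string literal with the backslash doubled, and once as a verbatim string literal (prefixed with @), in which a backslash stands for itself. The column names are illustrative only.

```kusto
// Both patterns reach the regex engine as \Aabc.
print escaped  = "abcdef" matches regex "\\Aabc",   // regular literal: backslash doubled
      verbatim = "abcdef" matches regex @"\Aabc"    // verbatim literal: backslash written as-is
```

Both columns should evaluate to true. The verbatim form is usually easier to read when a pattern contains several backslashes.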

Match one character

Pattern	Description
`.`	Any character except newline (includes newline with `s` flag)
`[0-9]`	Any ASCII digit
`[^0-9]`	Any character that isn't an ASCII digit
`\d`	Digit (`\p{Nd}`)
`\D`	Not a digit
`\pX`	Unicode character class identified by a one-letter name
`\p{Greek}`	Unicode character class (general category or script)
`\PX`	Negated Unicode character class identified by a one-letter name
`\P{Greek}`	Negated Unicode character class (general category or script)

Character classes

Pattern	Description
`[xyz]`	Matching either x, y or z (union)
`[^xyz]`	Matching any character except x, y, and z
`[a-z]`	Matching any character in range a-z
`[[:alpha:]]`	ASCII character class (`[A-Za-z]`)
`[[:^alpha:]]`	Negated ASCII character class (`[^A-Za-z]`)
`[x[^xyz]]`	Nested/grouping class (matching any character except y and z)
`[a-y&&xyz]`	Intersection (matching x or y)
`[0-9&&[^4]]`	Subtraction using intersection and negation (matching 0-9 except 4)
`[0-9--4]`	Direct subtraction (matching 0-9 except 4)
`[a-g~~b-h]`	Symmetric difference (matching a and h only)
`[\[\]]`	Escape in character classes (matching `[` or `]`)

Any named character class may appear inside a bracketed [...] character class. For example, [\p{Greek}[:digit:]] matches any ASCII digit or any codepoint in the Greek script.

Precedence (most binding to least binding):

Ranges: [a-cd] == [[a-c]d]
Union: [ab&&bc] == [[ab]&&[bc]]
Intersection, difference, symmetric difference: equal precedence, evaluated left-to-right
Negation: [^a-z&&b] == [^[a-z&&b]]
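
The set operations can be tried with a short query such as the following sketch. The inputs and column names are made up for illustration, and the comments state what the table above implies for each pattern.

```kusto
// Each pattern is anchored so that the whole one-character input must belong to the class.
print four_excluded = "4" matches regex @"^[0-9--4]$",       // subtraction removes 4, so no match
      seven_kept    = "7" matches regex @"^[0-9&&[^4]]$",    // same set via intersection and negation
      x_in_both     = "x" matches regex @"^[a-y&&xyz]$",     // x is in both a-y and xyz
      z_dropped     = "z" matches regex @"^[a-y&&xyz]$"      // z is outside a-y, so no match
```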

Composites

Pattern	Description
`xy`	Concatenation (x followed by y)
`x|y`	Alternation (x or y, prefer x)

Repetitions

Pattern	Description
`x*`	Zero or more of x (greedy)
`x+`	One or more of x (greedy)
`x?`	Zero or one of x (greedy)
`x*?`	Zero or more of x (ungreedy/lazy)
`x+?`	One or more of x (ungreedy/lazy)
`x??`	Zero or one of x (ungreedy/lazy)
`x{n,m}`	At least n x and at most m x (greedy)
`x{n,}`	At least n x (greedy)
`x{n}`	Exactly n x
`x{n,m}?`	At least n x and at most m x (ungreedy/lazy)
`x{n,}?`	At least n x (ungreedy/lazy)
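
The difference between greedy and lazy repetition is easiest to see on input with more than one possible end point. A minimal sketch using extract(), where capture group 0 is the whole match and the sample string is an assumption for illustration:

```kusto
print greedy = extract(@"<.+>",  0, "<a><b>"),   // .+ takes as much as possible: <a><b>
      lazy   = extract(@"<.+?>", 0, "<a><b>")    // .+? stops at the first closing bracket: <a>
```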

Anchors

Pattern	Description
`^`	Beginning of haystack, or start-of-line with multi-line mode
`$`	End of haystack, or end-of-line with multi-line mode
`\A`	Only the beginning of a haystack (even with multi-line mode)
`\z`	Only the end of a haystack (even with multi-line mode)
`\b`	Unicode word boundary (`\w` on one side and `\W`, `\A`, or `\z` on other)
`\B`	Not a Unicode word boundary
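
Because matches regex searches anywhere in the input, anchors are what pin a match to a position. The following sketch uses illustrative inputs and column names:

```kusto
print whole_word = "cat sat" matches regex @"\bcat\b",       // true: "cat" stands alone
      embedded   = "concatenate" matches regex @"\bcat\b",   // false: "cat" sits inside a word
      not_first  = "xabc" matches regex @"\Aabc"             // false: "abc" doesn't begin the haystack
```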

Grouping and flags

Pattern	Description
`(exp)`	Numbered capture group (indexed by opening parenthesis)
`(?P<name>exp)`	Named capture group
`(?<name>exp)`	Named capture group
`(?:exp)`	Non-capturing group
`(?flags)`	Set flags within current group
`(?flags:exp)`	Set flags for exp (non-capturing)
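
As a rough sketch of how group numbering plays out in practice, the query below uses extract() to read a numbered group and replace_regex() to reorder groups in the rewrite pattern. The inputs and column names are illustrative.

```kusto
print second  = extract(@"(\d+)-(\d+)", 2, "10-20"),                 // group 2: 20
      digit   = extract(@"(?:ab)+(\d)", 1, "abab7"),                 // (?:...) isn't numbered, so group 1 is the digit: 7
      swapped = replace_regex("10-20", @"(\d+)-(\d+)", @"\2-\1")     // \1 and \2 refer to the capture groups: 20-10
```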

Flags

Flag	Description
`i`	Case-insensitive: letters match both upper and lower case
`m`	Multi-line mode: `^` and `$` match begin/end of line
`s`	Allow `.` to match `\n`
`R`	CRLF mode: when multi-line mode is enabled, `^` and `$` also treat `\r\n` as a line terminator
`U`	Swap the meaning of `x*` and `x*?` (repetitions become lazy by default)
`u`	Unicode support (enabled by default)
`x`	Verbose mode, ignores whitespace and allows line comments starting with `#`

Flags can be toggled within a pattern. For example, (?i)a+(?-i)b+ uses a case-insensitive match for a+ and a case-sensitive match for b+.
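
The toggle described above can be checked with a sketch like this one. The anchors are added so that the whole input must match, and the sample inputs are assumptions for illustration.

```kusto
print any_case   = "HELLO" matches regex "(?i)hello",           // true: i applies to the whole pattern
      mixed_ok   = "AAbb" matches regex "^(?i)a+(?-i)b+$",      // true: a+ ignores case, b+ doesn't
      mixed_fail = "AABB" matches regex "^(?i)a+(?-i)b+$"       // false: uppercase B fails the case-sensitive b+
```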

Escape sequences

Pattern	Description
`\*`	Literal `*` (applies to all ASCII except `[0-9A-Za-z<>]`)
`\a`	Bell (`\x07`)
`\f`	Form feed (`\x0C`)
`\t`	Horizontal tab
`\n`	New line
`\r`	Carriage return
`\v`	Vertical tab (`\x0B`)
`\123`	Octal character code, up to three digits
`\x7F`	Hex character code (exactly two digits)
`\x{10FFFF}`	Hex character code (Unicode code point)
`\u007F`	Hex character code (exactly four digits)
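
A small sketch of escaping in use, assuming the illustrative inputs shown. Verbatim string literals (prefixed with @) keep the backslashes readable.

```kusto
print literal_dot = "a.b" matches regex @"^a\.b$",   // true: \. matches only a literal dot
      any_char    = "axb" matches regex @"^a.b$",    // true: an unescaped . matches any character
      no_dot      = "axb" matches regex @"^a\.b$",   // false: x isn't a dot
      hex_escape  = "A" matches regex @"^\x41$"      // true: \x41 is the code for "A"
```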

Perl character classes (Unicode)

Based on UTS#18:

Pattern	Description
`\d`	Digit (`\p{Nd}`)
`\D`	Not digit
`\s`	Whitespace (`\p{White_Space}`)
`\S`	Not whitespace
`\w`	Word character (`\p{Alphabetic}` + `\p{M}` + `\d` + `\p{Pc}` + `\p{Join_Control}`)
`\W`	Not word character
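
For example, the following sketch uses extract_all(), which requires at least one capture group, to pull runs of word characters and digits out of a sample string. The input is made up for illustration.

```kusto
print words   = extract_all(@"(\w+)", "alpha beta_2"),      // runs of word characters, including _ and digits
      numbers = extract_all(@"(\d+)", "ids: 123, 567")      // runs of digits
```

Each column is a dynamic array with one element per match.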

ASCII character classes

Pattern	Description
`[[:alnum:]]`	Alphanumeric (`[0-9A-Za-z]`)
`[[:alpha:]]`	Alphabetic (`[A-Za-z]`)
`[[:ascii:]]`	ASCII (`[\x00-\x7F]`)
`[[:blank:]]`	Blank (`[\t ]`)
`[[:cntrl:]]`	Control (`[\x00-\x1F\x7F]`)
`[[:digit:]]`	Digits (`[0-9]`)
`[[:graph:]]`	Graphical (`[!-~]`)
`[[:lower:]]`	Lower case (`[a-z]`)
`[[:print:]]`	Printable (`[ -~]`)
`[[:punct:]]`	Punctuation (`` [!-/:-@\[-`{-~] ``)
`[[:space:]]`	Whitespace (`[\t\n\v\f\r ]`)
`[[:upper:]]`	Upper case (`[A-Z]`)
`[[:word:]]`	Word characters (`[0-9A-Za-z_]`)
`[[:xdigit:]]`	Hex digit (`[0-9A-Fa-f]`)
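
The named ASCII classes are written inside an outer pair of brackets, as in this sketch with illustrative inputs:

```kusto
print is_hex   = "1aF9" matches regex @"^[[:xdigit:]]+$",   // true: every character is a hex digit
      not_hex  = "1aG9" matches regex @"^[[:xdigit:]]+$",   // false: G isn't a hex digit
      no_alpha = "1234" matches regex @"^[[:^alpha:]]+$"    // true: no character is an ASCII letter
```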

Performance tips

Unicode affects memory and speed: Unicode character classes like \w match ~140,000 codepoints. If ASCII suffices, use [0-9A-Za-z_] or (?-u:\w) instead.
Word boundaries: If you don't need Unicode-aware word boundaries, (?-u:\b) is faster than \b.
Literals accelerate searches: Including literal characters in your pattern helps the regex engine optimize. For example, in \w+@\w+, the @ is matched first, then a reverse match finds the start.
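
The first two tips amount to swapping one spelling for another. The sketch below shows the three spellings side by side on an ASCII-only sample string, which is an assumption for illustration. On such input all three return the same text, so the choice only affects cost. No timings are claimed here.

```kusto
print unicode_word = extract(@"\w+",           0, "user_42 logged in"),   // Unicode-aware \w
      ascii_word   = extract(@"(?-u:\w)+",     0, "user_42 logged in"),   // \w with Unicode turned off
      ascii_class  = extract(@"[0-9A-Za-z_]+", 0, "user_42 logged in")    // explicit ASCII class
```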
