Regex Syntax
Regular expression syntax supported in KQL
Regular expressions in KQL are used by operators and functions such as matches regex, parse, and replace_regex().
Regular expressions must be encoded as string literals and follow the string quoting rules. For example, the regular expression \A is represented in KQL as "\\A". The extra backslash indicates that the other backslash is part of the regular expression \A.
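As a minimal sketch of the quoting rule, the query below writes the same pattern, \Aabc, in two ways: as a regular string literal with the backslash doubled, and as a verbatim string literal (prefixed with @), in which backslashes are taken literally. The input value and the column names are made up for illustration.

```kusto
// The same regular expression, \Aabc, written with two string-literal styles.
print txt = "abcdef"
| extend escaped  = txt matches regex "\\Aabc"   // regular literal: backslash doubled
| extend verbatim = txt matches regex @"\Aabc"   // verbatim literal: backslash kept as-is
```

Both columns should evaluate to true, because the two literals encode the same regular expression. Verbatim literals tend to be easier to read once a pattern contains several backslashes.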
Match one character
| Pattern | Description |
|---|---|
. | Any character except newline (includes newline with s flag) |
[0-9] | Any ASCII digit |
[^0-9] | Any character that isn't an ASCII digit |
\d | Digit (\p{Nd}) |
\D | Not a digit |
\pX | Unicode character class identified by a one-letter name |
\p{Greek} | Unicode character class (general category or script) |
\PX | Negated Unicode character class identified by a one-letter name |
\P{Greek} | Negated Unicode character class (general category or script) |
Character classes
| Pattern | Description |
|---|---|
[xyz] | Matching either x, y or z (union) |
[^xyz] | Matching any character except x, y, and z |
[a-z] | Matching any character in range a-z |
[[:alpha:]] | ASCII character class ([A-Za-z]) |
[[:^alpha:]] | Negated ASCII character class ([^A-Za-z]) |
[x[^xyz]] | Nested/grouping class (matching any character except y and z) |
[a-y&&xyz] | Intersection (matching x or y) |
[0-9&&[^4]] | Subtraction using intersection and negation (matching 0-9 except 4) |
[0-9--4] | Direct subtraction (matching 0-9 except 4) |
[a-g~~b-h] | Symmetric difference (matching a and h only) |
[\[\]] | Escape in character classes (matching [ or ]) |
Any named character class may appear inside a bracketed [...] character class.
For example, [\p{Greek}[:digit:]] matches any ASCII digit or any codepoint in the Greek script.
Precedence (most binding to least binding):
- Ranges: [a-cd] == [[a-c]d]
- Union: [ab&&bc] == [[ab]&&[bc]]
- Intersection, difference, symmetric difference: equal precedence, evaluated left-to-right
- Negation: [^a-z&&b] == [^[a-z&&b]]
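The set operations from the character class table can be tried out with a small query. This is a sketch only: the sample strings and column names are hypothetical, and each class is wrapped in a capture group because extract_all() requires at least one.

```kusto
// Apply intersection, subtraction, and symmetric difference to sample strings.
print letters = "abcdefghxyz", digits = "0123456789"
| extend intersection = extract_all(@"([a-y&&xyz])", letters)   // keeps x and y
| extend subtraction  = extract_all(@"([0-9--4])", digits)      // keeps every digit except 4
| extend symmetric    = extract_all(@"([a-g~~b-h])", letters)   // keeps a and h
```

Each column holds a dynamic array of the characters that the class matched, which makes it easy to compare the result against the descriptions in the table.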
Composites
| Pattern | Description |
|---|---|
xy | Concatenation (x followed by y) |
x|y | Alternation (x or y, prefer x) |
Repetitions
| Pattern | Description |
|---|---|
x* | Zero or more of x (greedy) |
x+ | One or more of x (greedy) |
x? | Zero or one of x (greedy) |
x*? | Zero or more of x (ungreedy/lazy) |
x+? | One or more of x (ungreedy/lazy) |
x?? | Zero or one of x (ungreedy/lazy) |
x{n,m} | At least n x and at most m x (greedy) |
x{n,} | At least n x (greedy) |
x{n} | Exactly n x |
x{n,m}? | At least n x and at most m x (ungreedy/lazy) |
x{n,}? | At least n x (ungreedy/lazy) |
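To illustrate the difference between greedy and lazy repetition, the sketch below extracts the whole match (capture group 0) from two illustrative inputs. The inputs and column names are assumptions chosen for the example.

```kusto
// Greedy repetitions take as much as possible; lazy repetitions take as little as possible.
print tags = "<a><b>", number = "12345"
| extend greedy         = extract(@"<.+>", 0, tags)         // <a><b>
| extend lazy           = extract(@"<.+?>", 0, tags)        // <a>
| extend bounded_greedy = extract(@"\d{2,3}", 0, number)    // 123
| extend bounded_lazy   = extract(@"\d{2,3}?", 0, number)   // 12
```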
Anchors
| Pattern | Description |
|---|---|
^ | Beginning of haystack, or start-of-line with multi-line mode |
$ | End of haystack, or end-of-line with multi-line mode |
\A | Only the beginning of a haystack (even with multi-line mode) |
\z | Only the end of a haystack (even with multi-line mode) |
\b | Unicode word boundary (\w on one side and \W, \A, or \z on other) |
\B | Not a Unicode word boundary |
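The following sketch contrasts ^ with \A when multi-line mode is enabled. The two-line input is hypothetical; it uses a regular string literal so that \n is interpreted as a newline, while the patterns use verbatim literals.

```kusto
// With (?m), ^ matches at the start of every line, but \A matches only at the start of the haystack.
print txt = "one\ntwo"
| extend line_starts = extract_all(@"(?m)^(\w+)", txt)    // one match per line
| extend text_start  = extract_all(@"(?m)\A(\w+)", txt)   // a single match at the very beginning
```

Given the anchor definitions above, line_starts should contain both words and text_start only the first.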
Grouping and flags
| Pattern | Description |
|---|---|
(exp) | Numbered capture group (indexed by opening parenthesis) |
(?P<name>exp) | Named capture group |
(?<name>exp) | Named capture group |
(?:exp) | Non-capturing group |
(?flags) | Set flags within current group |
(?flags:exp) | Set flags for exp (non-capturing) |
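As a small illustration of how groups are numbered, the sketch below uses extract(), whose second argument selects the capture group (0 is the whole match). The input string and column names are made up for the example.

```kusto
// Non-capturing groups do not consume a group number; capturing groups are numbered by opening parenthesis.
print txt = "key=42;id=7"
| extend whole  = extract(@"(?:id|key)=(\d+)", 0, txt)   // key=42
| extend value  = extract(@"(?:id|key)=(\d+)", 1, txt)   // 42
| extend second = extract(@"(\w+)=(\d+)", 2, txt)        // 42
```

Because (?:id|key) is non-capturing, the digits are group 1 in the first pattern. In the last pattern the name is group 1 and the digits are group 2.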
Flags
| Flag | Description |
|---|---|
i | Case-insensitive: letters match both upper and lower case |
m | Multi-line mode: ^ and $ match begin/end of line |
s | Allow . to match \n |
R | CRLF mode: when multi-line mode is enabled, \r\n is used |
U | Swap the meaning of x* and x*? |
u | Unicode support (enabled by default) |
x | Verbose mode, ignores whitespace and allows line comments starting with # |
Flags can be toggled within a pattern.
For example, (?i)a+(?-i)b+ uses a case-insensitive match for a+ and a case-sensitive match for b+.
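The toggling example above can be checked against a few sample values. This is a sketch with a hypothetical inline table:

```kusto
// a+ is matched case-insensitively; b+ is matched case-sensitively.
datatable(txt: string) ["AAbb", "aabb", "AABB"]
| extend is_match = txt matches regex "(?i)a+(?-i)b+"
```

The first two rows should match. The last row should not, because its B characters are uppercase and the (?-i) toggle restores case-sensitive matching before b+.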
Case-insensitive matching and ſ
A (?i) regex containing s will treat the historical character ſ (Latin small letter long s, U+017F) as case-equivalent to s. This is correct per the Unicode case-folding spec, but our indexing treats s and ſ as distinct characters, so a query like (?i)secret may miss rows whose only match is through ſ.
Long-s essentially only appears in historical or archival text, so most queries are unaffected. If you need exact matching across both forms, enumerate the cases explicitly instead of using (?i):
where * matches regex "[sſ]ecret"
No other letter is affected.
Escape sequences
| Pattern | Description |
|---|---|
\* | Literal * (applies to all ASCII except [0-9A-Za-z<>]) |
\a | Bell (\x07) |
\f | Form feed (\x0C) |
\t | Horizontal tab |
\n | New line |
\r | Carriage return |
\v | Vertical tab (\x0B) |
\123 | Octal character code, up to three digits |
\x7F | Hex character code (exactly two digits) |
\x{10FFFF} | Hex character code (Unicode code point) |
\u007F | Hex character code (exactly four digits) |
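The sketch below shows the effect of escaping punctuation and of a two-digit hex escape. The sample rows and column names are hypothetical.

```kusto
// Escaped punctuation is matched literally; an unescaped . matches any character.
datatable(txt: string) ["1.5*2", "1x5y2"]
| extend literal_match  = txt matches regex @"1\.5\*2"   // true only for the first row
| extend wildcard_match = txt matches regex @"1.5.2"     // true for both rows
| extend hex_match      = txt matches regex @"\x2A"      // \x2A is *, so true only for the first row
```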
Perl character classes (Unicode)
Based on UTS#18:
| Pattern | Description |
|---|---|
\d | Digit (\p{Nd}) |
\D | Not digit |
\s | Whitespace (\p{White_Space}) |
\S | Not whitespace |
\w | Word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}) |
\W | Not word character |
ASCII character classes
| Pattern | Description |
|---|---|
[[:alnum:]] | Alphanumeric ([0-9A-Za-z]) |
[[:alpha:]] | Alphabetic ([A-Za-z]) |
[[:ascii:]] | ASCII ([\x00-\x7F]) |
[[:blank:]] | Blank ([\t ]) |
[[:cntrl:]] | Control ([\x00-\x1F\x7F]) |
[[:digit:]] | Digits ([0-9]) |
[[:graph:]] | Graphical ([!-~]) |
[[:lower:]] | Lower case ([a-z]) |
[[:print:]] | Printable ([ -~]) |
[[:punct:]] | Punctuation ([!-/:-@[-`{-~]) |
[[:space:]] | Whitespace ([\t\n\v\f\r ]) |
[[:upper:]] | Upper case ([A-Z]) |
[[:word:]] | Word characters ([0-9A-Za-z_]) |
[[:xdigit:]] | Hex digit ([0-9A-Fa-f]) |
Performance tips
- Unicode affects memory and speed: Unicode character classes like \w match ~140,000 codepoints. If ASCII suffices, use [0-9A-Za-z_] or (?-u:\w) instead.
- Word boundaries: If you don't need Unicode-aware word boundaries, (?-u:\b) is faster than \b.
- Literals accelerate searches: Including literal characters in your pattern helps the regex engine optimize. For example, in \w+@\w+, the @ is matched first, then a reverse match finds the start.
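To make the Unicode versus ASCII trade-off concrete, the sketch below compares \d with its ASCII-only alternatives on an ASCII digit and on an Arabic-Indic digit (U+0663). The sample rows and column names are assumptions for illustration, and the query says nothing about actual timings.

```kusto
// \d is Unicode-aware by default; [0-9] and (?-u:\d) are restricted to ASCII digits.
datatable(txt: string) ["7", "٣"]
| extend unicode_digit    = txt matches regex @"\d"
| extend ascii_range      = txt matches regex @"[0-9]"
| extend ascii_digit_flag = txt matches regex @"(?-u:\d)"
```

Per the class definitions above, all three columns should be true for the first row, while only unicode_digit should be true for the second. If the data never contains non-ASCII digits, the ASCII forms return the same results with a smaller character class.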