Unix regex: POSIX BREs vs. EREs
regex feature | BREs | EREs |
---|---|---|
dot `.`, `^`, `$`, `[...]`, `[^...]` | yes | yes |
"any number" quantifier | `*` | `*` |
`+` and `?` quantifiers | no | `+` `?` |
range quantifier | `\{min,max\}` | `{min,max}` |
grouping | `\(...\)` | `(...)` |
can apply quantifiers to parentheses | yes | yes |
backreferences | `\1` through `\9` | no |
alternation | no | `\|` |
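Python's `re` module is neither a BRE nor an ERE engine, but for the rows in this table its syntax follows the ERE column, and it also supports `\1`-style backreferences. A minimal sketch, assuming you only need the features listed in the table: the hypothetical helper `bre_to_python` rewrites a POSIX BRE into `re` syntax by unescaping `\( \) \{ \}` and escaping the characters that are plain literals in a BRE. It deliberately ignores POSIX bracket classes such as `[[:alpha:]]`, backslashes inside brackets, and the BRE rule that a leading `*` is literal.

```python
import re


def bre_to_python(bre: str) -> str:
    """Translate a POSIX BRE into Python `re` syntax (hypothetical helper).

    Covers only the BRE/ERE differences from the table: escaped grouping and
    interval braces become bare metacharacters, while bare ( ) { } + ? | are
    literals in a BRE and get escaped. Not handled: POSIX classes such as
    [[:alpha:]], backslashes inside brackets, a literal leading '*'.
    """
    out = []
    i = 0
    while i < len(bre):
        c = bre[i]
        if c == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            if nxt in "(){}":
                out.append(nxt)        # \( \) \{ \}  ->  ( ) { }
            else:
                out.append(c + nxt)    # \1..\9, \. \* etc. keep their meaning
            i += 2
        elif c in "(){}+?|":
            out.append("\\" + c)       # literal characters in a BRE
            i += 1
        elif c == "[":
            # Copy a bracket expression verbatim up to its closing ']'.
            j = i + 1
            if j < len(bre) and bre[j] == "^":
                j += 1
            if j < len(bre) and bre[j] == "]":
                j += 1                 # a ']' right after '[' or '[^' is literal
            while j < len(bre) and bre[j] != "]":
                j += 1
            out.append(bre[i:j + 1])
            i = j + 1
        else:
            out.append(c)              # ordinary characters, . ^ $ *
            i += 1
    return "".join(out)


if __name__ == "__main__":
    pattern = bre_to_python(r"\(ab\)*\1")
    print(pattern, bool(re.fullmatch(pattern, "abab")))
```

For example, the BRE `\(ab\)*\1` becomes `(ab)*\1`, and the BRE `a+b?` becomes `a\+b\?`, which matches only the literal text `a+b?`.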
Regex table: common operators (Perl/Python-style syntax)
[cite:;taken from @nlp_jurafsky_2020 chapter 2 regular expressions, text normalization, edit distance]
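regex | match |
---|---|
`[0-9]` or `\d` | a single digit |
`[A-Z]` | an uppercase letter |
`[a-z]` | a lowercase letter |
`^` | start of line (in many engines, start of string unless multiline mode is on) |
`$` | end of line (in many engines, end of string unless multiline mode is on) |
`\b` | word boundary |
`\B` | non-word boundary |
`\D` | any non-digit |
`\w` | any alphanumeric character or underscore |
`\W` | any character that is not alphanumeric or underscore |
`\s` | any whitespace character (space, tab, newline, carriage return, form feed) |
`\S` | any non-whitespace character |
`*` | zero or more occurrences of the previous char or expression |
`+` | one or more occurrences of the previous char or expression |
`?` | exactly zero or one occurrence of the previous char or expression |
`{n}` | exactly n occurrences of the previous char or expression |
`{n,m}` | from n to m occurrences of the previous char or expression |
`{n,}` | at least n occurrences of the previous char or expression |
`{,m}` | up to m occurrences of the previous char or expression (not supported by every engine) |
`\*` | an asterisk |
`\.` | a period |
`\?` | a question mark |
`\n` | a newline |
`\t` | a tab |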
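The operators in this table can be tried directly with Python's `re` module, which uses the same Perl-style escapes. The sketch below runs each group of rows against small made-up strings (the variable names and sample text are illustrative only). Two Python-specific details differ from the table's plain wording: `^` and `$` match at line breaks only with `re.MULTILINE` (otherwise they match only at the start and end of the string), and in `str` patterns `\d` and `\w` are Unicode-aware, so `\d` equals `[0-9]` only on ASCII input unless `re.ASCII` is passed.

```python
import re

# Character classes (str patterns: \d, \w, \s are Unicode-aware by default)
digits = re.findall(r"[0-9]", "Party of 5, table 12")        # ['5', '1', '2']
same_digits = re.findall(r"\d", "Party of 5, table 12")      # same result on ASCII text
arabic_three = "\u0663"                                      # ARABIC-INDIC DIGIT THREE
unicode_d = re.fullmatch(r"\d", arabic_three) is not None    # True
ascii_d = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None  # False
words = re.findall(r"\w+", "snake_case-var 42!")             # ['snake_case', 'var', '42']
non_words = re.findall(r"\W", "a!b c")                       # ['!', ' ']
tokens = re.findall(r"\S+", "in  Concord\tMA")               # ['in', 'Concord', 'MA']
only_digits = re.sub(r"\D", "", "(555) 123-4567")            # '5551234567'

# Anchors: ^ and $ see line breaks only with re.MULTILINE
two_lines = "first line\nsecond line"
first_only = re.findall(r"^\w+", two_lines)                  # ['first']
line_starts = re.findall(r"^\w+", two_lines, re.MULTILINE)   # ['first', 'second']
line_ends = re.findall(r"\w+$", two_lines, re.MULTILINE)     # ['line', 'line']
whole_the = re.findall(r"\bthe\b", "the other then the")     # ['the', 'the']
inner_the = re.findall(r"\Bthe\B", "the other then the")     # ['the'] (inside "other")

# Quantifiers
star = re.findall(r"ba*", "b ba baaa")                       # ['b', 'ba', 'baaa']
plus = re.findall(r"ba+", "b ba baaa")                       # ['ba', 'baaa']
optional = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colouur")]
exactly = [bool(re.fullmatch(r"\d{3}", s)) for s in ("12", "123", "1234")]
between = re.findall(r"\d{2,3}", "1 12 123 1234")            # ['12', '123', '123']
at_least = bool(re.fullmatch(r"a{2,}", "aaaa"))              # True
up_to = [bool(re.fullmatch(r"a{,2}", s)) for s in ("", "aa", "aaa")]

# Escaped metacharacters and control characters
stars = re.findall(r"\*", "K*A*P*L*A*N")                     # five '*' matches
period = re.findall(r"\.", "Dr. Livingston")                 # ['.']
question = re.findall(r"\?", "Why not?")                     # ['?']
lines = re.split(r"\n", "a\nb")                              # ['a', 'b']
cells = re.split(r"\t", "x\ty")                              # ['x', 'y']
```

Greedy matching explains the `{2,3}` result: in `1234` the engine takes the longest allowed run, `123`, and the leftover `4` is too short to match.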