Regular Expressions

Regex

Fix a finite alphabet $\Sigma$ of characters. A regular expression over $\Sigma$ (equivalently: regex) is a tree described by the following grammar:

\[R, S ::= \emptyset \mid \epsilon \mid a \mid R \cdot S \mid R + S \mid R^*\]

To read the expressions generated by this grammar as trees, we assign each symbol an arity: $\emptyset$, $\epsilon$, and each letter $a \in \Sigma$ have arity 0 (i.e. they are leaves of the tree), $\cdot$ and $+$ are binary operators (arity 2), and $\ ^*$ is an operator of arity 1 that we happen to write postfix (i.e. after its argument).
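
To make the tree reading concrete, here is a minimal sketch of the grammar as a Python data type, with one class per production. The class names (`Empty`, `Epsilon`, `Letter`, `Concat`, `Alt`, `Star`) and the helpers `arity` and `size` are illustrative choices made for this sketch, not notation used in these notes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """Leaf for the empty-set symbol (arity 0)."""


@dataclass(frozen=True)
class Epsilon:
    """Leaf for epsilon (arity 0)."""


@dataclass(frozen=True)
class Letter:
    """Leaf holding a single letter a of the alphabet (arity 0)."""
    char: str


@dataclass(frozen=True)
class Concat:
    """Binary node for R . S (arity 2)."""
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    """Binary node for R + S (arity 2)."""
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    """Unary node for R^* (arity 1), written postfix in the notes."""
    body: "Regex"


# Hypothetical type alias: a regex is any one of the six node kinds.
Regex = Union[Empty, Epsilon, Letter, Concat, Alt, Star]


def arity(r: Regex) -> int:
    """Number of children of the root node of r."""
    if isinstance(r, (Empty, Epsilon, Letter)):
        return 0
    if isinstance(r, Star):
        return 1
    return 2  # Concat or Alt


def size(r: Regex) -> int:
    """Total number of nodes in the tree r."""
    if isinstance(r, (Concat, Alt)):
        return 1 + size(r.left) + size(r.right)
    if isinstance(r, Star):
        return 1 + size(r.body)
    return 1  # a leaf


# The tree for (a + b)^* . a
example = Concat(Star(Alt(Letter("a"), Letter("b"))), Letter("a"))
```

In this encoding the root of `example` is a `Concat` node whose left child is a `Star` node and whose right child is the leaf `Letter("a")`. Frozen dataclasses are used so that two trees with the same shape and leaves compare equal.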

We will use $R$, $S$, $T$, $U$ and other uppercase letters towards the end of the alphabet as variables that stand for regular expressions.