\[ \newcommand{\tr}{\Rightarrow} \newcommand{\trs}{\tr^{\!\ast}} \newcommand{\rlnm}[1]{\mathsf{(#1)}} \newcommand{\rred}[1]{\xrightarrow{#1}} \newcommand{\rreds}[1]{\mathrel{\xrightarrow{#1}\!\!^*}} \newcommand{\cl}{\mathsf{Cl}} \newcommand{\pow}{\mathcal{P}} \newcommand{\matches}{\mathrel{\mathsf{matches}}} \newcommand{\kw}[1]{\mathsf{#1}} \]

Regex

Fix a finite alphabet \(\Sigma\) of characters. A regular expression over \(\Sigma\) (equivalently, a regex) is a tree described by the following grammar:

\[R, S ::= \emptyset \mid \epsilon \mid a \mid R \cdot S \mid R + S \mid R^*\]

To read the expressions generated by this grammar as trees, we treat the constants \(\emptyset\) and \(\epsilon\) and each letter \(a \in \Sigma\) as having arity 0 (i.e. they are leaves of the tree), \(\cdot\) and \(+\) as binary operators (arity 2), and \(\ ^*\) as an operator of arity 1 that we happen to write postfix (i.e. after its argument).
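
The tree reading of the grammar can be made concrete with a small data type. The following is a minimal sketch in Python, not part of the formal development: the class names `Empty`, `Epsilon`, `Letter`, `Concat`, `Alt` and `Star`, and the helper functions `arity` and `size`, are all names assumed here for illustration. Each production of the grammar becomes one kind of node, and the number of fields holding subtrees is that node's arity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """The constant for the empty set (arity 0, a leaf)."""


@dataclass(frozen=True)
class Epsilon:
    """The constant epsilon (arity 0, a leaf)."""


@dataclass(frozen=True)
class Letter:
    """A single letter a of the alphabet (arity 0, a leaf)."""
    char: str


@dataclass(frozen=True)
class Concat:
    """R . S (arity 2)."""
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    """R + S (arity 2)."""
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    """R* (arity 1, written postfix in the concrete syntax)."""
    body: "Regex"


# A regex is exactly one of the six node kinds above.
Regex = Union[Empty, Epsilon, Letter, Concat, Alt, Star]


def arity(r: Regex) -> int:
    """Number of immediate subtrees of the root node of r."""
    if isinstance(r, (Empty, Epsilon, Letter)):
        return 0
    if isinstance(r, Star):
        return 1
    return 2  # Concat or Alt


def size(r: Regex) -> int:
    """Total number of nodes in the tree r."""
    if isinstance(r, (Concat, Alt)):
        return 1 + size(r.left) + size(r.right)
    if isinstance(r, Star):
        return 1 + size(r.body)
    return 1  # a leaf


# The tree for (a + b)* . a
example = Concat(Star(Alt(Letter("a"), Letter("b"))), Letter("a"))
```

Here `example` has a concatenation node at its root, whose left child is a star node and whose right child is the leaf `a`. Because the value is already a tree, no parentheses or precedence rules are needed to say which operator applies to which arguments.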

We will use \(R\), \(S\), \(T\), \(U\) and other uppercase letters towards the end of the alphabet as variables that stand for regular expressions.