Denotational Semantics

In the semantics part of this unit, we will be looking at how programs in a toy imperative programming language known as the “While language” can be given a formal mathematical meanings (i.e. a semantic) and how this can be used to reason about programs. Although this language is very simple, and doesn’t have all the features you would expect of a general-purpose programming language, the fundamental aspects of programming language semantics are still present.

This week, we will look at the semantics of its arithmetic and Boolean expressions using denotational semantics. Denotational semantics is a way of formally giving meaning to programs by relating them to mathematical objects. In doing so, we can translate our knowledge of these mathematical objects back onto the programs we are studying.

Simple Arithmetic Expressions

To get a flavour of denotational semantics, we will first look at a semantics of a simple subset of arithmetic expressions without variables.

A simple arithmetic expression is an element of the following grammar:

\[A^- \Coloneqq \underline{n} \mid A^- \mathbin{\underline{+}} A^- \mid A^- \mathbin{\underline{-}} A^- \mid A^- \mathbin{\underline{*}} A^-\]

where $\underline{n}$ stands for any integer literals (used as terminals).

When it comes to talking about the semantics of a language, we are no longer interested in its exact textural representation as a string of characters, complete with whitespace, comments, and punctuation; but instead, we work with its abstract syntax trees. Recall that an abstract syntax tree is a tree representation of a statement or an expression that captures its essential structure - each node in the tree represents a syntactic construction; in this case, integer literals, additional, etc.

By an “element of” a given grammar, therefore, we actually mean an abstract syntax tree for some string within the language of that grammar. Although the structure of ASTs don’t always have a one to one correspondence with the production rules of a grammar, in our case each production rule will be considered a different node type. There is also a degree of abstraction in that we have integer literals as terminals - we assume their textual representation has already been interpreted as an integer. It is common when defining semantics to make these kinds of assumptions, well, the interpretation of integer literals isn’t that interesting!

We will write $\mathcal{A}^-$ to represent the set of all simple arithmetic expressions (a.k.a. abstract syntax trees generated by the aforementioned grammar).

Syntax vs. Semantics

Take note of the fact that we are underlining numeric literals and the binary operators of this language. This is not a stylish whim but to emphasise that these are purely syntactic terminals and not subject to their usual mathematical definitions. For example, the expressions $\underline{1} \mathbin{\underline{+}} \underline{1}$ and $\underline{2}$ are not the same expression (even if they intuitively “mean” the same thing) as they represent different abstract syntax trees.

Denotation Function

The denotational semantics for simple arithmetic expressions associates each expression with an integer values - the mathematical objects that we will use to ascribe meaning to expressions. This takes the form of a denotation function $\llbracket \cdot \rrbracket_{\mathcal{A}^-} : \mathcal{A}^- \rightarrow \mathbb{Z}$ that maps simple arithmetic expressions to the integers. For the application of this function, we will use the notation $\llbracket e \rrbracket_{\mathcal{A}^-} \in \mathbb{Z}$ where $e \in \mathcal{A}^-$ is some expression.

The denotation function for simple arithmetic expressions $\llbracket \cdot \rrbracket_{\mathcal{A}^-}$ is defined by the following set of equations:

\[\begin{array}{rl} \llbracket \underline{n} \rrbracket_{\mathcal{}^-} & = n \\ \llbracket e_1 \mathbin{\underline{+}} e_2 \rrbracket_{\mathcal{A}^-} & = \llbracket e_1 \rrbracket_{\mathcal{A}^-} + \llbracket e_2 \rrbracket_{\mathcal{A}^-} \\ \llbracket e_1 \mathbin{\underline{-}} e_2 \rrbracket_{\mathcal{A}^-} & = \llbracket e_1 \rrbracket_{\mathcal{A}^-} - \llbracket e_2 \rrbracket_{\mathcal{A}^-} \\ \llbracket e_1 \mathbin{\underline{*}} e_2 \rrbracket_{\mathcal{A}^-} & = \llbracket e_1 \rrbracket_{\mathcal{A}^-} \cdot \llbracket e_2 \rrbracket_{\mathcal{A}^-} \end{array}\]

To be more precise, the denotation function is defined by recursion over the structure of expressions. That means that it uses the denotation of sub-expressions to build up a denotation for a compound expression. For example, the denotation of an expression $e_1 \mathbin{\underline{+}} e_2$ is calculated based on the denotation of the sub-expressions $e_1$ and $e_2$. Sometimes this is referred to as a compositional semantics as the semantics for an expression is defined by composing the semantics of its operators. Giving definitions by structural recursion in this manner is very common in semantics as it ensure that the function is well-defined. That is, every expression has a unique denotation according to our equations.

The definition can be seen as interpreting (i.e. giving meaning to) underlined syntactic constructs as their usual mathematical definitions. For example, by applying these equations, we can see that the denotation $\llbracket \underline{1} \mathbin{\underline{*}} (\underline{2} \mathbin{\underline{+}} \underline{3}) \rrbracket_{\mathcal{A}^{-}}$ equates to $5$. Notice that the equations are applied recursively in that the denotation of the expression $\underline{1} \mathbin{\underline{*}} (\underline{2} \mathbin{\underline{-}} \underline{3})$ is calculated based on the denotation of the two sub-expressions $\underline{1}$ and $\underline{2} \mathbin{\underline{+}} \underline{3}$.

This may all seem a bit pedantic, and like a lot of unnecessary work, but this simple example highlights the essential ingredients of denotational semantics:

A collection of mathematical objects that we will use to model programs (e.g. the integers);
And a recursive function mapping each syntactic constructs to operations over these objects.

We’re now ready to define a denotational semantics for a more interesting language.

Adding Variables

The next logical step is to add variables to our expression language. Therefore, we will consider the following grammar of arithmetic expressions:

An arithmetic expression is an element of the following grammar:

\[A \Coloneqq x \mid n \mid A + A \mid A - A \mid A * A\]

where the new construct $x$ represents a set of program variables $\mathsf{Var}$ treated as terminals.

As before, we will write $\mathcal{A}$ to denote the set of arithmetic expressions (a.k.a abstract syntax trees generated by this grammar).

To give a satisfactory meaning to these expressions, we clearly must take into account the value of the variables. Consequently, it is not possible to assign a fixed integer to the expression $x + 1$ as it might change depending on the context, i.e. whether it follows the statement $x \leftarrow 1$ or the statement $x \leftarrow 2$.

Program State

What we mean by “context” here is quite vague. Let’s make it more precise. The programs’ state is the value assigned to its variables at any given moment. We can think of the state as the semantic representation of the computer’s memory.

A state (also referred to as a store) is a total function from the set $\mathsf{State} = \mathsf{Var} \rightarrow \mathbb{Z}$. We will write states using the notation $[x_1 \mapsto v_1,\, x_2 \mapsto v_2,\, \cdots]$ to representation the state $\sigma$ such that $\sigma(x_i) = v_i$. When a variable $x$ is not explicitly mapped to a value by a given state $\sigma$, we will take $\sigma(x)$ to be $0$.

We can now give a denotational semantics to arithmetic expressions. As the value of these expressions depends on the program state, they will be interpreted as functions from states to integer values — the denotation function $\llbracket \cdot \rrbracket_{\mathcal{A}} : \mathcal{A} \rightarrow (\mathsf{State} \rightarrow \mathbb{Z})$ maps expressions, not to an integer, but to a function from the set of states to integers. However, for all intents and purposes, this can be seen as a function with two arguments: the arithmetic expression and the state in which it is evaluated. We will write $\llbracket e \rrbracket_{\mathcal{A}}(\sigma) \in \mathbb{Z}$ for the application of this function to the arithmetic expression $e \in \mathcal{A}$ and the state $\sigma \in \mathsf{State}$.

The denotation function for arithmetic expressions $\llbracket \cdot \rrbracket_\mathcal{A}$ is defined recursively by the following equations:

\[\begin{array}{rl} \llbracket x\rrbracket_{\mathcal{A}}(\sigma) &= \sigma(x) \\ \llbracket n \rrbracket_{\mathcal{A}}(\sigma) & = n \\ \llbracket e_1 + e_2 \rrbracket_{\mathcal{A}}(\sigma) & = \llbracket e_1 \rrbracket_{\mathcal{A}}(\sigma) + \llbracket e_2 \rrbracket_{\mathcal{A}}(\sigma) \\ \llbracket e_1 - e_2 \rrbracket_{\mathcal{A}}(\sigma) & = \llbracket e_1 \rrbracket_{\mathcal{A}}(\sigma) - \llbracket e_2 \rrbracket_{\mathcal{A}}(\sigma) \\ \llbracket e_1 * e_2 \rrbracket_{\mathcal{A}}(\sigma) & = \llbracket e_1 \rrbracket_{\mathcal{A}}(\sigma) \cdot \llbracket e_2 \rrbracket_{\mathcal{A}}(\sigma) \end{array}\]

The first equation introduced by this definition interprets program variables by their current value, according to the given state. The other equations are largely unchanged other than the introduction of the state parameter. Notice that the state is passed to recursive calls without changing. That is to say, these expressions only read the state rather than modifying it.

Let’s work though an example. The denotation $\llbracket x \mathbin{\underline{*}} \underline{2} \rrbracket_A(\sigma)$ evaluates to $\llbracket x \rrbracket_A(\sigma) * \llbracket \underline{2} \rrbracket_A(\sigma)$, which ultimately is equivalent to $\sigma(x) * 2$. To evaluate this further, we need to know the value $\sigma$ assigns to $x$. If we take $\sigma$ to be the state $[x \mapsto 2]$, then the denotation of our expression ultimately evaluates to $4$.

Boolean Expressions

Finally, we will give a denotational semantics for Boolean expressions. This will follow the same pattern as for arithmetic expressions, except it will interpret expressions as Boolean values.

The set of Boolean expressions $\mathcal{B}$ is defined as the elements of the following grammar:

\[B \Coloneqq \tt \mid \ff \mid \mathop{!}B \mid B \andop B \mid B \orop B \mid A = A \mid A \leq A\]

Then we define a denotation function $\llbracket \cdot \rrbracket_\mathcal{B} : \mathcal{B} \rightarrow (\mathsf{State} \rightarrow \mathbb{B})$ where $\mathbb{B}$ is the set of Boolean values $\left\lbrace \bot,\, \top \right\rbrace$ following the same pattern of replacing syntactic constructs with semantics counterparts. As with the denotation of arithmetic expressions, this function maps Boolean expressions to Boolean-valued function over states. Likewise, this can be thought of as a function with two arguments: a Boolean expression and a state.

The denotation function for Boolean expressions $\llbracket \cdot \rrbracket_\mathcal{B}$ is defined by the following equations:

\[\begin{array}{rl} \llbracket \tt \rrbracket_\mathcal{B}(\sigma) & = \top\\ \llbracket \ff \rrbracket_\mathcal{B}(\sigma) & = \bot\\ \llbracket \mathop{!}e \rrbracket_\mathcal{B}(\sigma) & = \neg \llbracket e \rrbracket_\mathcal{B}(\sigma)\\ \llbracket e_1 \andop e_2 \rrbracket_\mathcal{B}(\sigma) & = \llbracket e_1 \rrbracket_\mathcal{B}(\sigma) \wedge \llbracket e_2\rrbracket_\mathcal{B}(\sigma)\\ \llbracket e_1 \orop e_2\rrbracket_\mathcal{B}(\sigma) & = \llbracket e_1 \rrbracket_\mathcal{B}(\sigma) \vee \llbracket e_2 \rrbracket_\mathcal{B}(\sigma)\\ \llbracket e_1 = e_2 \rrbracket_\mathcal{B}(\sigma) & = \llbracket e_1 \rrbracket_A(\sigma) = \llbracket e_2 \rrbracket_A(\sigma)\\ \llbracket e_1 \leq e_2 \rrbracket_\mathcal{B}(\sigma) & = \llbracket e_1 \rrbracket_A(\sigma) \leq \llbracket e_2 \rrbracket_A(\sigma) \end{array}\]

This function is also defined by recursion over the structure of the Boolean expressions, replacing syntactic constructs with their mathematical counterpart, and thus is well-defined. Notice additionally that we are using the denotation function for arithmetic expression in this definition. For example, the following application of the denotation function for Boolean expressions depends on the denotation of arithmetic expressions. Here we use a flat representation of expressions, although they are still to be interpreted as abstract syntax trees.

Semantic Equivalence

An arithmetic expression $e_1 \in \mathcal{A}$ is said to be semantically equivalent to another arithmetic expression $e_2 \in \mathcal{A}$ just if they denote the same function, i.e. $\llbracket e_1 \rrbracket_\mathcal{A} = \llbracket e_2 \rrbracket_\mathcal{A}$. Likewise, Boolean expressions are semantically equivalent if they denote the same function.

Recall that functions are equal if they produce the same result on all inputs. Therefore, two expressions are equal if they denote the same value in all states. For example, $x + x$ is semantically equivalent to $x * 2$ as they both represent the function $\sigma \mapsto 2\sigma(x)$, despite not being syntactically equivalent expressions.

Semantic equivalence is at the heart of the study of semantics and enables practical programming language features such as optimisations - an optimisation being a semantically equivalent, but more performant, program.