An alphabet is any finite set, whose members are called letters (equivalently: symbols or characters). We typically use to denote a generic alphabet and as variables that stand for the letters.
A string (equivalently: word) over an alphabet is a finite sequence of characters from . The sequence may be empty, and we write the empty string as . We typically use as variables that stand for a string.
For example, is a string over the alphabet , but is not. Another string over the same alphabet is and another is . Strings over the alphabet include and . Strings over the alphabet in which each letter is one of these two strings, include and , but not .
The set of all strings over the alphabet is written , this always includes . We will refer to a set of strings as a language.
Now we consider some operations on strings.
A string is said to be a substring of a string just if appears consecutively in .
For example, is a substring of but it’s not a substring of . On the other hand is a substring of both (typically, we are not too concerned about distinguishing between the letter and the string consisting of a single letter). We also have that is a substring of any string (over any alphabet).
The length of a string , written , is just the number of characters in the string. That is, if then .
Given strings and , we write for the string obtained by concatenating to the end of . That is, if and then . We write for the -fold concatenation of with itself, i.e. the word .
We will use concatenation and variables to describe the shape of strings abstractly, just as we use arithmetic operators and variables to describe numbers abstractly. For example, to say that a string over the alphabet contains at least two occurrences of , we can say that:
is of shape
More precisely, with this form of words – “of shape” – we are saying that there exist strings such that (i.e. is exactly the concatenation of followed by an followed by followed by an followed by ). Every string over this alphabet that contains at least two can be decomposed in this way (there may be several ways, generally), for example:
String
Decomposition as
, ,
, ,
, ,
Compare this with how you would say that an integer is even: is of shape (for some ).
We will often use this in combination with set notation, to describe languages of strings that all have a certain shape. For example:
- the set of all strings over containing at least two
- the set of all strings over containing exactly two
- the set of all strings over that are of even length.
- the set of all strings over that are formed by repeating a substring three times consecutively, with the first wrapped in 0s, the second in 1s and the third in 0s
You can't use 'macro parameter character #' in math mode
Note! Do not confuse, e.g. with : the latter is a singleton set, it contains the string as its only element (and for that to make sense we must have introduced , and already). E.g.
Suppose, , and . Then my FAVOURITE set is , which I just ADORE because its ONE AND ONLY element is .