Logic

Connectives

How do we assert two propositions are true (or otherwise related) at once? A quick note on precedence:

Translating Sentences

Truth Tables

How can we determine the truth value of compound propositions? To make a truth table:

A truth table for $(p \wedge q) \rightarrow \neg(p \vee q)$ is:

\begin{array}{cc|ccc|c} p & q & p \wedge q & p \vee q & \neg(p \vee q) & (p \wedge q) \rightarrow \neg(p \vee q) \\\hline T & T & T & T & F & F \\ T & F & F & T & F & T \\ F & T & F & T & F & T \\ F & F & F & F & T & T \\ \end{array}

Now, given values for $p$ and $q$, we can look at the appropriate row of the last column to find the truth value of the whole expression. Adding more variables means adding more rows. The truth table for $p \rightarrow (\neg q \vee r)$ is:

\begin{array}{ccc|cc|c} p & q & r & \neg q & \neg q \vee r & p \rightarrow (\neg q \vee r) \\\hline T & T & T & F & T & T \\ T & T & F & F & F & F \\ T & F & T & T & T & T \\ T & F & F & T & T & T \\ F & T & T & F & T & T \\ F & T & F & F & F & T \\ F & F & T & T & T & T \\ F & F & F & T & T & T \\ \end{array}

If there are $n$ variables, there are $2^n$ different truth value combinations and therefore $2^n$ rows. To make the table, fill the first half of the first column with $T$ and the last half with $F$. Then fill the second column by repeating this pattern in each half, and so on. This is an easy way to guarantee all possibilities are covered.

Here is another example of a truth table, this time for $(\neg p \leftrightarrow \neg q) \leftrightarrow (q \leftrightarrow r)$:

\begin{array}{ccc|cccc|c} p & q & r & \neg p & \neg q & \neg p \leftrightarrow \neg q & q \leftrightarrow r & (\neg p \leftrightarrow \neg q) \leftrightarrow (q \leftrightarrow r) \\\hline T & T & T & F & F & T & T & T \\ T & T & F & F & F & T & F & F \\ T & F & T & F & T & F & F & T \\ T & F & F & F & T & F & T & F \\ F & T & T & T & F & F & T & F \\ F & T & F & T & F & F & F & T \\ F & F & T & T & T & T & F & F \\ F & F & F & T & T & T & T & T \\ \end{array}

Sometimes the truth value of a compound proposition doesn't depend on the truth values of its variables: it is always true or always false, regardless of the truth assignments of the propositions. For example, $p \vee \neg p$ is always true, regardless of whether $p$ is true or false:

\begin{array}{cc|c} p & \neg p & p \vee \neg p \\\hline T & F & T \\ F & T & T \\ \end{array}

Such a statement is a tautology. On the other hand, $p \wedge \neg p$ is always false, regardless of whether $p$ is true or false:

\begin{array}{cc|c} p & \neg p & p \wedge \neg p \\\hline T & F & F \\ F & T & F \\ \end{array}

Such a statement is a contradiction. If a statement is neither a tautology nor a contradiction, then the truth values do alter the outcome and we say that the statement is a contingency. Here are some examples that we will classify as tautologies, contradictions, or contingencies:
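As an aside, the row-by-row procedure described above is mechanical, so it is easy to automate. Here is a minimal Python sketch (not part of these notes; the helper function implies is made up for illustration) that enumerates all $2^n$ rows for the first example, $(p \wedge q) \rightarrow \neg(p \vee q)$:

    from itertools import product

    def implies(a, b):
        # material implication: false only when a is true and b is false
        return (not a) or b

    # enumerate all 2^2 truth assignments for (p, q), with T before F as in the table
    for p, q in product([True, False], repeat=2):
        print(p, q, implies(p and q, not (p or q)))

The four printed values match the final column of the table above.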

Logical Equivalences

There is often more than one way to write a proposition. For instance, $p$ and $\neg \neg p$ mean the same thing. We write $p \equiv \neg \neg p$ to mean "the proposition $p$ is logically equivalent to the proposition $\neg \neg p$". How do we tell if two expressions are logically equivalent? The first method is to use truth tables: Some examples: The second method is to use a series of known logical equivalences to go from one proposition to the other. Any equivalence can be used, but let's stick with these. Let's see some examples. We can also use this technique to classify a proposition as a tautology or a contradiction by determining if the proposition is logically equivalent to $T$ or $F$, respectively. Here are several more examples that use logical equivalences:
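For instance, here is a typical derivation of this kind (an illustrative example added for concreteness, using the usual names for the equivalences), showing that $\neg(p \vee (\neg p \wedge q)) \equiv \neg p \wedge \neg q$:

\begin{array}{llll} \neg(p \vee (\neg p \wedge q)) & \equiv & \neg p \wedge \neg(\neg p \wedge q) & \text{(De Morgan's law)} \\ & \equiv & \neg p \wedge (\neg\neg p \vee \neg q) & \text{(De Morgan's law)} \\ & \equiv & \neg p \wedge (p \vee \neg q) & \text{(double negation law)} \\ & \equiv & (\neg p \wedge p) \vee (\neg p \wedge \neg q) & \text{(distributive law)} \\ & \equiv & F \vee (\neg p \wedge \neg q) & \text{(negation law)} \\ & \equiv & \neg p \wedge \neg q & \text{(identity law)} \\ \end{array}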

Predicate Logic

Problem with propositional logic: how does one say, "Everyone in this class is a student"?

Idea: "being a student" and "being in this class" are properties that people can have, and "everyone" quantifies which people have the property. We can define a propositional function that asserts that a predicate is true about some object.

Suppose $S$ denotes the predicate "is a student". Then $S(x)$ means "$x$ is a student" for some object $x$. This works for all predicates: "is greater than", "is shorter than", "is a boat", $\ldots$

Once we have defined a propositional function, any object we give to it produces a truth value. For example, if $P(x)$ means "$x$ is greater than $3$", then:

We don't need to stop at one variable, either: if $P(x,y)$ denotes "$x$ is greater than $y$", then: This doesn't fully solve the original problem, though: we now have to write $S(x_1) \wedge S(x_2) \wedge \cdots$. To fix this, we need quantifiers.

Universe of Discourse

Before we can think about quantifiers, we need to think about the universe of discourse. In the above example about students, there are at least two possible universes of discourse. If the universe is "all people in the class", then saying "$x_1$ is in this class" is redundant. However, if the universe of discourse is "all people", then it is important!

As another example, consider $P(x)$ to denote "$x$ is greater than $3$". Here, the universe is assumed to be, say, the set of all real numbers. If the universe of discourse was the set of all people, we would have statements like "John is greater than $3$", which makes no sense.

It is important to define a universe of discourse! Think of the universe of discourse as the set of all values (names) that you can plug into the propositional functions being considered.

Universal Quantification

Given a propositional function $P(x)$, the \emph{universal quantification} of $P(x)$ is the proposition "$P(x)$ is true for all values $x$ in the universe of discourse." We write $\forall x\; {P(x)}$ and say "for all $x$, $P(x)$" or "for every $x$, $P(x)$." The symbol $\forall$ is the universal quantifier.

This notation is essentially shorthand. If the universe of discourse consists of the objects $x_1,x_2,\ldots$, then $\forall x\;{P(x)}$ means $P(x_1) \wedge P(x_2) \wedge \cdots$. Of course, if the universe of discourse is infinite (for example, the integers or real numbers), then such shorthand becomes necessary.

Observe that since $\forall x\;{P(x)}$ is essentially a conjunction, it must be the case that it has truth value $T$ precisely when the predicate is true for all objects in the universe of discourse and $F$ otherwise. Therefore, if the predicate $P$ is false for at least one object in the universe of discourse, then $\forall x\;{P(x)}$ has truth value $F$. Here are some examples that use universal quantification:

Existential Quantification

Given a propositional function $P(x)$, the existential quantification of $P(x)$ is the proposition "$P(x)$ is true for at least one value $x$ in the universe of discourse." We write $\exists x\;{P(x)}$ and say "there exists an $x$ such that $P(x)$" or "for some $x$, $P(x)$." The symbol $\exists$ is the existential quantifier.

Again, this notation is essentially shorthand. If the universe of discourse consists of the objects $x_1,x_2,\ldots$, then $\exists x \;{P(x)}$ means $P(x_1) \vee P(x_2) \vee \cdots$.

Observe that since $\exists x\;{P(x)}$ is essentially a disjunction, it must be the case that it has truth value $T$ precisely when the predicate is true for at least one object in the universe of discourse and $F$ otherwise. Therefore, if the predicate $P$ is false for all objects in the universe of discourse, then $\exists x\;{P(x)}$ has truth value $F$. Here are some examples that use existential quantification:
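When the universe of discourse is finite, both quantifiers can be evaluated by brute force, exactly as the conjunction/disjunction shorthand suggests. Here is a small Python sketch (the universe and the predicate are made up for illustration):

    # a made-up finite universe of discourse and predicate P(x): "x is greater than 3"
    universe = [1, 2, 3, 4, 5]
    P = lambda x: x > 3

    forall_P = all(P(x) for x in universe)   # P(x1) and P(x2) and ...
    exists_P = any(P(x) for x in universe)   # P(x1) or P(x2) or ...

    print(forall_P, exists_P)   # False True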

Binding of Quantifiers

The scope of a quantifier is the smallest proposition following it: $$ \underbrace{\exists x\;{P(x)}}_{\text{$x$ applies here}} \wedge \underbrace{Q(x)}_{\text{but $x$ has no meaning here}} $$ We would instead write $\exists x\;{(P(x) \wedge Q(x))}$.

It is valid to write, for example, $\exists x\;{P(x)} \wedge \exists x\;{Q(x)}$, but the $x$ in each quantifier could be completely different elements of the universe of discourse! Conversely, we might want to make sure they are not the same: $\exists x \;{\exists y\;{(P(x) \wedge Q(y) \wedge (x \neq y))}}$.

For example, if the universe of discourse is the set of integers, $E(x)$ means "$x$ is even", and $O(x)$ means "$x$ is odd":

Negating Quantifiers

How do we negate a quantified statement?

We have the following quantifier negation rules:

This follows from De Morgan's law and the fact that quantifiers are essentially shorthand for conjunction and disjunction. Here are some examples of quantifier negation:
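For instance, negating a universally quantified implication works like this: $$\neg \forall x\;(P(x) \rightarrow Q(x)) \;\equiv\; \exists x\;\neg(P(x) \rightarrow Q(x)) \;\equiv\; \exists x\;(P(x) \wedge \neg Q(x)),$$ using the quantifier negation rule first and then the propositional equivalence $\neg(p \rightarrow q) \equiv p \wedge \neg q$.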

Translating Sentences with Predicates and Quantifiers

In the following examples, the universe of discourse is all people. Multiple quantifiers are possible: Here are some more complex examples:

Arguments and Validity

Now that we know how to state things precisely, we are ready to think about putting statements together to form arguments. A rigorous argument that is valid constitutes a proof. We need to put the statements together using valid rules.

For example, given the premises:

a conclusion might be "it will rain". Intuitively, this seems valid.

An argument is valid if the truth of the premises implies the conclusion. Given premises $p_1, p_2, \ldots, p_n$ and conclusion $c$, the argument is valid if and only if $(p_1 \wedge p_2 \wedge \cdots \wedge p_n) \rightarrow c$ is true for every assignment of truth values (that is, it is a tautology). Note that false premises can lead to a false conclusion!

Rules of Inference

How do we show validity? We use the rules of inference: To show that the premises imply the conclusion, we apply the rules of inference to the premises until we get the conclusion. Here are some examples of how to show an argument is valid: It is also valid to replace premises with others that are logically equivalent. For example, an implication can be replaced with its contrapositive.

Not all arguments are valid! To show an argument is invalid, find truth values for each proposition that make all of the premises true, but the conclusion false.

This works because proving an argument is valid is just showing that an implication is true. Therefore, to show an argument is invalid, we need to show that the implication is false. An implication is false only when the hypothesis is true and the conclusion is false. Since the hypothesis is the conjunction of the premises, this means that each premise is true and the conclusion is false. $$ \underbrace{(p_1 \wedge p_2 \wedge \cdots \wedge p_n)}_{\text{all $T$}} \rightarrow \underbrace{c}_{F} $$

In the above example, it happens that there is only one truth setting that results in all premises being true and the conclusion being false. In general, there could be many different such truth settings.

Arguments with Quantified Statements

Until now, we have restricted our attention to propositional logic. Recall that $P(x)$ is a propositional function, and so when $x$ is an element of the universe of discourse, we simply have a proposition that can be dealt with using the rules of inference for propositions. For predicate logic, we need a few more rules of inference that will allow us to deal with quantified statements. Here are some examples of arguments with quantified statements: The next example will help to illustrate when Universal Generalization may not be applied.

Methods of proof

How do we go about forming arguments (proofs)? Here are some more examples of proofs:

Sets

A set is an unordered collection of objects.

The objects in a set are called the set's elements or members. They are usually listed inside braces. We write $x \in A$ if $x$ is an element (member) of a set $A$.

$\{1,2,3\}$ is a set with $3$ elements. It is the same as the set $\{1,3,2\}$ (order does not matter) and the set $\{1,1,2,3,3,3,3\}$ (repetition does not matter). Typically, all of the objects in a set are of the same type (for example, numbers), but they do not have to be: $\{ 1, 3, \text{red}, \text{blue}, \text{John} \}$ is a set.

Ellipses are used when a pattern is clear: $\{1,2,3,4,\ldots,50\}$ is the set of all integers from $1$ to $50$, inclusive.

Some sets we use a lot:

It is possible to have a set with no elements: $\{\}$. This is the empty set and is usually denoted $\emptyset$. This is not the same as $\{ \emptyset \}$, which is a set with one element (which happens to be the empty set).

The number of distinct elements in a set $S$ is called its cardinality and is denoted $|S|$. If $S$ has infinitely many elements (for example, $\mathbb{Z}$), we say the set is infinite.

One common way to define a set is set builder notation. Here are two examples:

Set Operations

Several operations can be performed on sets.

Subsets

A set $A$ is a subset of a set $B$ if every element of $A$ is an element of $B$. We write $A \subseteq B$. Another way of saying this is that $A \subseteq B$ if and only if $\forall x\;(x \in A \rightarrow x \in B)$.

For any set $S$, we have:

If $A \subseteq B$ and $A \neq B$, then we say $A$ is a proper subset of $B$ and write $A \subset B$.

Power Sets

The power set of a set $A$ is the set of all subsets of $A$, denoted $\mathcal{P}(A)$. For example, if $A = \{1,2,3\}$, then $\mathcal{P}(A) = \{ \emptyset, \{1\},\{2\},\{3\}, \{1,2\},\{2,3\},\{1,3\}, \{1,2,3\} \}$

Notice that $|\mathcal{P}(A)| = 2^{|A|}$.
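Here is a small Python sketch (illustrative only) that builds $\mathcal{P}(A)$ for a small set and confirms the cardinality formula:

    from itertools import combinations

    def power_set(A):
        # all subsets of A, one size at a time
        return [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]

    A = {1, 2, 3}
    subsets = power_set(A)
    print(len(subsets) == 2 ** len(A))   # True: 8 subsets for a 3-element set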

Set Equality

Two sets are equal if they contain the same elements. One way to show that two sets $A$ and $B$ are equal is to show that $A \subseteq B$ and $B \subseteq A$:

\begin{array}{lll} A \subseteq B \land B \subseteq A & \equiv & \forall x\;{((x \in A \rightarrow x \in B) \land (x \in B \rightarrow x \in A))} \\ &\equiv & \forall x\;{(x \in A \leftrightarrow x \in B)} \\ &\equiv & A = B \\ \end{array}

Note: it is not enough to simply check if the sets have the same size! They must have exactly the same elements. Remember, though, that order and repetition do not matter.

Membership Tables

We combine sets in much the same way that we combined propositions. Asking if an element $x$ is in the resulting set is like asking if a proposition is true. Note that $x$ could be in any of the original sets.

What does the set $A \cup (B \cap C)$ look like? We use $1$ to denote the presence of some element $x$ and $0$ to denote its absence.

\begin{array}{ccc|cc} A & B & C & B \cap C & A \cup (B \cap C) \\\hline 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \end{array}

This is a membership table. It can be used to draw the Venn diagram by shading in all regions that have a $1$ in the final column. The regions are defined by the left-most columns.

Venn diagram for the above membership table

We can also use membership tables to test if two sets are equal. Here are two methods of showing that $\overline{A \cap B} = \overline{A} \cup \overline{B}$:

It is not sufficient to simply draw the Venn diagrams for two sets to show that they are equal: you need to show why your Venn diagram is correct (typically with a membership table).

There is an additional way to prove two sets are equal, and that is to use set identities. In the following list, assume $A$ and $B$ are sets drawn from a universe $U$.

Note the similarities to logical equivalences! Here are some examples of how to determine if two sets are equal:
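For small finite sets, an identity such as $\overline{A \cap B} = \overline{A} \cup \overline{B}$ can also be sanity-checked directly. A quick Python sketch (the universe and sets are made up):

    # made-up universe and sets
    U = set(range(10))
    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}

    left = U - (A & B)          # complement of (A intersect B)
    right = (U - A) | (U - B)   # complement of A, union, complement of B
    print(left == right)        # True

Of course, checking one example is not a proof; the membership table or the set identities are still needed for the general statement.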

Functions

Suppose we want to map one set to the other: given an element of set $A$ (the input), return an element of set $B$ (the output).

For example, suppose $A = \{ x \;|\; x \text{ is a user on our computer system} \}$ and $B = \{ x \;|\; x \text{ is a valid password}\}$. We might want to know, given a user, what is that user's password: the input is the user (from $A$) and the output is that user's password (from $B$).

Let $A$ and $B$ be two sets. A function from $A$ to $B$ is an assignment of exactly one element from $B$ to each element of $A$. We write $f(a) = b$ if $b \in B$ is the unique element assigned by the function $f$ to the element $a \in A$. If $f$ is a function from $A$ to $B$, we write $f : A \rightarrow B$.

It makes sense to model the password example above as a function because each user has exactly one password. Here are two other examples:

Functions can be specified in several ways:

Consider the function $f:A \rightarrow B$. We call $A$ the domain of $f$ and $B$ the codomain of $f$. Furthermore, if $f(a)=b$, then $b$ is the image of $a$ and $a$ is the preimage of $b$. The set of all images of elements of $A$ is called the range of $f$. For example:

If $S$ is a subset of the domain, we can also look at its image: the subset of $B$ that consists of the images of the elements in $S$: $f(S) = \{ f(s) \;|\; s \in S \}$. In the grades example above, $g(\{\text{Tim},\text{Jo},\text{Lee}\}) = \{\text{A},\text{B}\}$.

Notice that in the grades example, $\text{A}$ had two elements map to it, while $\text{F}$ had none. We can classify functions based on such situations.

Injectivity

A function $f$ is said to be injective or one-to-one if $(f(x) = f(y)) \rightarrow (x = y)$ for all $x$ and $y$ in the domain of $f$. The function is said to be an injection.

Recall that, by contraposition, $(f(x) = f(y)) \rightarrow (x=y)$ if and only if $(x \neq y) \rightarrow (f(x) \neq f(y))$.

Basically, this means that each element of the range has exactly one pre-image. Equivalently, each element of the codomain has at most one pre-image. In a function diagram, this means there is at most one incoming arrow to every element on the right hand side.

To show a function is injective:

To show a function is not injective, give an $x$ and $y$ such that $x \neq y$ but $f(x) = f(y)$.

Here are some examples:

Surjectivity

A function $f : A \rightarrow B$ is said to be surjective or onto if for every element $b \in B$, there is an element $a \in A$ such that $f(a) = b$. The function is said to be a surjection.

Basically, this means that every element of the codomain has a pre-image. Equivalently, the codomain and range are the same. In a function diagram, this means there is at least one incoming arrow to every element on the right hand side.

To show a function is surjective, start with an arbitrary element $b \in B$ and show what the preimage of $b$ could be: show an $a \in A$ such that $f(a) = b$. To show a function is not surjective, give a $b \in B$ such that $f(a) \neq b$ for every $a \in A$.
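For functions between small finite sets, both injectivity and surjectivity can be checked by brute force. Here is a Python sketch (the domain, codomain, and function are made up for illustration):

    # a made-up function from {0,1,2,3} to {0,1,2,3,4}
    domain = [0, 1, 2, 3]
    codomain = [0, 1, 2, 3, 4]
    f = lambda x: (2 * x) % 5

    images = [f(x) for x in domain]
    injective = len(set(images)) == len(domain)   # no two inputs share an image
    surjective = set(images) == set(codomain)     # every codomain element is hit

    print(injective, surjective)   # True False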

Here are some examples:

Notice that:

If a function is both injective and surjective, then each element of the domain is mapped to a unique element of the codomain (range). A function that is both injective and surjective is bijective. Such a function is called a bijection.

To show a function is bijective, show:

Remember to show both parts, since functions can be any combination of injective and surjective. For example, from left-to-right, the following functions are injective but not surjective, surjective but not injective, injective and surjective, and neither injective nor surjective:

Functions that demonstrate the possible presence of the properties of injectivity and surjectivity

Inverse of a Function

If a function $f$ is bijective, then $f$ is invertible. Its inverse is denoted $f^{-1}$ and assigns to $b \in B$ the unique element $a \in A$ such that $f(a) = b$: that is, $f^{-1}(b) = a \;\leftrightarrow\; f(a) = b$.

Inverses are not defined for functions that are not bijections.

The inverse can be found by reversing the arrows in the diagram, or by isolating the other variable in the formula. Note that the inverse of $f : A \rightarrow B$ is $f^{-1} : B \rightarrow A$.

Here are some examples of functions and their inverses:

Composition of Functions

Given two functions $f$ and $g$, we can use the output of one as the input to the other to create a new function $f(g(x))$. In this function, we evaluate $g$ with input $x$ and give the result to $f$ to compute the final output.

Let $f : B \rightarrow C$ and $g : A \rightarrow B$. The composition of $f$ and $g$ is denoted $f \circ g$ (read "$f$ follows $g$") and is defined as $(f \circ g)(x) = f(g(x))$. Note: for $f \circ g$ to be defined, the range of $g$ must be a subset of the domain of $f$.
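A tiny Python sketch of this definition (the functions are made up for illustration):

    def compose(f, g):
        # returns the function f o g, i.e. x -> f(g(x))
        return lambda x: f(g(x))

    g = lambda x: x + 1    # g : A -> B
    f = lambda x: 2 * x    # f : B -> C
    h = compose(f, g)      # h(x) = f(g(x)) = 2(x + 1)

    print(h(3))   # 8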

Graphically, we have:

A visual representation of the composition of two functions

Here are some examples:

One important case is composing a function with its inverse: Suppose $f(a) = b$. Then $f^{-1}(b)=a$, and:

Countable and Uncountable Sets

Notice that a bijection exists between two sets if and only if they have the same size. This allows us to reason about the sizes of infinite sets.

Consider $\mathbb{Z}^+ = \{1,2,\ldots\}$. We call a set countable if:

Otherwise, the set is uncountable.

Here are some examples.

Sequences and Sums

A sequence is a function from a subset of $\mathbb{Z}$ (usually $\{0,1,2,3,\ldots\}$ or $\{1,2,3,\ldots\}$) to a set $S$. We use $a_n$ to refer to the image of the integer $n$. We call $a_n$ a term of the sequence. The sequence itself is denoted $\{a_n\}$.

For example, if $a_n = 1/n$, then the sequence $\{a_n\}$ (beginning with $a_1$) is $a_1,a_2,a_3,\ldots$, or $1, 1/2, 1/3, 1/4, \ldots$.

A geometric sequence has the form $a, ar, ar^2, ar^3, \ldots, ar^n$ where $a$ is the initial term (a real number) and $r$ is the common ratio (also a real number). Typically, we think of such a sequence as starting with $n=0$ (since $ar^0 = a$). Here are some examples of geometric sequences:

An arithmetic sequence has the form $a, a+d, a+2d, a+3d, \ldots, a+nd$ where $a$ is the initial term and $d$ is the common difference. Typically, we think of such a sequence as starting with $n=0$ (since $a+0d = a$). Here are some examples of arithmetic sequences:

One common operation on sequences is to compute a sum of certain portions of the sequence. Suppose we have $a_1,a_2,a_3,\ldots,a_m,a_{m+1},a_{m+2},\ldots,a_n,\ldots$ and we want to consider the sum from $a_m$ to $a_n$: $a_m + a_{m+1} + a_{m+2} + \cdots + a_n$. We can write this using sigma notation: $$ \sum_{i=m}^n a_i $$ where:

There is nothing special about using $i$; any (unused) variable would work!

Here are some examples of summations and sigma notation:

Sometimes we might want to change the lower/upper limits without changing the sum. For example, suppose we want to change the sum $\displaystyle \sum_{j=1}^5 j^2$ to be written with lower limit $0$ and upper limit $4$. Then let $k=j-1$ to get $\displaystyle \sum_{j=1}^5 j^2 = \sum_{k=0}^4 (k+1)^2$

We can also split a sum up: $$\sum_{i=1}^n a_i = \sum_{i=1}^5 a_i + \sum_{i=6}^n a_i$$

This means that to exclude the first few terms of a sum, we can say: $$\sum_{i=6}^n a_i = \sum_{i=1}^n a_i - \sum_{i=1}^5 a_i$$

Summations can also be nested: $$\sum_{i=1}^n \sum_{j=1}^n ij$$

As an example, we compute $\sum_{i=1}^4 \sum_{j=1}^3 ij$: \begin{array}{rcl} \sum_{i=1}^4 \sum_{j=1}^3 ij &=& \sum_{i=1}^4 (1i + 2i + 3i) \\ &=& \sum_{i=1}^4 6i \\ &=& 6 + 12 + 18 + 24 \\ &=& 60 \end{array}

When every term is multiplied by the same thing, we can factor it out: $$\sum_{i=1}^n 6i = 6 \times \sum_{i=1}^n i$$

Here is another example of factoring, this time with a nested summation: $$\sum_{i=1}^4 \sum_{j=1}^3 ij = \sum_{i=1}^4 \left( i \times \sum_{j=1}^3 j \right) = \sum_{i=1}^4 6i = 6 \times \sum_{i=1}^4 i = 6 \times 10 = 60$$

You can also split over addition: $$\sum_{i=1}^n (i+2^i) = \sum_{i=1}^n i + \sum_{i=1}^n 2^i$$ This does not work for multiplication!

One useful tool is the sum of a geometric sequence, where $a,r \in \mathbb{R}$ and $r \neq 0$: $$ \sum_{j=0}^n ar^j = \begin{cases} \frac{ar^{n+1}-a}{r-1} & \text{if } r \neq 1 \\ (n+1)a & \text{if } r = 1 \\ \end{cases} $$

Why does this work? Let $S = \sum_{j=0}^n ar^j$. Then: \begin{array}{rcl} rS &=& r \sum_{j=0}^n ar^j \\ &=& \sum_{j=0}^n ar^{j+1} \\ &=& \sum_{k=1}^{n+1} ar^k \\ &=& \sum_{k=0}^n ar^k + (ar^{n+1} - a) \\ &=& S + (ar^{n+1} - a) \end{array}

Therefore, $rS = S + (ar^{n+1} - a)$, so $S = \frac{ar^{n+1}-a}{r-1}$ as long as $r \neq 1$ (the case when $r=1$ is easy).
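A quick numeric sanity check of the closed form, with made-up values of $a$, $r$, and $n$ (Python):

    a, r, n = 3, 2, 10

    direct = sum(a * r**j for j in range(n + 1))
    closed = (a * r**(n + 1) - a) / (r - 1) if r != 1 else (n + 1) * a

    print(direct, closed)   # 6141 and 6141.0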

Here are some more useful summation formulas:

Try to derive some of these yourself. For example, $\sum_{k=1}^n k = \frac{n(n+1)}{2}$ can be derived by letting $S = \sum_{k=1}^n k$ and observing that: \begin{array}{ccccccccccccc} S & = & 1 & + & 2 & + & 3 & + & \cdots & + & n-1 & + & n \\ S & = & n & + & n-1 & + & n-2 & + & \cdots & + & 2 & + & 1 \\\hline 2S & = & (n+1) & + & (n+1) & + & (n+1) & + & \cdots & + & (n+1) & + & (n+1) \\ \end{array}

Since there are $n$ terms, we have $2S = n(n+1)$, so $S = \frac{n(n+1)}{2} = \sum_{k=1}^n k$.

Algorithms

An algorithm is a finite set of precise instructions for solving a problem.

Here is an algorithm for outputting the largest number from a list of $n$ numbers. Such a number is called a maximum.

  1. set a temporary variable to the first number
  2. compare the next number to the temporary variable
    • if it is larger, set the temporary variable to this number
  3. repeat step 2 until there are no more numbers left
  4. return the value of the temporary variable
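Here is how that algorithm might look as runnable Python (an illustrative sketch; the function name is made up):

    def find_max(numbers):
        # assumes a non-empty list
        largest = numbers[0]       # step 1: temporary variable
        for x in numbers[1:]:      # steps 2-3: compare each remaining number
            if x > largest:
                largest = x
        return largest             # step 4

    print(find_max([3, 1, 4, 1, 5]))   # 5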

Here is an algorithm for outputting the index of the number $x$ from an array of $n$ numbers. This is called a linear search.

  1. look at the first element
    • if it is equal to $x$, then return that index
  2. look at the next element
    • if it is equal to $x$, then return that index
  3. repeat step 2 until the end of the array is reached
  4. return not found
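A runnable Python sketch of linear search (illustrative; returning None plays the role of "not found"):

    def linear_search(a, x):
        # return the index of x in a, or None if x does not appear
        for i, value in enumerate(a):
            if value == x:
                return i
        return None

    print(linear_search([3, 1, 4, 1, 5], 4))   # 2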

Sorting

For the sorting problem, we are given a list of elements that can be ordered (typically numbers) and wish to rearrange the list so that the elements are in non-decreasing order.

One algorithm that solves this problem is BubbleSort. It works by looking at pairs of elements in the list and "swapping" them whenever they are out of order. If this is done enough times, then the list will be in order! In pseudocode:

			BubbleSort$\left(a_1, a_2, \ldots, a_n\right)$
				for $i \gets 1$ to $n-1$
					for $j \gets 1$ to $n-i$
						if $a_j > a_{j+1}$
							swap $a_j$ and $a_{j+1}$
						end if
					end for
				end for
			

Here is how the BubbleSort algorithm works on the array $3,2,4,1,5$:

\begin{array}{c@{\hskip 0.5in}c@{\hskip 0.5in}c@{\hskip 0.5in}c} i=1 & i=2 & i=3 & i=4 \\\hline \underbrace{3,2}_\text{swap},4,1,5 & \underbrace{2,3}_\text{good},1,4,|5 & \underbrace{2,1}_\text{swap},3,|4,5 & \underbrace{1,2}_\text{good},|3,4,5 \\ 2,\underbrace{3,4}_\text{good},1,5 & 2,\underbrace{3,1}_\text{swap},4,|5 & 1,\underbrace{2,3}_\text{good},|4,5 \\ 2,3,\underbrace{4,1}_\text{swap},5 & 2,1,\underbrace{3,4}_\text{good},|5 & & \\ 2,3,1,\underbrace{4,5}_\text{good} & & & \\ \end{array}
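For comparison with the pseudocode, here is a minimal runnable Python version (an illustrative sketch using 0-indexed lists):

    def bubble_sort(a):
        n = len(a)
        for i in range(1, n):            # i = 1, ..., n-1
            for j in range(n - i):       # compare a[j] and a[j+1]
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    print(bubble_sort([3, 2, 4, 1, 5]))   # [1, 2, 3, 4, 5]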

A natural question to ask is, "How long does this take?" The answer is: it depends! (On operating system, hardware, implementation, and many other things)

Another algorithm for sorting is InsertionSort.

In pseudocode:

			InsertionSort$\left(a_1,a_2,\ldots,a_n\right)$
				for $j \gets 2$ to $n$
					$k \gets a_j$
					$i \gets j-1$
					while $i > 0$ and $a_i > k$
						$a_{i+1} \gets a_i$
						$i \gets i-1$
					end while
					$a_{i+1} \gets k$
				end for
			

Here is how the InsertionSort algorithm works on the array $3,2,4,1,5$:

Sorting a list using InsertionSort
Sorting a list using InsertionSort
Sorting a list using InsertionSort
Sorting a list using InsertionSort
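Again for comparison, here is a minimal runnable Python version of the pseudocode (illustrative, 0-indexed):

    def insertion_sort(a):
        for j in range(1, len(a)):
            key = a[j]
            i = j - 1
            while i >= 0 and a[i] > key:   # shift larger elements right
                a[i + 1] = a[i]
                i -= 1
            a[i + 1] = key                 # insert the saved element into place
        return a

    print(insertion_sort([3, 2, 4, 1, 5]))   # [1, 2, 3, 4, 5]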

How long does this take? Again, it depends. But how does it compare to BubbleSort? (Assuming same hardware, operating system, etc.)

Analysis of Algorithms

To determine how "long" an algorithm takes, we need to know how long operations (such as additions, comparisons, etc.) take. To do this, we define a model of computation.

There are many such models. For now, let's say that comparisons (\ie, $\lt,\le,\gt,\ge,=,\neq$) take one time unit ("unit time"). The actual amount of time varies (with hardware, etc.), but we assume they all take the same time. This is generally a fair assumption.

The number of such operations is the time complexity of an algorithm. Typically, we will be interested in worst-case time complexity: the maximum number of operations performed.

Here are some examples of deriving time complexity:

Notice that BubbleSort uses $n^2 + n - 1$ comparisons while InsertionSort uses $n^2 + 2n - 4$ comparisons. Therefore, BubbleSort uses fewer comparisons in the worst case.

But are these two functions really that different? Both have an $n^2$ term, which for large $n$ dwarfs any constant it is multiplied by and any linear function of $n$ added to it. As $n$ grows bigger and bigger, the $n^2$ part makes the biggest difference. In the worst case, these two functions behave approximately the same.

Contrast this with finding the maximum and linear search: both use $2n-1$ comparisons. This is much faster than the two sorting algorithms, even though the leading term has coefficient $2$. This is because $n$ grows much more slowly than $n^2$. We care about large $n$.

Therefore, for the analysis of algorithms, we don't care too much about the exact function (since it is often too much work to find!) What matters is how fast the function grows.

Growth of Functions

The growth of a function is determined by the highest order term: if you add a bunch of terms, the function grows about as fast as the largest term (for large enough input values).

For example, $f(x) = x^2+1$ grows as fast as $g(x) = x^2 + 2$ and $h(x) = x^2+x+1$, because for large $x$, $x^2$ is much bigger than $1$, $2$, or $x+1$.

Similarly, constant multiples don't matter that much: $f(x)=x^2$ grows as fast as $g(x)=2x^2$ and $h(x)=100x^2$, because for large $x$, multiplying $x^2$ by a constant does not change it "too much" (at least not as much as increasing $x$).

Essentially, we are concerned with the shape of the curve:

Examples of three lines

All three of these functions are lines; their exact slope/y-intercept does not matter.

Only caring about the highest order term (without constant multiples) corresponds to ignoring differences in hardware/operating system/etc. If the CPU is twice as fast, for example, the algorithm still behaves the same way, even if it executes faster.

Big-Oh Notation

Let $f$ and $g$ be functions from $\mathbb{Z} \rightarrow \mathbb{R}$ or $\mathbb{R} \rightarrow \mathbb{R}$. We say $f(x)$ is $O(g(x))$ if there are constants $c \gt 0$ and $k \gt 0$ such that $0 \le f(x) \le c \times g(x)$ for all $x \ge k$. The constants $c$ and $k$ are called witnesses. We read $f(x)$ is $O(g(x))$ as "$f(x)$ is big-Oh of $g(x)$". We write $f(x) \in O(g(x))$ or $f(x) = O(g(x))$ (though the former is more technically correct).

Basically, $f(x)$ is $O(g(x))$ means that, after a certain value of $x$, $f$ is always smaller than some constant multiple of $g$:

Explanation of the definition of big-Oh

Here are some examples that use big-Oh notation:

Typically, we want the function inside the Oh to be as small and simple as possible. Even though it is true, for example, that $5x^2+10x+3$ is $O(x^3)$, this is not terribly informative. Similarly, $5x^2+10x+3$ is $O(2x^2 + 11x + 3)$, but this is not particularly useful.
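For instance (a standard witness argument, added here for concreteness), to check that $5x^2+10x+3$ is $O(x^2)$, note that for $x \ge 1$ we have $10x \le 10x^2$ and $3 \le 3x^2$, so $$0 \le 5x^2 + 10x + 3 \le 5x^2 + 10x^2 + 3x^2 = 18x^2.$$ The witnesses $c = 18$ and $k = 1$ therefore satisfy the definition.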

Here are some important big-Oh results:

When using logarithms inside big-Oh notation, the base does not matter. Recall the change-of-base formula: $\log_b n = \frac{\log n}{\log b}$. Therefore, as long as the base $b$ is a constant, it differs from $\log n$ by a constant factor.

Here are some common functions, listed from slowest to fastest growth: $$ O(1), O(\log n), O(n), O(n \log n), O(n^2), O(2^n), O(n!) $$ Caution: there are infinitely many functions between each element of this list!

Big-Omega Notation

As we saw above, big-Oh provides an upper bound for a function. To specify a lower bound, we use big-Omega notation.

Let $f$ and $g$ be functions from $\mathbb{Z} \rightarrow \mathbb{R}$ or $\mathbb{R} \rightarrow \mathbb{R}$. We say $f(x)$ is $\Omega(g(x))$ if there are constants $c \gt 0$ and $k \gt 0$ such that $0 \le c \times g(x) \le f(x)$ for all $x \ge k$. The constants $c$ and $k$ are called witnesses. We read $f(x)$ is $\Omega(g(x))$ as "$f(x)$ is big-Omega of $g(x)$". We write $f(x) \in \Omega(g(x))$ or $f(x) = \Omega(g(x))$ (though the former is more technically correct).

Basically, $f(x)$ is $\Omega(g(x))$ means that, after a certain value of $x$, $f$ is always bigger than some constant multiple of $g$:

Explanation of the definition of big-Omega

Here are some examples that use big-Omega notation:

Big-Theta Notation

In the previous example, we showed that $\sum_{i=1}^n i = \Omega(n^2)$. Earlier, we also showed that this sum is $O(n^2)$. We have special notation for such situations:

Let $f$ and $g$ be functions from $\mathbb{Z} \rightarrow \mathbb{R}$ or $\mathbb{R} \rightarrow \mathbb{R}$. We say $f(x)$ is $\Theta(g(x))$ if $f(x)$ is $O(g(x))$ and $f(x)$ is $\Omega(g(x))$. We read $f(x)$ is $\Theta(g(x))$ as "$f(x)$ is big-Theta of $g(x)$". We write $f(x) \in \Theta(g(x))$ or $f(x) = \Theta(g(x))$ (though the former is more technically correct).

It might be helpful to think of big-Oh/Omega/Theta as follows:

Induction

What is the sum of the first $n$ positive odd integers?

\begin{array}{rcl} n=1: & 1 & =1 \\ n=2: & 1+3 & =4 \\ n=3: & 1+3+5 & =9 \\ n=4: & 1+3+5+7 & =16 \end{array}

So far, the pattern seems to be that the sum is $n^2$. But recognizing a pattern is not the same as a proof! How do we prove something is true for every $n$ (of which there are infinitely many)?

Imagine a long line of people, numbered $1,2,3,\ldots$. Suppose that whenever person $k$ is told something, they tell person $k+1$. If I tell a secret to person $1$, what happens? $1$ tells $2$, $2$ tells $3$, $3$ tells $4$, and so on. So, after everyone is finished talking, everyone in the line knows what I said.

Let $P(n)$ denote the proposition "person $n$ knows the secret". The argument has premises $P(1)$ and $\forall k\;(P(k) \rightarrow P(k+1))$, with conclusion $\forall n\;P(n)$. Indeed, this is a valid argument:

\begin{array}{lll} 1. & P(1) & \\ 2. & \forall k\;(P(k) \rightarrow P(k+1)) & \therefore \forall n\;P(n) \\ \hline 3. & P(1) \rightarrow P(2) & \text{Universal Instantiation (2)} \\ 4. & P(2) & \text{Modus Ponens (1,3)} \\ 5. & P(2) \rightarrow P(3) & \text{Universal Instantiation (2)} \\ 6. & P(3) & \text{Modus Ponens (4,5)} \\ 7. & P(3) \rightarrow P(4) & \text{Universal Instantiation (2)} \\ 8. & P(4) & \text{Modus Ponens (6,7)} \\ & \vdots & \\ & P(1) \wedge P(2) \wedge \cdots & \text{Conjunction} \\ & \forall n\;P(n) & \text{Definition of } \forall \end{array}

This gives us a proof technique to prove $\forall x\;P(x)$ when the universe of discourse is the natural numbers (starting at either $0$ or $1$). To summarize: $$ \left( P(1) \wedge \left(\forall k\;(P(k) \rightarrow P(k+1))\right) \right) \rightarrow \forall n\;P(n) $$

Why do we need (and like) this?

Proving $P(1)$ is the basis step.

Proving $\forall k\;(P(k) \rightarrow P(k+1))$ is the inductive step.

Let's show that our guess that the sum of the first $n$ positive odd integers is $n^2$ is correct. Let $P(n)$ denote "the sum of the first $n$ positive odd integers is $n^2$". We want to show $\forall n\;P(n)$ where the universe of discourse is the set of positive integers.
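In outline, the heart of the inductive step is the algebra: if the first $k$ positive odd integers sum to $k^2$, then adding the next odd integer, $2k+1$, gives $$k^2 + (2k+1) = (k+1)^2,$$ which is exactly the statement $P(k+1)$.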

A few notes about doing proofs by mathematical induction:

Induction works with inequalities, too. For example, here is a proof that $n \lt 2^n$ for all positive integers $n$.

Some more notes about doing proofs by mathematical induction:

Induction can be used to show things other than equalities/inequalities. For example, here is a proof that $n^3 - n$ is divisible by $3$ for all positive integers $n$:

There is nothing special about starting at $1$. We can start at any integer $b$ (by using $b$ in our basis step). This will prove the proposition in question true over the universe of discourse $\{b,b+1,b+2,\ldots\}$.

Here is an example with a different basis step. We will prove that $2^0 + 2^1 + 2^2 + 2^3 + \cdots + 2^n = 2^{n+1} - 1$ for all non-negative integers $n$.

Recall the sum of a geometric sequence $\sum_{j=0}^n ar^j = a + ar + ar^2 + \cdots + ar^n$. We will prove that $\sum_{j=0}^n ar^j = \frac{ar^{n+1} - a}{r-1}$ when $r \neq 1$ and $n \ge 0$.

The $j$-th Harmonic number $H_j$ is defined as $H_j = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{j}$ when $j \ge 1$. We will prove that $H_{2^n} \ge 1 + \frac{n}{2}$ for non-negative integers $n$.

Recall that the size of the powerset of a set of size $n$ is $2^n$. We will now prove that this is true.

Recall the sum of the first $n$ positive integers: $\sum_{i=1}^n i = \frac{n(n+1)}{2}$. We will now prove that this is true.

Here is a proof that $2^n \lt n!$ for all positive integers $n \ge 4$:

Here is a proof of an extension of De Morgan's Law for sets: $\overline{\bigcap_{j=1}^n A_j} = \bigcup_{j=1}^n \overline{A_j}$, where $A_1,A_2,\ldots,A_n$ are sets and $n \ge 2$:

Induction can also be used on other types of problems. Let $n$ be a positive integer. We will prove that any $2^n \times 2^n$ chessboard with one square removed can be tiled with L-shaped pieces that cover three squares:

We will now prove that given $n \ge 2$ lines in the plane (no two of which are parallel), the total number of intersections is at most $\frac{n(n+1)}{2}$. (Recall that non-parallel lines intersect in exactly one point.)

Strong Induction

Recall that our argument when doing a proof by induction is the following:

\begin{array}{lll} 1. & P(1) & \\ 2. & \forall k\;(P(k) \rightarrow P(k+1)) & \therefore \forall n\;P(n) \\ \hline 3. & P(1) \rightarrow P(2) & \text{Universal Instantiation (2)} \\ 4. & P(2) & \text{Modus Ponens (1,3)} \\ 5. & P(2) \rightarrow P(3) & \text{Universal Instantiation (2)} \\ 6. & P(3) & \text{Modus Ponens (4,5)} \\ 7. & P(3) \rightarrow P(4) & \text{Universal Instantiation (2)} \\ 8. & P(4) & \text{Modus Ponens (6,7)} \\ & \vdots & \\ & P(1) \wedge P(2) \wedge \cdots & \text{Conjunction} \\ & \forall n\;P(n) & \text{Definition of } \forall \end{array}

Notice that when we are proving $P(k+1)$, we know more than just $P(k)$; we actually know $P(1) \wedge P(2) \wedge \cdots \wedge P(k)$ is true! Therefore, it is valid to use $P(1) \wedge P(2) \wedge \cdots \wedge P(k)$ as our inductive hypothesis instead of simply $P(k)$. Using this inductive hypothesis is called strong induction and can (sometimes) make proofs simpler.

We will now look at some examples of proofs that use strong induction.

Consider a game: two players remove any number of matches they want from $1$ of $2$ piles. The player who removes the last match wins the game. The piles initially contain $n \ge 1$ matches each. We will prove that the player who goes second can always win.

Here is a proof that if $n \gt 1$, then $n$ can be written as a product of prime numbers.

Notice that in the previous proof, it is not straightforward to apply the original formulation of induction! For some proofs, both techniques apply equally well.

For example, we will prove that every amount of postage greater than or equal to 12¢ can be formed using 4¢ and 5¢ stamps.

Note that the inductive step in the strong induction proof is only valid because we included the extra cases in the basis step.

Observe that using the usual method of induction resulted in a longer inductive step but shorter basis step, while strong induction resulted in a longer basis step but shorter inductive step.

Relations

Relations between elements of sets are very common. For example:

More formally: let $A$ and $B$ be sets. A binary relation from $A$ to $B$ is a subset of $A \times B$.

Therefore, a binary relation $R$ is just a set of ordered pairs. We write $aRb$ to mean $(a,b) \in R$ and $a \cancel{R} b$ to mean $(a,b) \notin R$. When $(a,b) \in R$, we say that "$a$ is related to $b$ by $R$".

Such relations are binary relations because $A \times B$ consists of pairs. In general, we can have relations that are subsets of $A \times B \times C \times \cdots$ to give us an $n$-ary relation. We concentrate on the binary case.

For example, let $A$ be the set of students at Carleton and $B$ be the set of courses at Carleton. Let $R$ be the "enrolled in" relation, so that $(a,b) \in R$ (that is, $aRb$) if student $a$ is enrolled in course $b$.

Note that you can think of all functions as relations (where the input is related to the output), but not vice versa (since a single element can be related to many others).

Sometimes, relations are between a set $A$ and itself: a subset of $A \times A$. In this case, we say the relation is on the set $A$.

Here are some more examples of relations:

Properties of Relations

Relations can have several properties which help to classify them.

Combining Relations

Relations are sets, so they can be combined the same way sets can be combined.

Remember that relations are like functions, so it makes sense to talk about their composition, too.

Let $R$ be a relation from set $A$ to set $B$. Let $S$ be a relation from set $B$ to set $C$. The composition of $R$ and $S$ is the relation consisting of ordered pairs $(a,c)$ where $a \in A$, $c \in C$ and there exists an element $b \in B$ such that $(a,b) \in R$ and $(b,c) \in S$. We write $S \circ R$ to denote this relation.

For example, if we have a relation $R$ from the set $\{1,2,3\}$ to the set $\{1,2,3,4\}$ and a relation $S$ from the set $\{1,2,3,4\}$ to the set $\{0,1,2\}$ defined as follows:

then $S \circ R = \{ (1,0),(1,1),(2,1),(2,2),(3,0),(3,1) \}$.

One special case of composition occurs when you compose a relation with itself. For example, let $R = \{ (a,b) \;|\; a \text{ is a parent of } b \}$ be defined on the set of all people. Then $R \circ R$ is the set of ordered pairs $(a,c)$ such that there exists a person $b$ so that $a$ is a parent of $b$ and $b$ is a parent of $c$, \ie, $a$ is a grandparent of $c$.

Of course, this produces a new relation which can be composed with $R$ again to produce the "is a great-grandparent of" relation.

As a shortcut, we write $R^2 = R \circ R$, $R^3 = (R \circ R) \circ R$, and so on.

Let $R = \{ (1,1),(2,1),(3,2),(4,3) \}$. Then:

Interestingly, we can understand transitivity in terms of composition: a relation $R$ on a set $A$ is transitive if and only if $R^n \subseteq R$ for all $n \ge 1$.

Reflexive Closure

Sometimes a relation does not have some property that we would like it to have: for example, reflexivity, symmetry, or transitivity.

How do we add elements to our relation to guarantee the property? Ideally, we'd like to add as few new elements as possible to preserve the "meaning" of the original relation.

We first consider making a relation reflexive. This is called the reflexive closure. Suppose we have a relation $R$ on a set $A$ and want to make it reflexive. We need to ensure that $(a,a)$ is in the relation for all $a \in A$. We also do not wish to add anything extra.

Define $\Delta = \{ (a,a) \;|\; a \in A \}$. The reflexive closure of $R$ is $R \cup \Delta$.

For example, the reflexive closure of $R = \{ (a,b) \;|\; a \lt b \}$ on the set of integers is the relation

$$R \cup \Delta = \{ (a,b) \;|\; a \lt b \} \cup \{ (a,a) \;|\; a \in \mathbb{Z} \} = \{ (a,b) \;|\; a \le b \}$$

Symmetric Closure

For the symmetric closure, we want to ensure that $(b,a)$ is in the closure relation whenever $(a,b)$ is in the original relation.

Define $R^{-1} = \{ (b,a) \;|\; (a,b) \in R \}$. The symmetric closure of $R$ is $R \cup R^{-1}$.

For example, the symmetric closure of $R = \{ (a,b) \;|\; a \lt b \}$ on the set of integers is the relation

$$R \cup R^{-1} = \{ (a,b) \;|\; a \lt b \} \cup \{ (b,a) \;|\; b \gt a \} = \{ (a,b) \;|\; a \neq b \}$$

Transitive Closure

Consider $R = \{ (1,3),(1,4),(2,1),(3,2) \}$ on $A = \{1,2,3,4\}$. This is not a transitive relation: it is missing $\{(1,2),(2,3),(2,4),(3,1) \}$. Let's try adding them and calling the new relation $R'$:

$$R' = \{ (1,3),(1,4),(2,1),(3,2),(1,2),(2,3),(2,4),(3,1) \}$$

Unfortunately, this relation is still not transitive: $(3,1) \in R'$ and $(1,4) \in R'$, but $(3,4) \notin R'$. It seems we will need to do a bit more work.

Recall that if we compose $R$ with itself, then we get the elements $(a,c)$ where $(a,b) \in R$ and $(b,c) \in R$ for some $b$. We need these elements for transitivity, so add them to $R$. As we saw above, this might not be enough, so we repeat this.

How many times do we repeat? This is the same as asking how many intermediate elements we could find. Since there are $|A|$ possible intermediate elements, we might need to do as many as $|A|$ compositions.

Therefore, the transitive closure of $R$ over a set $A$ is

$$R \cup R^2 \cup R^3 \cup \cdots \cup R^{|A|}$$

Here is an example of computing the transitive closure:

It turns out we can view this another way if we look at the matrix representation. Given matrix representations $M_R$ and $M_S$ for the relations $R$ and $S$, the matrix representation of $S \circ R$ is $M_R \odot M_S$, where $\odot$ denotes the Boolean product. This operation is identical to matrix multiplication, but any non-zero entries are simply written as ones.

The matrix representation of $R$ in the previous example is $$ M_R = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \\ \end{bmatrix} $$

Therefore, we have: $$ M_{R^2} = M_R \odot M_R = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \\ \end{bmatrix} $$ and $$ M_{R^3} = M_{R^2} \odot M_R = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \\ \end{bmatrix} $$

We now take the union of these matrices by taking the logical-or of each entry: $$ \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \\ \end{bmatrix} \vee \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \\ \end{bmatrix} \vee \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \\ \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \\ \end{bmatrix} $$

As we would expect, this is the matrix representation of $\{ (1,1),(1,2),(1,3),(2,2),(3,1),(3,2),(3,3) \}$; the same answer we computed previously.
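The whole computation is easy to automate. Here is a Python sketch (illustrative only) that forms $R \cup R^2 \cup \cdots \cup R^{|A|}$ from a zero-one matrix using the Boolean product:

    def transitive_closure(M):
        # M is a 0/1 matrix for a relation on an n-element set
        n = len(M)
        def boolean_product(X, Y):
            return [[1 if any(X[i][k] and Y[k][j] for k in range(n)) else 0
                     for j in range(n)] for i in range(n)]
        result = [row[:] for row in M]   # R
        power = [row[:] for row in M]
        for _ in range(n - 1):           # accumulate R^2, ..., R^n
            power = boolean_product(power, M)
            result = [[result[i][j] or power[i][j] for j in range(n)]
                      for i in range(n)]
        return result

    M_R = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    print(transitive_closure(M_R))   # [[1, 1, 1], [0, 1, 0], [1, 1, 1]]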

Equivalence Relations

We intuitively know what it means to be "equivalent", and some relations satisfy these intuitions, while others do not.

Consider the usual "$=$" relation. This is a perfectly good relation: we usually write $1=1$, for example, but can define $R = \{ (a,b) \;|\; a=b \}$ on $\mathbb{Z}$, and write $(1,1) \in R$ or $1R1$.

Why do we say the "$=$" relation expresses equivalence?

Other examples include logical equivalence, set equivalence, and many others.

A relation is an equivalence relation if it is reflexive, symmetric and transitive.

To show something is an equivalence relation, just show that it has all of these properties. To show a relation is not an equivalence relation, show it does not satisfy at least one of these properties.

Here are some examples of determining if relations are equivalence relations:

Equivalence Classes

Equivalence relations naturally partition the elements of the set they are defined on into several classes.

For example, let $R = \{ (a,b) \;|\; a \text{ got the same grade as } b \text{ in this course} \}$ over the set of students in this course. It is clear that $R$ is an equivalence relation. Observe that it also partitions the students in this course into classes: A+, A, A-, B+, and so on.

Let $R$ be an equivalence relation on a set $A$. The set of all elements related by $R$ to $a$ is called the equivalence class of $a$ and is denoted $[a]_R$ (or $[a]$ when $R$ is clear from context): $$[a]_R = \{ b \;|\; (a,b) \in R \}$$

If $b \in [a]_R$, then $b$ is called a representative of the equivalence class. Any member of the class can be chosen to be a representative.
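As a concrete illustration, here is a Python sketch that computes the equivalence classes of congruence modulo $3$ on a small made-up set:

    # equivalence classes of "a is congruent to b modulo 3" on {0, 1, ..., 9}
    A = range(10)
    classes = {}
    for a in A:
        rep = a % 3                  # the remainder labels the class
        classes.setdefault(rep, set()).add(a)

    print(classes)   # {0: {0, 3, 6, 9}, 1: {1, 4, 7}, 2: {2, 5, 8}}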

Here are some examples of working with equivalence classes:

In the grades example we saw before, the equivalence classes were the students who achieved the same grade. Notice that this is a partition: two equivalence classes are either equal or disjoint, and the union of all equivalence classes is the set over which the relation is defined.

This works the other way, too: given a partition of a set $A$, we can always construct an equivalence relation $R$ on $A$ with those equivalence classes.

Partial Orders

A relation $R$ on a set $A$ is called a partial ordering or partial order if it is reflexive, antisymmetric, and transitive. A set $A$ together with a partial order $R$ on that set is called a partially ordered set or poset and is denoted $(A,R)$. Members of $A$ are called elements of the poset.

Here are some examples:

In a partial order, we cannot always compare two elements. The elements $a,b$ of a poset $(S,\preceq)$ are comparable if either $a \preceq b$ or $b \preceq a$. When neither is true, they are incomparable.

The fact that we can have incomparable elements is why the relations are called partial orderings.

If $(S,\preceq)$ is a poset and every two elements of $S$ are comparable, $S$ is called a totally ordered set or a linearly ordered set and $\preceq$ is called a total order or linear order. A totally ordered set is also called a chain.

Hasse Diagrams

We can represent a partial order graphically using a tool called a Hasse diagram. The idea is to draw the relation as a graph consisting of a vertex for every element in the set, with edges denoting which elements of the set are related by the partial order. For the sake of conciseness, edges which must appear (because of reflexivity and transitivity) are omitted.

For example, consider the poset $(\{1,2,3,4\}, \le)$. We start with all information.

A complete graphical representation of a poset

We now remove all self-loops:

A graphical representation of a poset with self-loops removed

We then remove the edges required for transitivity:

A graphical representation of a poset with self-loops and edges required for transitivity removed

We now remove the arrows by placing all of the initial elements below the terminal elements:

A graphical representation of a poset with self-loops and edges required for transitivity removed, and all edges oriented upwards

This is the Hasse diagram.

Here are some more examples of Hasse diagrams:

Another benefit of Hasse diagrams is that they make it easier to pick out certain special elements.

An element $a$ is maximal in a poset $(S,\preceq)$ if there is no $b \in S$ with $a \prec b$ (that is, $a \preceq b$ but $a \neq b$).

An element $a$ is minimal in a poset $(S,\preceq)$ if there is no $b \in S$ with $b \prec a$.

In a Hasse diagram, the maximal element(s) are at the top and the minimal element(s) are at the bottom, but only in the sense of where the edges enter and leave, not their location on the diagram!

Sometimes, there is an element greater than every other element. If $b \preceq a$ for all $b \in S$, then $a$ is the greatest element of $(S,\preceq)$. If $a \preceq b$ for all $b \in S$, then $a$ is the least element of $(S,\preceq)$. Both the greatest and least elements are unique when they exist.

You might want to bound some subset of the poset. Given a poset $(S,\preceq)$ and a subset $A \subseteq S$:

For example:

A Hasse diagram

If $x$ is an upper bound of $A$ (that is, $a \preceq x$ for every $a \in A$) and $x \preceq z$ for every upper bound $z$ of $A$, then $x$ is the least upper bound of $A$.

If $y$ is a lower bound of $A$ and $z \preceq y$ whenever $z$ is a lower bound of $A$ then $y$ is the greatest lower bound of $A$.

If the least upper bound or greatest lower bound exist, then they are unique.

In the previous example:

Topological Sorting

Suppose you are building a house. The "must be done before" relation defines a partial order on the construction tasks:

A Hasse diagram for constructing a house

This is a partial order and must be respected when constructing a house. But it does not specify a single, complete ordering of the tasks; there could be many valid orderings! For example, we don't care if we do exterior painting or plumbing first (since they are incomparable elements).

We need a total order that respects the partial order. A total ordering $\preceq$ is compatible with a partial ordering $R$ if $a \preceq b$ whenever $aRb$.

Observe that every finite, non-empty poset has at least one minimal element.

Therefore, to find a compatible ordering, remove a minimal element and place it at the front of the total order. Now the initial partial order is a partial order with one fewer element. Keep doing this until all elements are gone. This is called topological sorting.
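Here is a Python sketch of that procedure (illustrative; the precedes function is made up, and the example uses the divisibility poset from the worked example below):

    def topological_sort(elements, precedes):
        # precedes(a, b) is True when a must come before b
        remaining = list(elements)
        order = []
        while remaining:
            # find a minimal element: no other remaining element must precede it
            minimal = next(a for a in remaining
                           if not any(precedes(b, a) for b in remaining if b != a))
            order.append(minimal)
            remaining.remove(minimal)
        return order

    divides = lambda a, b: b % a == 0
    print(topological_sort([1, 2, 3, 4, 6, 8, 12], divides))
    # [1, 2, 3, 4, 6, 8, 12] -- one of several compatible total orders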

Here is how to topologically sort the poset $(\{1,2,3,4,6,8,12\},|)$. Here is the initial Hasse diagram:

Step 1 of topologically sorting a poset

The minimal element is $1$, which is the first element of our total order. This leaves us with the following Hasse diagram:

Step 2 of topologically sorting a poset

Both $2$ and $3$ are minimal elements, so we can select either. Let's pick $3$, which will be the second element in our total order. This leaves us with the following Hasse diagram:

Step 3 of topologically sorting a poset

$2$ is now the minimal element, which will be the third element in our total order. This leaves us with the following Hasse diagram:

Step 4 of topologically sorting a poset

Both $4$ and $6$ are minimal elements, so we can select either. Let's pick $4$, which will be the fourth element in our total order. This leaves us with the following Hasse diagram:

Step 5 of topologically sorting a poset

Both $6$ and $8$ are minimal elements, so we can select either. Let's pick $8$, which will be the fifth element in our total order. This leaves us with the following Hasse diagram:

Step 6 of topologically sorting a poset

$6$ is now the minimal element, which will be the sixth element in our total order. This leaves us with the following Hasse diagram:

Step 7 of topologically sorting a poset

This means that the final element in the total order is $12$, giving us a total order of $1,3,2,4,8,6,12$. Other answers are possible!

A total order for house construction would be (for example) foundation, framing, roofing, exterior siding, plumbing, wiring, exterior painting, exterior fixtures, wallboard, flooring, interior painting, carpeting, interior fixtures, completion.

Graphs

Graphs can be used to model problems from virtually any field.

A graph is a pair $(V,E)$ where $V$ is a set called the vertex set and $E$ is a set called the edge set. Each edge in $E$ describes how vertices in $V$ are connected.

Here is an example of a graph:

An example of a graph

This particular graph is a simple graph:

In this graph, edges don't have a direction. We say that the graph is undirected. Therefore, edges can be represented as sets consisting of two vertices. For the above graph,

Sometimes, we might want to allow multiple edges between vertices. Such graphs are called multigraphs:

An example of a multigraph

Edges are still undirected; we can represent them as sets of two vertices. However, now these sets can appear more than once. We say that if there are $m$ distinct edges between $u$ and $v$, the edge $\{u,v\}$ has multiplicity $m$. Multigraphs are used, for example, to model redundant connections in a network.

We might also want to relax the restriction that there are no edges between a vertex and itself. Such edges are called self-loops and a graph that contains self-loops is called a pseudograph. Self-loops model such things as loopbacks in networks.

We can also have directed versions of these graphs, where edges only go in one direction. Here is an example of a directed multigraph:

An example of a directed multigraph

Directed edges can be represented as an ordered pair $(u,v)$ instead of a set $\{u,v\}$. Directed edges model such things as single duplex lines or one-way streets in road networks.

There are many uses of graphs:

Representing Graphs

We need a way of representing graphs if we want to perform operations on them. We could simply list the vertices and edges, but that is a lot of work and it is hard to extract much information from that representation.

Here are two alternatives:

Adjacency and Degree

Let $G = (V,E)$ be a graph.

For example:

Examples of adjacency and degree

The Handshaking Theorem says that, for a graph $G=(V,E)$, we always have $$\sum_{v \in V} \deg(v) = 2|E|$$

For a directed edge $(u,v)$ in a directed graph, $u$ is the initial vertex and $v$ is the terminal vertex or end vertex.

Note that $\sum_{v \in V} \deg^-(v) = \sum_{v \in V} \deg^+(v) = |E|$.

Some Special Graphs

There are a few (types of) graphs with special names:

Subgraphs

A subgraph of a graph $G=(V,E)$ is a graph $H=(W,F)$ such that $W \subseteq V$ and $F \subseteq E$. A subgraph $H$ of $G$ is a proper subgraph of $G$ if $H \neq G$.

For example, on the left we have $K_5$ and on the right is a subgraph of $K_5$.

A graph and a subgraph

A subgraph is spanning if it contains all vertices of the original graph.

Connectivity

Sometimes, we want to know if two vertices in a graph are connected by a sequence of edges that might visit other vertices on the way (for example, can two computers on a network communicate?)

A path is a sequence of edges that begins at a vertex and travels from vertex to vertex along edges of the graph.

More formally, a path of length $n \ge 0$ from vertex $u$ to vertex $v$ in $G$ is a sequence of $n$ edges $e_1,e_2,\ldots,e_n$ of $G$ for which there is a sequence of vertices $x_0 = u, x_1, x_2, \ldots, x_{n-1}, x_n = v$ with $e_i = \{x_{i-1},x_i\}$ for each $i = 1,2,\ldots,n$.

For example:

A graph and some paths in it

An undirected graph is connected if there is a path between every two distinct vertices in the graph. For example, the graph on the left is connected but the graph on the right is disconnected.

A connected graph and a disconnected graph

The different parts that are maximally connected are called connected components.
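One way to compute the connected components, sketched below in Python, is to repeatedly grow a component outward from a vertex that has not yet been placed in any component. The graph is assumed to be given as an adjacency list (a dict mapping each vertex to its neighbours); the function name is invented for this sketch.

```python
def connected_components(graph):
    """Split an undirected adjacency-list graph (dict: vertex -> neighbours)
    into its connected components."""
    unplaced = set(graph)
    components = []
    while unplaced:
        # Grow a component outward from some vertex not yet placed in a component.
        frontier = {next(iter(unplaced))}
        component = set()
        while frontier:
            component |= frontier
            frontier = {v for u in frontier for v in graph[u]} - component
        components.append(component)
        unplaced -= component
    return components
```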

One special type of connected (sub)graph is a tree, which is a connected undirected graph with no simple circuits. For example, the graph on the left is a tree; the graph in the center is not a tree because it contains a circuit; and the graph on the right is not a tree because it is not connected.

Examples of graphs that are and are not trees

Because trees are "simple" in structure, we often want to find a spanning subgraph that is a tree:

An example of a graph and a spanning tree of that graph

In the graph above, the red edges form a spanning subgraph because every vertex of the graph is included. It is a tree because it is connected and contains no circuits.

Returning to subgraphs in general, we note that sometimes removing a single vertex or edge would cause a graph to become disconnected:

For example, consider the following graph:

An example of a graph with cut vertices and edges

The cut vertices are $b,c,e$, and the cut edges are $\{a,b\},\{c,e\}$.

We can also talk about connectivity in directed graphs:

For example, the graph on the left is both strongly and weakly connected, while the graph on the right is only weakly connected since there is no path from $a$ to $b$.

Connectedness in directed graphs

Depth First Search

Suppose you are trying to explore a graph:

How do you do this in an orderly way, so that you don't end up getting lost (looping forever)? Idea:

  1. mark all vertices as unvisited
  2. start at an arbitrary vertex
  3. go to an unvisited neighbour of the current vertex
  4. repeat the previous step until the current vertex has no unvisited neighbours
  5. go back (backtrack)

This approach is called a depth first search. Consider the following graph:

An example graph to illustrate depth first search

A depth first search proceeds as follows:

  1. start at (for example) $f$
  2. visit $f,g,h,k,j$
  3. backtrack to $k$
  4. backtrack to $h$
  5. visit $i$
  6. backtrack to $h$
  7. backtrack to $f$
  8. visit $f,d,e,c,a$
  9. backtrack to $c$
  10. visit $c,b$

Notice that this produces a spanning tree, since all nodes are visited and there are no cycles (since having a cycle would mean visiting an already-visited node).
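Here is a minimal recursive sketch of depth first search in Python, assuming the graph is given as an adjacency list (a dict mapping each vertex to its neighbours); going deeper corresponds to the recursive call, and returning from it corresponds to backtracking.

```python
def dfs(graph, start):
    """Depth first search on an adjacency-list graph (dict: vertex -> neighbours)."""
    order = []            # vertices in the order they are first visited
    tree_edges = []       # edges along which new vertices were discovered
    seen = set()

    def explore(u):
        seen.add(u)
        order.append(u)
        for v in graph[u]:
            if v not in seen:
                tree_edges.append((u, v))
                explore(v)        # go deeper; returning corresponds to backtracking

    explore(start)
    return order, tree_edges
```

The edges collected in `tree_edges` are exactly those along which new vertices were discovered, so they form a spanning tree of the start vertex's connected component.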

Depth first search has many applications. For example:

Breadth First Search

Instead of always going as deep as possible (as in depth first search), we can try to explore gradually at specific distances from the starting vertex. Such a search is called a breadth first search and proceeds as follows:

  1. keep a list of vertices to visit, initially containing an arbitrary starting vertex
  2. take the first vertex off the list and visit it
  3. add its unvisited neighbours to the end of the list
  4. repeat the previous two steps until the list is empty

Consider the following graph:

An example graph to illustrate breadth first search

A breadth first search proceeds as follows:

  1. start at (for example) $e$
  2. visit $b,d,f,i$ (the vertices at distance $1$ from $e$)
  3. visit $a,c,h,j,g,k$ (the vertices at distance $2$ from $e$)
  4. visit $l,m$ (the vertices at distance $3$ from $e$)

The process also produces a spanning tree. In fact, this also gives the path with the fewest edges from the start node to any other node. This is the same as the shortest path if all edges are the same length or have the same cost.
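Here is a minimal Python sketch of breadth first search, using a queue for the list of vertices to visit; it also records the distance (fewest edges) from the start vertex to every vertex it reaches. As before, the graph is assumed to be an adjacency list.

```python
from collections import deque

def bfs(graph, start):
    """Breadth first search on an adjacency-list graph (dict: vertex -> neighbours).
    Returns the distance (fewest edges) from start to every reached vertex,
    together with the edges of the resulting spanning tree."""
    dist = {start: 0}
    tree_edges = []
    queue = deque([start])
    while queue:
        u = queue.popleft()            # take the first vertex off the list
        for v in graph[u]:
            if v not in dist:          # an unvisited neighbour
                dist[v] = dist[u] + 1
                tree_edges.append((u, v))
                queue.append(v)
    return dist, tree_edges
```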

Planarity

Sometimes, it is easy to get confused about a graph when looking at a picture of it because many of the edges cross and it is difficult to determine what edge is going where. It is often nicer to look at graphs whose edges do not cross.

Notice that there are often many ways to draw the same graph. For example, here are two ways of visualizing the complete graph $K_4$:

Two drawings of the complete graph on four vertices

Even though they are drawn differently, they are essentially the same graph: four vertices where each vertex is connected to every other vertex. However, the drawing on the right is "nicer" because none of its edges cross.

We call a graph planar if it is possible to draw it in the plane without any edges crossing. Such a drawing is called a planar representation of the graph.

The above example shows that $K_4$ is planar.

Not all graphs are planar, however! For example, $K_{3,3}$ cannot be drawn without crossing edges. To see this, recall what $K_{3,3}$ looks like:

The complete bipartite graph with two partitions of three vertices each

Observe that the vertices $v_1$ and $v_2$ must be connected to both $v_4$ and $v_5$. These four edges form a closed curve that splits the plane into two regions $R_1$ and $R_2$. The vertex $v_3$ is either inside the region $R_1$ or $R_2$. Let's suppose that $v_3$ is in $R_2$ for the time being. Since $v_3$ is connected to $v_4$ and $v_5$, it divides $R_2$ into two subregions, $R_{21}$ and $R_{22}$. The situation is as follows:

Demonstrating that the complete bipartite graph with two partitions of three vertices each is not planar

Now, consider where to place the final vertex $v_6$. If it is placed in $R_1$, then the edge between $v_3$ and $v_6$ must have a crossing. If it is placed in $R_{21}$, then the edge between $v_2$ and $v_6$ must have a crossing. If it is placed in $R_{22}$, then the edge between $v_1$ and $v_6$ must have a crossing. A similar argument works for the case when $v_3$ is in $R_1$.

We have shown that $K_{3,3}$ is not planar: it cannot be drawn without crossings.

Properties of Planar Graphs

Planar graphs have some nice properties.

Let $G=(V,E)$ be a planar graph, and let $v$ denote the number of vertices, $e$ denote the number of edges, and $f$ denote the number of faces (including the "outer face"). Then $v - e + f = 2$. This is known as Euler's formula.
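For example, in the crossing-free drawing of $K_4$ shown earlier, we have $v = 4$, $e = 6$, and $f = 4$ (three bounded regions plus the outer face), and indeed $4 - 6 + 4 = 2$.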

To prove this, consider two cases. If $G$ is a tree, then $e = v - 1$, and $f = 1$ since a tree has no cycles and therefore encloses no regions (the only face is the outer face). Therefore, $v - e + f = v - (v-1) + 1 = 2$. Otherwise, if $G$ is not a tree, then $G$ must have a cycle with at least $3$ edges. If we delete an edge on that cycle, $v$ stays the same, while $e$ and $f$ both decrease by $1$ (the two faces on either side of the deleted edge merge), so the equality is preserved. Repeating this eventually leaves a tree, reducing to the first case. (This is a proof by induction in disguise!)

Another nice property is that if $G=(V,E)$ is a connected planar simple graph and $|V| \ge 3$, then $|E| \le 3|V|-6$. This is a consequence of Euler's formula. This property can be used to prove that $K_5$ is not planar. To see this, observe that $K_5$ has five vertices and ten edges; since $3 \cdot 5 - 6 = 9 < 10$, the inequality $|E| \le 3|V|-6$ fails, and therefore $K_5$ cannot be planar.

If $G=(V,E)$ is a connected planar simple graph, then $G$ has a vertex of degree at most $5$. To see this, observe that if $G$ has one or two vertices, the result is true. If $G$ has at least three vertices, we know that $|E| \le 3|V|-6$, so $2|E| \le 6|V|-12$. The Handshaking Theorem says that $\sum_{v \in V} \deg(v) = 2|E|$. If the degree of every vertex were at least $6$, then we would have $2|E| \ge 6|V|$, which contradicts the inequality $2|E| \le 6|V|-12$.

Planar graphs also have small average degree. The average degree of a graph $G=(V,E)$ is $2|E|/|V|$. Using the fact that $|E| \le 3|V|-6$ (when $|V| \ge 3$), we get that $$\frac{2|E|}{|V|} \le \frac{2(3|V|-6)}{|V|} = 6 - \frac{12}{|V|}.$$ As long as $|V| \ge 3$, this is strictly less than $6$. This means that, on average, the degrees of vertices in a planar graph are small.

Graph Colouring

As mentioned earlier, many applications in the real world can be modelled as graphs. One recurring application is to colour a graph: assign a colour to every vertex so that no two adjacent vertices share the same colour.

Consider the graph of courses at Carleton where two courses are connected by an edge if there is at least one student in both courses. To schedule exams, each vertex will be assigned a colour to represent its time slot. To avoid conflicts, courses with common students must have their exams scheduled at different times: they must be assigned different colours.

A colouring of a simple graph is the assignment of a colour to each vertex of the graph such that no two adjacent vertices are assigned the same colour. The chromatic number of a graph is the smallest number of colours required to colour the graph, and is usually denoted $\chi(G)$ for a graph $G$. For example:

One particularly interesting case is computing the chromatic number for simple planar graphs. Here is a proof that the chromatic number of a planar graph is at most $6$:

We will prove the statement by induction on the number of vertices.

Computing the chromatic number of a general graph is tricky! No efficient algorithm is known for determining $\chi(G)$ for a general graph $G$ when nothing else is known about it.

At least one colour and at most $n$ colours are required, so $1 \le \chi(G) \le n$ for any graph $G$ with $n$ vertices. The only graphs that require only one colour are those without edges. The graphs that can be coloured with two colours are exactly the bipartite graphs (trees, for example).

What if we just want some colouring of $G$, perhaps using more than $\chi(G)$ colours? One possible algorithm is to use a greedy colouring. To do this, order the vertices in some specific way, and assign each vertex in sequence the smallest available colour not used by that vertex's neighbours. The trick is using a good order!
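Here is a minimal Python sketch of the greedy colouring just described, with colours represented as the integers $0, 1, 2, \ldots$; the graph is again assumed to be an adjacency list, and `order` is a list containing every vertex.

```python
def greedy_colouring(graph, order):
    """Greedy colouring of an adjacency-list graph; colours are 0, 1, 2, ...
    `order` is a list containing every vertex of the graph."""
    colour = {}
    for v in order:
        used = {colour[u] for u in graph[v] if u in colour}   # neighbours' colours
        c = 0
        while c in used:               # smallest colour not used by a neighbour
            c += 1
        colour[v] = c
    return colour
```

The number of colours it uses can depend dramatically on `order`, as the example below illustrates.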

Example graph for use with the greedy colouring algorithm

Consider the graph above. If the ordering alternates from left to right, a $4$-colouring is produced. If the ordering goes all the way up the left and then all the way up the right, a $2$-colouring is produced. This graph can be generalized to have $k$ vertices on the left and $k$ vertices on the right. The orderings would then produce a $k$-colouring and a $2$-colouring, respectively. This is a huge difference!