$\newcommand{\R}{\mathbb{R}}\DeclareMathOperator{\E}{\mathbf{E}}\DeclareMathOperator{\deg}{deg}$
Closest Pair and Linear Programming

This lecture looks at two randomized algorithms for two geometric problems. Both algorithms are randomized incremental algorithms: They each compute a random permutation of the input and then incrementally solve the problem by adding one input element at a time and updating the current solution after the addition of each element.

# Closest Pair

• Let $P$ be a set of $n$ points in $\R^2$
• We want to find a closest pair in $P$ that is, a pair of points $\{a,b\}\subset P$ such that the distance between $a$ and $b$ is minimum over all pairs of points in $P$
• The obvious algorithm checks all $\binom{n}{2}$ pairs and runs in $\Theta(n^2)$ time.

For simplicity, we will assume that all inter-point distances are unique, so there are no ties. Recall that a hash table can store a set of key/value pairs so that, we can test if any key is in the set and look up its value in $O(1)$ expected time.

We will solve the closest-pair problem with this randomized incremental algorithm:

• Take a random permutation of the points of $P$; call this $(p_1),\ldots,(p_n)$, where $p_i=(x_i,y_i)$
• The algorithms will keep track of the closest pair among all $p_1,\ldots,p_i$.
• The algorithm uses a hash table whose keys are pairs of integers to store $p_1,\ldots,p_i$
• Let $d^\star$ be the distance between closest pair among $p_1,\ldots,p_i$ (the closest pair found so far)
• A point with coordinates $(x,y)$ is stored at table location $K(x,y) = (\lfloor x\sqrt{2}/d^\star \rfloor, \lfloor y\sqrt{2}/d^\star\rfloor)$
• Notice that there is only one point stored in each table location (since two points in the same table location have distance less than $d^\star$)
• To add a new point $p_{i+1}$, we compute it's table location $K(x_{i+1},y_{i+1})=(k,\ell)$; and we look at table locations $(k\pm 2,\ell\pm 2)$
• If $p_{i+1}$ is part of a new closest pair, then we update $d^\star$, and rebuild our hash table with this new value of $d^\star$ in $O(i)$ time.

Lemma: The probability that $p_{i}$ is part of the closest pair among $p_1,\ldots,p_i$ is at most $2/i$.

Proof: Remember that $p_1,\ldots,p_n$ is a random permutation of $P$, so $p_1,\ldots,p_i$ is a random permutation of $P_i=\{p_1,\ldots,p_i\}$. Exactly two elements $a$ and $b$ of $P_i$ define the closest pair and $p_i$ is equally likely to be any of the $i$ elements of $P_i$ so the probability that $p_i\in\{a,b\}$ is $2/i$. ∎

Theorem: Given a set of $n$ points in $\R^2$, we can find the closest pair in $O(n)$ expected time

Proof: After computing the random permutation, the rest of the algorithm examines each point, does a constant number of work, and possibly rebuilds the hash table. Let $c>0$ be some constant and for each $i\in\{2,\ldots,n\}$, let
$I_i = \begin{cases} ci & \text{if p_i is part of the CP of p_1,\ldots,p_i} \\ 0 & \text{otherwise .} \end{cases}$ Exercise: Check that $\E[I_i] \le 2c$. Then, the expected amount of time spent rebuilding hash tables is $\E\left[\sum_{i=2}^n I_i\right] = \sum_{i=2}^n \E[I_i] = \sum_{i=2}^n 2c = \sum_{i=2}^n O(1) = O(n) \enspace .$ The other steps of the algorithm easily run in $O(n)$ time. If you're not sure how the random permutation is generated, you can read about the Fisher–Yates Shuffle. ∎

The preceding algorithm easily generalizes to points in $\R^{d}$, though the expected running time becomes $O(C^d n)$ for some constant $C$. So the algorithm is linear in $n$ but exponential in $d$. This is pretty common for geometric problems and is sometimes called the curse of dimensionality.

# Linear Programming

• We are give a set $L$ of $n$ lines in $\R^2$ and we want to find the lowest point that is above all the lines. (Think of dropping a marble on to the set of lines. Where does it roll to?)
• We can check that there is at least one line with positive slope and one line with negative slope (otherwise the program is unbounded, the marble rolls away forever)
• Trivial algorithm checks every pair of line intersections in $\Theta(n^3)$ time (we have to check if each intersection is above all the lines)

Exercise: How much better can you do than the trivial algorithm? $O(n^2)$, $O(n\log n)$?

To make life simple, we assume there are no horizontal lines and no three lines interect in a common point. We can solve this problem again with a randomized incremental algorithm.

• Let $\ell_1$ and $\ell_2$ be a line with negative slope and a line with positive slope and let $p^\star$ be their intersection point
• $\ell_3,\ldots,\ell_n$ is a random permutation of the remaining $n-2$ lines.
• When considering $\ell_i$, one of two things happens:
1. $\ell_i$ is below $p^\star$ in which case we do nothing
2. $\ell_i$ is above $p^\star$ in which case the new solution involves $\ell_i$ and we can find it in $O(i)$ time

This should look familiar. All we need to show now is that the new solution is not likely to involve $\ell_i$. Exactly 2 lines in $\ell_1,\ldots,\ell_i$ define the optimal solution for $\ell_1,\ldots,\ell_i$. Now, $\ell_i$ is uniformly chosen from a set of size $i-2$, so the probability that $\ell_i$ is one of the two that defines the solution is at most $2/(i-2)$. The rest of the analysis is just like the analysis of the closest pair algorithm.

Theorem: Given $n$ lines in $\R^2$, we can find the lowest point that is above all the lines in $O(n)$ expected time.

The preceding algorithm also generalizes to $\R^d$, and again, the expected running-time is $O(C^dn)$ for some constant $C$. The generalization is less straightforward, though, and involves a recursion on the dimension, $d$.