COMP4804: Algorithms II
$\newcommand{\R}{\mathbb{R}}\DeclareMathOperator{\E}{\mathbf{E}}\DeclareMathOperator{\deg}{deg}$
Chernoff's Bounding Method

Here are my original hand-written notes and more on this topic.

We have seen Chernoff's Bounds on a binomial$(p,n)$ random variable, $B$: For any $0 < \epsilon < 1$, \[ \Pr\{B \ge (1+\epsilon)pn\} \le e^{-\epsilon^2pn/3} \enspace , \] and \[ \Pr\{B \le (1-\epsilon)pn\} \le e^{-\epsilon^2pn/2} \enspace . \] How does one prove bounds like this?
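As a quick numerical sanity check (not part of the original notes, with arbitrarily chosen parameters), the upper-tail bound can be compared against the exact binomial tail probability:

```python
import math

def binom_upper_tail(n, p, k):
    """Exact Pr{B >= k} for B ~ binomial(p, n)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

# Arbitrary example parameters.
n, p, eps = 100, 0.5, 0.2
k = math.ceil((1 + eps) * p * n)            # threshold (1+eps)pn
exact = binom_upper_tail(n, p, k)
chernoff = math.exp(-eps**2 * p * n / 3)    # e^{-eps^2 pn / 3}
assert exact <= chernoff                    # the bound dominates the exact tail
```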

Remember Markov's Inequality: For a non-negative random variable $X$, \[ \Pr\{X\ge t\E[X]\} \le 1/t \enspace . \] Markov's Inequality was easy to prove. How does it compare? Let's use Markov's Inequality in the format we've seen for Chernoff's bounds (taking $t=1+\epsilon$, since $\E[B]=pn$): \[ \Pr\{B \ge (1+\epsilon)pn\} \le \frac{1}{1+\epsilon} \enspace . \] Hmmm. For small values of $\epsilon$, that's nearly useless.
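To see the gap concretely, here is a small comparison (example parameters chosen arbitrarily): Markov's bound does not depend on $n$ at all, while Chernoff's shrinks exponentially in $n$.

```python
import math

# Markov gives 1/(1+eps) regardless of n; Chernoff shrinks with n.
n, p = 1000, 0.5
for eps in (0.05, 0.1, 0.2):
    markov = 1 / (1 + eps)
    chernoff = math.exp(-eps**2 * p * n / 3)
    assert chernoff < markov
```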

So let's see how we can prove something like Chernoff's bounds. The first thing we need is to understand what happens when we multiply two random variables.

Products of Independent Random Variables

Linearity of expectation has been a powerful tool for studying the sum of two (or more) random variables. Is there an equivalent tool for studying the product of two random variables?

In general, the answer is no, unless the two variables in question are independent:

Lemma: If $X$ and $Y$ are two independent random variables, then \[ \E[X\times Y] = \E[X]\times\E[Y] \enspace . \]

Proof: \begin{align*} \E[X\times Y] & = \sum_{x}\sum_{y} x\times y \times\Pr\{X=x\cap Y=y\} \\ & = \sum_{x}\sum_{y} x\times y \times\Pr\{X=x\}\times\Pr\{Y=y\} & \text{(because $X$ and $Y$ are independent)} \\ & = \sum_{x}x\times \Pr\{X=x\}\times \sum_{y} y \times\Pr\{Y=y\} \\ & = \left(\sum_{x}x\times \Pr\{X=x\}\right)\times \left(\sum_{y} y \times\Pr\{Y=y\}\right) \\ & = \E[X]\times\E[Y] \enspace . \end{align*}
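The proof is just a rearrangement of a finite double sum, so it can be checked directly on a small example (the two pmfs below are made up for illustration):

```python
from itertools import product

# Two small made-up pmfs for independent X and Y.
px = {0: 0.3, 1: 0.7}              # X ~ Bernoulli(0.7)
py = {1: 0.5, 2: 0.25, 4: 0.25}    # some three-valued Y

E_X = sum(x * q for x, q in px.items())
E_Y = sum(y * q for y, q in py.items())
# Independence: Pr{X=x and Y=y} = Pr{X=x} * Pr{Y=y}.
E_XY = sum(x * y * px[x] * py[y] for x, y in product(px, py))
assert abs(E_XY - E_X * E_Y) < 1e-12
```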

Please never forget that this requires that $X$ and $Y$ be independent. It's not true in general.

Amplifying Markov's Inequality

Chernoff's brilliant idea is to study a different random variable $Y$ that is defined in terms of $B$. Let \[ Y = e^{\alpha B} \] where we'll pick a value of $\alpha$ later. The idea is that if $B$ is a little bit bigger than its expected value, then $Y$ will be way bigger than its expected value, so Markov's Inequality, applied to $Y$, will give something more useful.

Remember that $B$ is the sum of $n$ independent Bernoulli$(p)$ random variables, $X_1,\ldots,X_n$, so \[ Y = e^{\alpha X_1+\alpha X_2 +\cdots+\alpha X_n} = e^{\alpha X_1}\times e^{\alpha X_2} \times\cdots\times e^{\alpha X_n} \enspace . \] Now, $X_1,\ldots,X_n$ are independent, so $e^{\alpha X_1},\ldots,e^{\alpha X_n}$ are independent. We have a product of independent random variables, so we can figure out its expected value: \[ \E[Y] = \E[e^{\alpha X_1}]\times \E[e^{\alpha X_2}] \times\cdots\times \E[e^{\alpha X_n}] = (\E[e^{\alpha X_1}])^n \enspace , \] where the last equality holds because the $X_i$ are identically distributed. Good, let's figure out $\E[e^{\alpha X_1}]$: \[ \E[e^{\alpha X_1}] = e^{0}(1-p) + e^{\alpha}p = 1+p(e^{\alpha}-1) \enspace . \] Now we do something annoyingly common: we use the inequality $1+x \le e^x$: \[ \E[e^{\alpha X_1}] = 1+p(e^{\alpha}-1) \le e^{p(e^{\alpha}-1)} \enspace . \] And we get \[ \E[Y] \le e^{np(e^{\alpha}-1)} \enspace . \]
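The two steps above, the exact value of $\E[e^{\alpha X_1}]$ and the $1+x\le e^x$ bound, can be verified numerically (the values of $p$, $\alpha$, and $n$ below are arbitrary):

```python
import math

p, alpha, n = 0.3, 0.5, 20          # arbitrary example values
# E[e^{alpha X_1}] for X_1 ~ Bernoulli(p):
mgf = (1 - p) * math.exp(0) + p * math.exp(alpha)
assert abs(mgf - (1 + p * (math.exp(alpha) - 1))) < 1e-12
# The inequality 1 + x <= e^x, with x = p(e^alpha - 1):
factor_bound = math.exp(p * (math.exp(alpha) - 1))
assert mgf <= factor_bound
# Hence E[Y] = mgf^n <= e^{np(e^alpha - 1)}:
assert mgf ** n <= factor_bound ** n
```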

Finally, we're ready to go: \begin{align*} \Pr\{B\ge (1+\epsilon)np\} & = \Pr\{Y\ge e^{\alpha(1+\epsilon)np}\} \\ & \le \frac{\E[Y]}{e^{\alpha(1+\epsilon)np}} \\ & \le \frac{e^{np(e^{\alpha}-1)}}{e^{\alpha(1+\epsilon)np}} \\ & = \left(\frac{e^{e^{\alpha}-1}}{e^{\alpha(1+\epsilon)}}\right)^{np} \enspace . \end{align*} Now we still get to pick $\alpha$. If we take $\alpha=\ln(1+\epsilon)$, we get \[ \Pr\{B\ge (1+\epsilon)np\} \le \left(\frac{e^{\epsilon}}{(1+\epsilon)^{(1+\epsilon)}}\right)^{np} \enspace . \] This isn't exactly the version of Chernoff's bounds we've seen (it's actually slightly stronger). With a little bit of bounding, and by assuming $0< \epsilon <1$, you can get the version we're used to.
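The claim that the derived bound is at least as strong as the usual $e^{-\epsilon^2 np/3}$ version can be spot-checked numerically for a few values of $\epsilon$ in $(0,1)$ (the exponent $np$ below is an arbitrary example value):

```python
import math

np_ = 50                            # the product n*p in the exponent
for eps in (0.1, 0.5, 0.9):
    derived = (math.exp(eps) / (1 + eps) ** (1 + eps)) ** np_
    usual = math.exp(-eps**2 * np_ / 3)
    assert derived <= usual         # derived bound is at least as strong
```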

That's how you get the upper tail bound. For the lower tail bound, you use symmetry: apply the same method to the binomial$(1-p, n)$ random variable $n-B$, which counts the zeros among $X_1,\ldots,X_n$.

Summary

What we just did is an instance of Chernoff's bounding method for bounding the sum of $n$ independent random variables $X_1,\ldots,X_n$:

  1. Define $B=X_1+\cdots+X_n$.
  2. Determine (an upper bound on) $\mu_i=\E[e^{\alpha X_i}]$ for each $i\in\{1,\ldots,n\}$.
  3. Now consider the random variable $Y=e^{\alpha B}$.
  4. Now use Markov: \[ \Pr\{B \ge k\} = \Pr\{Y \ge e^{\alpha k}\} \le \frac{\E[Y]}{e^{\alpha k}} = \frac{\mu_1\mu_2\cdots\mu_n}{e^{\alpha k}} \enspace . \]
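The four-step recipe can be sketched as a small generic function; here it is instantiated on the binomial case from above (the helper name and parameters are mine, not from the notes):

```python
import math

def chernoff_tail_bound(mus, alpha, k):
    """Step 4: Pr{B >= k} <= (mu_1 * ... * mu_n) / e^{alpha k},
    where mu_i is (an upper bound on) E[e^{alpha X_i}]."""
    prod = 1.0
    for mu in mus:
        prod *= mu
    return prod / math.exp(alpha * k)

# Recover the binomial case: X_i ~ Bernoulli(p), mu_i = 1 + p(e^alpha - 1),
# with the choice alpha = ln(1 + eps) from the derivation above.
n, p, eps = 100, 0.5, 0.2
alpha = math.log(1 + eps)
mus = [1 + p * (math.exp(alpha) - 1)] * n
bound = chernoff_tail_bound(mus, alpha, (1 + eps) * p * n)
assert 0 < bound < 1
```

Using different $\mu_i$ per variable (or different $\alpha$ values, as the notes mention next) only changes how `mus` is filled in.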

There are lots of variations (even using different values of $\alpha$ for different $X_i$). The method is limited only by how messy the calculations get and by how difficult it is to bound $\mu_i$.