COMP4804: Algorithms II
$\newcommand{\R}{\mathbb{R}}\DeclareMathOperator{\E}{\mathbf{E}}\DeclareMathOperator{\deg}{deg}$
Conditional Probability and Dependence

Here are my original hand-written notes on this topic.

Conditional Probability

The following definition tells us the conditional probability of an event $A$ given some event $B$: \begin{equation} \Pr\{A\mid B\} = \frac{\Pr\{A\cap B\}}{\Pr\{B\}} \label{conditional-i} \end{equation} This is only defined when $\Pr\{B\}>0$. Conditional probability answers questions like: "If I tell you that $B$ happened, then what is the probability that $A$ also happened?"

Example: Rolling a six-sided die

Roll one six-sided die, $D$; each of the six outcomes occurs with probability $1/6$.
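Here is a small sketch of how a conditional probability over this sample space can be checked by enumeration; the specific events used below (the die shows $2$, given that it shows an even number) are arbitrary illustrative choices.

```python
from fractions import Fraction

# One fair six-sided die D: compute Pr{D = 2 | D is even} by enumeration.
omega = range(1, 7)                      # sample space {1,...,6}, each with probability 1/6
B = {d for d in omega if d % 2 == 0}     # event "D is even"
A = {2}                                  # event "D = 2"

pr_B = Fraction(len(B), 6)               # 1/2
pr_A_and_B = Fraction(len(A & B), 6)     # 1/6
print(pr_A_and_B / pr_B)                 # Fraction(1, 3), i.e. Pr{A | B} = (1/6)/(1/2)
```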

Alternative Formulation of Conditional Probability

An equivalent formulation of \eqref{conditional-i} is obtained by multiplying both sides by $\Pr\{B\}$: \begin{equation} \Pr\{A\cap B\} = \Pr\{B\}\times\Pr\{A\mid B\} \label{conditional-ii} \end{equation} Equation \eqref{conditional-ii} is very useful because it gives us a formula for computing $\Pr\{A\cap B\}$, and it's also valid when $\Pr\{B\}=0$.
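As a quick illustration of \eqref{conditional-ii} (this particular example is an arbitrary choice, not part of the worked examples below): if we draw two cards from a shuffled standard deck, then \[ \Pr\{\text{both cards are aces}\} = \Pr\{\text{first card is an ace}\}\times\Pr\{\text{second card is an ace}\mid\text{first card is an ace}\} = \frac{4}{52}\times\frac{3}{51} = \frac{1}{221} \enspace . \]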

Independence

Two events $A$ and $B$ are independent if and only if \begin{equation} \Pr\{A\mid B\} = \Pr\{A\} \enspace . \label{independence-i} \end{equation} If $A$ and $B$ are not independent, then they are dependent. Using \eqref{conditional-ii}, we get the following equivalent definition of independence: $A$ and $B$ are independent if and only if \begin{equation} \Pr\{A\cap B\} = \Pr\{A\}\times\Pr\{B\} \enspace . \label{independence-ii} \end{equation} Note that this means that (in)dependence is symmetric: $A$ and $B$ are independent if and only if $B$ and $A$ are independent.

Example: Tossing two coins

Toss a gold coin and a silver coin: $\Pr\{TT\} = \Pr\{TH\} = \Pr\{HT\} = \Pr\{HH\} = 1/4$.

Are the events $A=\text{"the gold coin is heads"}$ and $B=\text{"the silver coin is heads"}$ independent? We have $A=\{HT,HH\}$, $B=\{TH,HH\}$, and $A\cap B=\{HH\}$, so \[ \Pr\{A\} = 2/4=1/2 \enspace , \] and \[ \Pr\{\text{gold coin is heads}\mid \text{silver coin is heads}\} = \frac{1/4}{2/4} = 1/2 \enspace . \] So $\Pr\{A\} = \Pr\{A\mid B\}$ and these two events are independent.
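If you prefer to see this verified mechanically, here is a brute-force check over the four equally likely outcomes (a minimal sketch, not part of the original notes):

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely (gold, silver) outcomes and check
# Pr{A and B} = Pr{A} * Pr{B} for the two events from the example.
outcomes = list(product("HT", repeat=2))           # (gold, silver)
A = {o for o in outcomes if o[0] == "H"}           # gold coin is heads
B = {o for o in outcomes if o[1] == "H"}           # silver coin is heads

pr = lambda E: Fraction(len(E), len(outcomes))
print(pr(A & B) == pr(A) * pr(B))                  # True: the events are independent
print(pr(A & B) / pr(B))                           # Fraction(1, 2) = Pr{A | B} = Pr{A}
```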

Example: Throwing two dice

Roll two dice, $D_1$ and $D_2$.

Let $A$ denote the event $D_1+D_2=7$ and let $B$ denote the event $D_1=4$.

From Lecture 1, \[ \Pr\{A\} = 6/36 = 1/6 \enspace . \] By definition \[ \Pr\{B\} = 1/6 \enspace . \] And \[ \Pr\{A\cap B\} = \Pr\{\text{$D_1=4$ and $D_2=3$}\} = 1/36 \enspace . \] So \[ \Pr\{A\mid B\} = \frac{\Pr\{A\cap B\}}{\Pr\{B\}} = \frac{1/36}{1/6} = 1/6 = \Pr\{A\} \enspace , \] so these two events are independent.

Rolling a 4 with the first die does not affect the probability that the sum of the two dice is 7.
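The same kind of brute-force check works here: enumerate the $36$ equally likely outcomes and compare $\Pr\{A\mid B\}$ with $\Pr\{A\}$ (again, a minimal sketch):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (D1, D2) outcomes.
outcomes = list(product(range(1, 7), repeat=2))
A = {o for o in outcomes if sum(o) == 7}           # D1 + D2 = 7
B = {o for o in outcomes if o[0] == 4}             # D1 = 4

pr = lambda E: Fraction(len(E), len(outcomes))
print(pr(A))                                       # Fraction(1, 6)
print(pr(A & B) / pr(B))                           # Fraction(1, 6) = Pr{A | B}
```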

Another kind of symmetry

If $A$ and $B$ are independent, then so are $A$ and $\overline{B}=U\setminus B$, where $U$ is the sample space: \begin{align*} \Pr\{A\mid \overline{B}\} & = \frac{\Pr\{A\cap\overline{B}\}}{\Pr\{\overline{B}\}} \\ & = \frac{\Pr\{A\}-\Pr\{A \cap B\}}{1-\Pr\{B\}} \\ & = \frac{\Pr\{A\}-\Pr\{A\}\times\Pr\{B\}}{1-\Pr\{B\}} \text{ (using \eqref{independence-ii})}\\ & = \frac{\Pr\{A\}(1-\Pr\{B\})}{1-\Pr\{B\}} \\ & = \Pr\{A\} \end{align*}

So, in essence, we don't learn anything about $A$ when we learn whether or not $B$ happened.
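Continuing the two-dice example, a quick check (another minimal sketch) confirms that $A$ is also independent of $\overline{B}$:

```python
from fractions import Fraction
from itertools import product

# Check that A = "D1 + D2 = 7" is independent of the complement of B = "D1 = 4".
outcomes = list(product(range(1, 7), repeat=2))
A = {o for o in outcomes if sum(o) == 7}
B = {o for o in outcomes if o[0] == 4}
not_B = set(outcomes) - B

pr = lambda E: Fraction(len(E), len(outcomes))
print(pr(A & not_B) / pr(not_B))                   # Fraction(1, 6) = Pr{A | not B} = Pr{A}
```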

Independent Sets of events

We know what it means for two events $A$ and $B$ to be independent, but we also need a definition for more than just two events.

We say that a set of events $\{A_1,\ldots,A_n\}$ is independent if, for every $\{B_1,\ldots,B_r\}\subseteq\{A_1,\ldots,A_n\}$, \[ \Pr\{B_1\cap B_2\cap\cdots\cap B_r\} = \Pr\{B_1\}\times\Pr\{B_2\}\times\cdots\times\Pr\{B_r\} \enspace . \] This condition is time-consuming to check, so hopefully it's obvious in whatever application you're working on.
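For small finite sample spaces the condition can be checked by brute force. Here is a sketch, assuming a uniform sample space; the helper name and the three-coin example are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations, product

def mutually_independent(events, outcomes):
    # Check every subset {B_1,...,B_r} with r >= 2:
    # Pr{B_1 and ... and B_r} must equal Pr{B_1} * ... * Pr{B_r}.
    pr = lambda E: Fraction(len(E), len(outcomes))
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            intersection = set(outcomes)
            for E in subset:
                intersection &= E
            product_of_probs = Fraction(1)
            for E in subset:
                product_of_probs *= pr(E)
            if pr(intersection) != product_of_probs:
                return False
    return True

# Example: toss three fair coins; the i-th event is "coin i is heads".
outcomes = list(product("HT", repeat=3))
events = [{o for o in outcomes if o[i] == "H"} for i in range(3)]
print(mutually_independent(events, outcomes))      # True
```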

Example: Series circuits

A series circuit fails when any one of its components fails. If each of the $k$ components fails independently with probability $p$, then what is the probability that the circuit works? \begin{align*} \Pr\{\text{circuit works}\} &= \Pr\{\text{$C_1$ doesn't fail and … and $C_k$ doesn't fail}\} \\ &= (1-\Pr\{\text{$C_1$ fails}\}) \times \cdots\times (1-\Pr\{\text{$C_k$ fails}\}) \text{ (by independence)}\\ &= (1-p)^k \end{align*}
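A quick simulation (with arbitrary values of $p$ and $k$) agrees with the formula $(1-p)^k$:

```python
import random

# Series circuit: it works only if every one of the k components avoids failure.
p, k, trials = 0.1, 5, 100_000
works = sum(all(random.random() >= p for _ in range(k)) for _ in range(trials))
print(works / trials, (1 - p) ** k)                # both close to 0.59049
```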

Example: Parallel circuits

A parallel circuit fails when all of its components fail. If each of the $k$ components fails independently with probability $p$, then what is the probability that the circuit works? \begin{align*} \Pr\{\text{circuit fails}\} &= \Pr\{\text{$C_1$ fails and … and $C_k$ fails}\} \\ &= \Pr\{\text{$C_1$ fails}\}\times\cdots\times\Pr\{\text{$C_k$ fails}\} \text{ (by independence)}\\ &= p^k \end{align*} So the probability that the circuit works is $1-p^k$.
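The same simulation, adapted to the parallel case, agrees with $1-p^k$:

```python
import random

# Parallel circuit: it works as long as at least one of the k components survives.
p, k, trials = 0.1, 5, 100_000
works = sum(any(random.random() >= p for _ in range(k)) for _ in range(trials))
print(works / trials, 1 - p ** k)                  # both close to 0.99999
```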

Markov's Inequality

Markov's Inequality: Let $X$ be a random variable that only takes on non-negative values. Then, for any $t>1$, \[ \Pr\{X \ge t\E[X]\} \le \frac{1}{t} \enspace . \]

The proof of Markov's Inequality is basically: "If it were not true, then $\E[X]$ would be bigger than $\E[X]$". The following proof formalizes this:

Proof: Let \[ Y = \begin{cases}0 & \text{if $X < t\E[X]$}\\ t\E[X] & \text{otherwise} \end{cases} \] Notice that $Y\le X$, so $\E[Y] \le \E[X]$. But we can compute $\E[Y]$ easily, giving \[ \E[X] \ge \E[Y] = t\E[X]\times\Pr\{X\ge t\E[X]\} \enspace , \] and dividing both sides by $t\E[X]$ gives \[ 1/t \ge \Pr\{X\ge t\E[X]\} \enspace , \] as required. ∎
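Here is a small empirical sanity check of the inequality; the random variable (the sum of two fair dice, so $\E[X]=7$) and the value of $t$ are arbitrary choices.

```python
import random

# X = sum of two fair dice, a non-negative random variable with E[X] = 7.
trials, t = 100_000, 1.5
samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(trials)]
mean = sum(samples) / trials                       # should be close to 7
tail = sum(x >= t * mean for x in samples) / trials
print(tail, "<=", 1 / t)                           # roughly 0.083 <= 0.667
```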

An Example application

Suppose we have a randomized algorithm $\mathcal{A}$ whose expected running-time on inputs of size $n$ is at most $f(n)$. Since running-times are non-negative, Markov's Inequality tells us that \[ \Pr\{\text{$\mathcal{A}$ runs for longer than $t f(n)$}\} \le 1/t \enspace . \] But we can do better! Create a new algorithm $\mathcal{B}$ defined as follows: run $\mathcal{A}$ for at most $2f(n)$ steps; if it finishes, output its answer; otherwise, stop it, restart it from scratch, and repeat until some run finishes.

We assume that $\mathcal{A}$ makes new random choices each time it is restarted so that, for example, the event that the first run of $\mathcal{A}$ runs for longer than $2f(n)$ is independent of the event that the second run of $\mathcal{A}$ runs for longer than $2f(n)$. By Markov's Inequality, each run of $\mathcal{A}$ fails to finish within its $2f(n)$-step budget with probability at most $1/2$, and $\mathcal{B}$ runs for longer than $tf(n)$ only if its first $t/2$ runs all fail. So, by the independence of the runs of $\mathcal{A}$, we have for any even integer $t$, \[ \Pr\{\text{$\mathcal{B}$ runs for longer than $t f(n)$}\} \le (1/2)^{t/2} \enspace . \]
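Here is a minimal sketch of $\mathcal{B}$ as a restart wrapper, assuming the underlying algorithm can be run one attempt at a time under a step budget; `run_once` and the toy stand-in below are illustrative placeholders, not a specific algorithm.

```python
import random

def run_with_restarts(run_once, budget):
    # Algorithm B: give each attempt at most `budget` steps (playing the role
    # of 2*f(n)); if the attempt doesn't finish, restart with fresh random
    # choices and try again.
    while True:
        result = run_once(budget)
        if result is not None:                     # None means "didn't finish in time"
            return result

# Toy stand-in for A: each attempt independently finishes in time with probability 1/2.
def toy_attempt(budget):
    return "answer" if random.random() < 0.5 else None

print(run_with_restarts(toy_attempt, budget=200))  # 200 plays the role of 2*f(n)
```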

Imagine a nuclear reactor control system that will cause a meltdown if the underlying algorithm runs longer than $500f(n)$. Then the best we can say about $\mathcal{A}$ is \[ \Pr\{\text{meltdown using $\mathcal{A}$}\} \le \frac{1}{500} \enspace . \] But for $\mathcal{B}$ we have \[ \Pr\{\text{meltdown using $\mathcal{B}$}\} \le 2^{-250} = \frac{1}{1809251394333065553493296640760748560207343510400633813116524750123642650624} \enspace . \] Which would you rather rely on?