# Lecture 2: Conditional Probability and Independence

STA237: Probability, Statistics, and Data Analysis I

Michael Jongho Moon

PhD Student, DoSS, University of Toronto

Wednesday, May 10, 2023

# Example: 433 lottery winners

• 433 people won the grand prize
• The winning numbers were all multiples of 9: 09-45-36-27-18-54
• Some people suspected this was due to fraud

## Must be a fraud because …

### The combination looks suspicious

(9x1, 9x2, 9x3, 9x4, 9x5, 9x6)

### If it was not a fraud …

• The probability of drawing (9, 9x2, 9x3, 9x4, 9x5, 9x6) from the integers between 1 and 55
• $=1\left/\binom{55}{6}\right.\approx 3\left/10^8\right.$
• $=$ the probability of drawing any one specific combination of 6 numbers.

## Must be a fraud because …

### There are too many winners

• The probability of 2 people picking the same combination of 6 numbers
• $=$(# possible combinations)
$\phantom{=}\times P($first person picking the combination$)$
$\phantom{=}\times P($second person picking the combination$)$
• $=\binom{55}{6}\cdot 1\left/\binom{55}{6}\right. \cdot 1\left/\binom{55}{6}\right.\approx3\left/10^8\right.$.
• 433 sharing the same combination is even less likely.
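The two approximations above can be checked numerically. The following is a minimal R sketch using base R's `choose()`; the variable names are illustrative and not part of the original slides, and the second calculation assumes both people pick uniformly at random.

```r
# Number of ways to choose 6 numbers out of 55 (order does not matter)
n_comb <- choose(55, 6)

# Probability that a single draw produces one specific combination
p_specific <- 1 / n_comb

# Probability that two people picking uniformly at random choose the
# same combination: sum over all combinations of (1 / n_comb)^2
p_two_match <- n_comb * (1 / n_comb) * (1 / n_comb)

n_comb       # 28989675
p_specific   # roughly 3.4e-08
p_two_match  # same value as p_specific
```

The calculation only covers the "everyone picks at random" scenario; it says nothing about how likely a shared pick is when many players favour the same numbers.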

### It assumes they all picked the numbers randomly …

• 9 is considered a lucky number by many
• People are more likely to select numbers they consider lucky

## Must be a fraud because …

### I played the same sequence multiple times but never won

9 must be a lucky number and should appear more often.

### If each draw is consistently executed …

• Does the probability of drawing (9, 9x2, 9x3, 9x4, 9x5, 9x6) in the next draw change knowing it was drawn today?
• What is the conditional probability of drawing the combination in the next draw given that it was drawn today?
• Do the draws depend on each other? Or, are they independent?

# Conditional probability

In general,

The conditional probability of event $A$ given event $C$ is defined as

$P(A|C)=\frac{P(A\cap C)}{P(C)}$

for any event $C$ such that $P(C)>0$.

Alternatively,

The multiplication rule states that for any events $A$ and $C$,

$P(A\cap C)=P(A|C)\cdot P(C).$

## Example: Sharing a birthday

Suppose 3 students are randomly selected from a class.

What is the probability that all three have different birthdays?

Assume they are all born in a non-leap year.

### Experiment

Picking 3 students randomly.

### Outcome

$(b_1, b_2, b_3)$

### Sample space

$\Omega =\left\{\begin{array}{c} (\text{Jan 1},\ \text{Jan 1},\ \text{Jan 1}), \\ (\text{Jan 1},\ \text{Jan 1},\ \text{Jan 2}), \\ (\text{Jan 1},\ \text{Jan 1},\ \text{Jan 3}), \\ \vdots \end{array}\right\}$

Let’s first consider the event that the first two birthdays are both January 1st, which we denote by $b^*$.

Denote

• the event that $b_1= b_2$ by $A$ and
• the event that $b_1=b^*$ by $C$.

$P\left(C\right)=\frac{1}{365}$

$P\left(A\cap C\right)=\frac{1}{365^2}$

$P\left(A\left\lvert C\right.\right)=\frac{1}{365^2}\left/\frac{1}{365}\right.=\frac{1}{365}$

$=\frac{\text{# days that are January 1st}}{\text{# possible days for }b_2}$

### Events

$B_{12}=\left\{\left(b_1,b_2,b_3\right):b_1=b_2\right\}$

$B_{13}=\left\{\left(b_1,b_2,b_3\right):b_1=b_3\right\}$

$B_{23}=\left\{\left(b_1,b_2,b_3\right):b_2=b_3\right\}$

### Probability of interest

$P\left(B_{12}^c \cap B_{13}^c \cap B_{23}^c\right)$

The probability of two people sharing a birthday on January 1st is $\frac{1}{365^2}$.

There are 365 such disjoint events, one for each day of the year on which the two people could share a birthday.

$P\left(B_{12}\right)=365 \times \frac{1}{365^2}=\frac{1}{365}$ $\implies P\left(B_{12}^c\right)=1-P\left(B_{12}\right)=\frac{364}{365}$

$P\left(B_{12}^c \cap B_{13}^c \cap B_{23}^c\right)$

$=\color{darkblue}{P\left(B_{13}^c \cap B_{23}^c \left\lvert B_{12}^c\right.\right)}P\left(B_{12}^c\right)$

We can compute $P\left(B_{12}^c \cap B_{13}^c \cap B_{23}^c\right)$ if we know the conditional probability that the third person doesn’t share a birthday with either of the first two given the first pair doesn’t share a birthday.

• $P\left(B_{13}^c \cap B_{23}^c \left\lvert B_{12}^c\right.\right)$
• $=\frac{\text{# days that do not overlap with the first 2}}{\text{# possible days for }b_3}$

The number of days that do not overlap with the first two birthdays is $365-2$, because we are given that the first two people do not share a birthday.

• $=\frac{363}{365}$

$P\left(B_{12}^c \cap B_{13}^c \cap B_{23}^c\right)$ $=\frac{363}{365}\cdot\frac{364}{365}$ $=\frac{363\times364}{365^2}$
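As a sanity check, the result can be compared against a simulation. This is a rough R sketch, assuming each birthday is equally likely to fall on any of 365 days and that the three students' birthdays do not influence each other; the variable names and the seed are arbitrary.

```r
# Exact probability that 3 people have 3 different birthdays,
# built up with the multiplication rule
p_exact <- (364 / 365) * (363 / 365)

# Monte Carlo check: draw 3 birthdays uniformly from 365 days, many times
set.seed(237)                      # arbitrary seed for reproducibility
n_sim <- 100000
all_different <- replicate(n_sim, {
  b <- sample(1:365, 3, replace = TRUE)
  length(unique(b)) == 3           # TRUE when no two birthdays coincide
})
p_sim <- mean(all_different)

p_exact  # about 0.9918
p_sim    # should be close to p_exact
```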

## Example: Guessing a multiple choice question

(adapted from Dekking et al. 3.10)

Suppose Michael knows the answer to a multiple choice question with a probability of 3/5.

When he does not know the answer, he picks an answer out of 4 choices at random. Even when Michael knows the answer, he is prone to making mistakes and answers the question correctly with a probability of 4/5.

What is the probability that Michael correctly answers a multiple choice question?

### Events

$K$: Michael knows the answer

$Y$: Michael answers correctly

### Probabilities

$P(K)=3/5$

$P(Y\left|K^c\right.)=1/4$

$P(Y\left|K\right.)=4/5$

Tree diagram:

• $K$
  • $Y|K$, leading to $Y\cap K$
  • $Y^c|K$
• $K^c$
  • $Y|K^c$, leading to $Y\cap K^c$
  • $Y^c|K^c$

$P(Y)=P(Y | K)P(K) + P(Y | K^c)P(K^c)$
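Plugging in the given values, with $P(K^c)=1-P(K)=2/5$, the arithmetic works out as follows:

$P(Y)=\frac{4}{5}\cdot\frac{3}{5}+\frac{1}{4}\cdot\frac{2}{5}=\frac{12}{25}+\frac{1}{10}=\frac{29}{50}=0.58$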

# The Law of Total Probability

Suppose $C_1,C_2,\ldots,C_m$ are disjoint events such that $C_1\cup C_2\cup\cdots\cup C_m=\Omega$.

The Law of Total Probability states that

$P(A)=\sum_{i=1}^m\left[P(A\left|C_i\right.)P(C_i)\right]$

for any arbitrary event $A$.

## $P(\left.C_i\right|A)=?$

$P(\left.C_i\right|A)=\frac{P(C_i \cap A)}{P(A)}$

$=\frac{P(A |C_i )P(C_i)}{P(A)}$, since $P\left(C_i\cap A\right)=P\left(A\cap C_i\right)=P\left(A |C_i \right)P\left(C_i\right)$ by the multiplication rule

$=\frac{P(A |C_i )P(C_i)}{\sum_{j=1}^m\left[P(A\left|C_j\right.)P(C_j)\right]}$, by the Law of Total Probability

# Bayes’ Rule

Suppose $C_1,C_2,\ldots,C_m$ are disjoint events such that $C_1\cup C_2\cup \cdots\cup C_m=\Omega$.

Bayes’ Rule states that the conditional probability of $C_i$ given an arbitrary event $A$ with $P(A)>0$ is

$P(\left.C_i\right|A)=\frac{P(A\left|C_i\right.)\cdot P(C_i)}{ \sum_{j=1}^m\left[P(A\left|C_j\right.)P(C_j)\right]}.$
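The rule translates almost directly into vectorised R. The sketch below is only an illustration: `bayes_posterior`, `prior`, and `likelihood` are hypothetical names, and the numbers in the example are made up rather than taken from the course.

```r
# Posterior probabilities P(C_i | A) for a partition C_1, ..., C_m.
# `prior` holds P(C_i) and `likelihood` holds P(A | C_i); both names are
# hypothetical and chosen only for this illustration.
bayes_posterior <- function(prior, likelihood) {
  stopifnot(length(prior) == length(likelihood),
            isTRUE(all.equal(sum(prior), 1)))
  p_a <- sum(likelihood * prior)   # Law of Total Probability: P(A)
  likelihood * prior / p_a         # Bayes' Rule for every i at once
}

# Made-up example with m = 3 events
prior <- c(0.5, 0.3, 0.2)
likelihood <- c(0.1, 0.4, 0.7)
posterior <- bayes_posterior(prior, likelihood)

posterior        # one value per C_i
sum(posterior)   # the posteriors sum to 1
```

Computing the denominator once and reusing it for every $C_i$ mirrors the formula, where the same sum appears under each numerator.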

## Example: Guessing a multiple choice question

### Applying Bayes’ rule

Provided that Michael answered the question correctly, what is the probability that Michael knew the answer?

$P(K)=3/5$

$P(Y\left|K^c\right.)=1/4$

$P(Y\left|K\right.)=4/5$

• $P\left(K\left\lvert Y\right.\right)$
• $=\frac{P\left(Y\left\lvert K\right.\right)P\left(K\right)}{P\left(Y\left\lvert K\right.\right)P\left(K\right) + P\left(Y\left\lvert K^c\right.\right)P\left(K^c\right)}$
• $=\frac{4/5\cdot3/5}{4/5\cdot3/5+1/4\cdot2/5}$
• $=\frac{24}{29}\approx0.828$
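A simulation offers an independent check of this value. The following R sketch generates many hypothetical attempts at the question using only the three probabilities given above; the variable names and the seed are arbitrary choices.

```r
set.seed(237)        # arbitrary seed for reproducibility
n_sim <- 200000

# K: Michael knows the answer, with probability 3/5
knows <- runif(n_sim) < 3 / 5

# Y: Michael answers correctly, with probability 4/5 if he knows the
# answer and 1/4 if he has to guess
p_correct <- ifelse(knows, 4 / 5, 1 / 4)
correct <- runif(n_sim) < p_correct

p_y_sim <- mean(correct)                 # estimates P(Y)
p_k_given_y_sim <- mean(knows[correct])  # estimates P(K | Y)

c(p_y_sim, p_k_given_y_sim)
```

Restricting `knows` to the attempts where `correct` is `TRUE` is the simulation counterpart of conditioning on $Y$, so the second estimate should land near $24/29$.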

# Independence

What does it mean for two events to be independent?

(Michael answers a question correctly today) & (it rains tomorrow) are independent.

(Michael answers a question correctly) & (Michael gets stuck on a subway delay on the test day) may not be independent.

An event $A$ is called independent of $B$ if

$P(A|B)=P(A).$

That is, whether or not event $B$ occurs does NOT change the probability of $A$.

## Example: Guessing a multiple choice question

$P(K)=\frac{3}{5} < \frac{24}{29} = P(K|Y)$

• Suppose you were Michael’s instructor. Before the exam, your confidence in his knowledge of the question wasn’t too high.
• When you find out he answered the question correctly, you are more confident that he knows the material.
• The correctness of his answer adds extra information about Michael’s level of understanding of the course material.
• If the two events were independent, the question would not be a useful assessment.

## Example: Sampling in R

Consider
`samp <- sample(1:10, 5)`

Let

• $A$ be the event that `samp[1]` is 10
• $B$ be the event that `samp[2]` is 10
• $C$ be the event that `samp[3]` is 5

Are they pairwise independent?

• $P(B|A)=0$ $\implies$ $A$ and $B$ are not independent.
• $P(C|A)>P(C)$ $\implies$ $A$ and $C$ are not independent.
• $P(C|B)>P(C)$ $\implies$ $B$ and $C$ are not independent.

## Example: Sampling in R

Consider
`samp <- sample(1:10, 5, replace = TRUE)`

Let

• $A$ be the event that `samp[1]` is 10
• $B$ be the event that `samp[2]` is 10
• $C$ be the event that `samp[3]` is 5

Are they pairwise independent?

• $P(B|A)=P(B)$ $\implies$ $A$ and $B$ are independent.
• $P(C|A)=P(C)$ $\implies$ $A$ and $C$ are independent.
• $P(C|B)=P(C)$ $\implies$ $B$ and $C$ are independent.
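The contrast between the two sampling schemes can be seen empirically. The sketch below assumes that $A$ and $B$ refer to the first and second elements of `samp` respectively; that reading, the helper name `estimate`, and the seed are assumptions made for this illustration.

```r
# Estimate P(B) and P(B | A) by simulation, where
#   A: samp[1] is 10,  B: samp[2] is 10
# (the positions 1 and 2 are an assumption made for this illustration)
estimate <- function(replace, n_sim = 100000) {
  # Each column holds the first two elements of one simulated `samp`
  draws <- replicate(n_sim, sample(1:10, 5, replace = replace)[1:2])
  a <- draws[1, ] == 10
  b <- draws[2, ] == 10
  c(p_b = mean(b), p_b_given_a = mean(b[a]))
}

set.seed(237)  # arbitrary seed for reproducibility
without_replacement <- estimate(replace = FALSE)
with_replacement    <- estimate(replace = TRUE)

without_replacement  # p_b_given_a is exactly 0: 10 cannot appear twice
with_replacement     # p_b_given_a should be close to p_b
```

The same function can be adapted to compare $P(C|A)$ with $P(C)$ by changing which position and value are checked.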

### When

$P(A|B)=P(A)$

### Implications

Complements

$P(A|B)=1-P(\left.A^c\right|B)$ and $P(A)=1-P(A^c)$,

$\implies 1-P(\left. A^c\right|B) = 1 - P(A^c)$ $\implies P(A^c|B)=P(A^c)$

Multiplication rule

$P(A\cap B) = P(A|B)P(B)$

$\implies P(A\cap B)=P(A)P(B)$

Mutual property

$P(B|A) = \frac{P(A\cap B)}{P(A)}$

$=\frac{P(A)P(B)}{P(A)}=P(B)$

$\implies P(B| A)=P(B)$

$P(A|B)=P(A)$ $\iff P(A^c|B)=P(A^c)$ $\iff P(A\cap B)=P(A)P(B)$ $\iff P(B| A)=P(B)$

To show that $A$ and $B$ are independent, it suffices to prove any one of the above.

If you show that any one of them is not true, you show that the two events are dependent.

## Example: Rolling two fair dice

You roll two fair dice.

### Question 1

$A$ is the event that the sum of the rolls is divisible by 4, and
$B$ is the event that the two rolls are the same.

Are $A$ and $B$ independent events?
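One way to check an answer to this question is to enumerate all 36 equally likely outcomes and compare $P(A\cap B)$ with $P(A)P(B)$. The R sketch below does this with `expand.grid()`; the column names `d1` and `d2` are illustrative.

```r
# All 36 equally likely outcomes of two fair dice
rolls <- expand.grid(d1 = 1:6, d2 = 1:6)

a <- (rolls$d1 + rolls$d2) %% 4 == 0   # A: sum divisible by 4
b <- rolls$d1 == rolls$d2              # B: both rolls are the same

p_a  <- mean(a)        # P(A)
p_b  <- mean(b)        # P(B)
p_ab <- mean(a & b)    # P(A and B)

c(p_a = p_a, p_b = p_b, p_ab = p_ab, product = p_a * p_b)

# A and B are independent only if p_ab equals p_a * p_b
isTRUE(all.equal(p_ab, p_a * p_b))
```

Because every outcome has probability $1/36$, taking the mean of a logical vector over the rows gives the exact probability of the corresponding event.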

# Independence among more than two events

Using the alternative definition $P(A\cap B) = P(A)P(B)$, we can extend the notion to more than two events.

Events $A_1,A_2,\ldots,A_m$ are called independent if

$P(A_1\cap A_2\cap \cdots \cap A_m)=\prod_{i=1}^m P(A_i)$

and this statement also holds when any number of the events are replaced by their complements.

## Example: Rolling two fair dice

You roll two fair dice.

### Question 2

$R_1$ is the event that the first throw is a 3, and
$R_2$ is the event that the second throw is a 3.

What is $P(R_1\cap R_2)$?

What is the probability of the event that $n$ consecutive throws are the same number?
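For the last question, a candidate answer can be tested by simulation. The sketch below reads the question as "all $n$ throws show the same face" and assumes the throws are independent; under that reading there are 6 possible common faces, each with probability $(1/6)^n$. The function name `p_all_same` and the choice $n=3$ are illustrative.

```r
# Probability that n independent throws of a fair die all show the same
# face: 6 possible common faces, each with probability (1/6)^n
p_all_same <- function(n) 6 * (1 / 6)^n

set.seed(237)   # arbitrary seed for reproducibility
n <- 3
n_sim <- 100000
same <- replicate(n_sim, {
  throws <- sample(1:6, n, replace = TRUE)
  length(unique(throws)) == 1     # TRUE when every throw matches
})
p_sim <- mean(same)

p_all_same(n)  # equals 1/36 when n = 3
p_sim          # should be close to the value above
```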

# R worksheet

## Install learnr and run R worksheet

1. Click here to install learnr on r.datatools.utoronto.ca

If you see an error, try:

2. Find rlesson02 from Files pane
3. Click Run Document

Other steps you may try:

1. Remove any .Rmd and .R files on the home directory of r.datatools.utoronto.ca
2. In RStudio,
   1. Click Tools > Global Options
   2. Uncheck “Restore most recently opened project at startup”
3. Run `install.packages("learnr")` in RStudio after the steps above or click here

# Summary

• Conditional probability describes how the probability of one event changes when another event is known to have occurred
• Independent events do not change each other's probabilities when they occur

## Practice questions

Chapter 3, Dekking et al.

• Quick Exercises 3.2, 3.3, 3.8
• All exercises from the chapter except 3.13
• See a collection of corrections by the author here