Lecture 3: Discrete Random Variables

STA237: Probability, Statistics, and Data Analysis I

Michael Jongho Moon

PhD Student, DoSS, University of Toronto

Monday, May 15, 2023

Example: Rolling two fair dice

  • Suppose we are interested in the sum
  • We may define 12 different events,
    e.g., \(S_2\), \(S_3\), … \(S_{12}\)
  • It would be more efficient to study a variable that takes the possible values of the sum according to their probabilities, e.g., \[X\in \left\{2,3,\ldots, 11,12\right\}\]
  • \(X\) is an example of a discrete random variable
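For instance, the distribution of the sum can be tabulated by brute force. A minimal R sketch (an added illustration, not part of the original slides) that enumerates the 36 equally likely outcomes:

```r
# Enumerate the 36 equally likely outcomes of two fair dice
# and tabulate the distribution of their sum
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
table(rolls$die1 + rolls$die2) / 36
```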

Discrete random variable

(Diagram: Outcome → Random Variable → Real Number)

Let \(\Omega\) be a sample space. A random variable \(X\) is a function that maps \(\Omega\) to the real numbers (\(\mathbb{R}\)), \[X:\Omega\to\mathbb{R}.\]


When a random variable \(X\) takes a countable number of values, it is called a discrete random variable.

Note that a discrete random variable may take a countably infinite number of values, e.g., the number of coin tosses needed to see the first heads can be any positive integer.

Example: Rock paper scissors

Suppose you play Rock Paper Scissors with a housemate. Assume you both pick each hand randomly.


In each round, the winner takes $1, $2, or $3 from the loser when they win with rock, paper, or scissors, respectively.

When the round is a tie, you both put $3 in a communal cash box that you share with other housemates, i.e., you both lose $3.

What is the probability that you win more than $5 after playing 3 rounds?

Consider a single round

  • Let \(X_i\) be the amount you earn in \(i\)th round and denote each round’s outcome with
    \((\) your hand, your housemate’s hand \()\).

  • e.g., \(X_i(\{(\text{scissors}, \text{paper})\})=3\), i.e., you win $3 when you play scissors and your housemate plays paper.

  • \(X_i\) maps each possible outcome from a round to a dollar amount.

The amount you earn, \(X_i\), for each combination of hands:

Their hand \ Your hand    Rock    Paper    Scissors
Rock                       -3       2        -1
Paper                      -2      -3         3
Scissors                    1      -3        -3


\(X_i\) is random

  • Because the underlying mechanism is random, the value it takes is also random.

  • The underlying experiment also determines the probability associated with each possible value \(X_i\) can take.

  • The probability function defined over all possible values of a random variable describes their relative likelihoods, that is, the distribution of the random variable.

  • Such a probability function is called a probability mass function.

x             -3    -2    -1     1     2     3
P(X_i = x)   4/9   1/9   1/9   1/9   1/9   1/9

\[P\left(X_i=x\right)=\begin{cases} \frac{4}{9} & x=-3 \\ \frac{1}{9} & x\in\left\{-2, -1, 1, 2, 3\right\} \\ 0 & \text{otherwise.} \end{cases}\]

We can omit the braces \(\{\,\}\) inside probability statements involving random variables, e.g., we write \(P\left(X_i=x\right)\) rather than \(P\left(\{X_i=x\}\right)\).

Probability mass function

The probability mass function (pmf) uniquely defines (the behaviour of) a random variable.

The probability mass function \(p\) of a discrete random variable \(X\) is the function \[p:\mathbb{R}\to\left[0,1\right],\]

defined by

\[p(k)=P\left(X=k\right)\quad\text{for }-\infty<k<\infty.\]
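As a small illustration (an added sketch, not part of the original slides), the single-round winnings pmf from the rock paper scissors example can be stored as a lookup table and checked against two properties every pmf satisfies: each value lies in \([0,1]\) and the values sum to 1.

```r
# pmf of the single-round winnings X_i, as a lookup table
x  <- c(-3, -2, -1, 1, 2, 3)
px <- c(4/9, 1/9, 1/9, 1/9, 1/9, 1/9)

all(px >= 0 & px <= 1)   # each p(k) lies in [0, 1]
sum(px)                  # the probabilities sum to 1
```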

Example: Rock paper scissors

  • Let \(S=\sum_{i=1}^3 X_i\).
  • We are interested in \[P\left(S>5\right).\]
  • \[P\left(S>5\right)=P\left(S=6\right)+P\left(S=7\right)+P\left(S=8\right)+P\left(S=9\right)\]


There are \(9^3\) equally likely outcomes from playing 3 rounds.

\(S=6\) when you win by one of
\((\$2, \$2, \$2)\),
\(3!\) arrangements of \((\$3, \$2, \$1)\).

\(S=7\) when you win by one of
\(3!/2!\) arrangements of \((\$3, \$3, \$1)\),
\(3\) arrangements of \((\$3, \$2, \$2)\).

\(S=8\) when you win by one of
\(3\) arrangements of \((\$3, \$3, \$2)\)

\(S=9\) when you win by \((\$3, \$3, \$3)\)

\[P(S>5)=\frac{7 + 6 + 3 + 1}{9^3}=\frac{17}{729}\]
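A brute-force check of this count in R (a sketch assuming the single-round payoffs from the earlier table):

```r
# Winnings for the 9 equally likely (your hand, their hand) outcomes of a round:
# win $1, $2, or $3; lose $1, $2, or $3; or tie three ways (both lose $3)
round_values <- c(1, 2, 3, -1, -2, -3, -3, -3, -3)

# All 9^3 equally likely outcomes of three rounds
s <- rowSums(expand.grid(round_values, round_values, round_values))

sum(s > 5)        # 17 favourable outcomes
sum(s > 5) / 9^3  # 17/729
```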

Cumulative distribution function

The cumulative distribution function (cdf) also uniquely defines (the behaviour of) a random variable.

The cumulative distribution function, or distribution function \(F\) of a random variable \(X\) is the function

\[F:\mathbb{R}\to\left[0,1\right],\]

defined by

\[F(a)=P\left(X\le a\right)\quad\text{for }-\infty < a<\infty.\]
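For a discrete random variable, the cdf accumulates the pmf: \(F(a)=\sum_{k\le a}p(k)\). A minimal sketch for the single-round winnings \(X_i\), evaluated at its jump points:

```r
# cdf of X_i at its jump points, via cumulative sums of the pmf
x  <- c(-3, -2, -1, 1, 2, 3)
px <- c(4/9, 1/9, 1/9, 1/9, 1/9, 1/9)
Fx <- cumsum(px)            # F(a) = P(X_i <= a)
data.frame(a = x, F_a = Fx)
```

Between jump points the cdf is flat; \(F(a)\) equals its value at the largest possible \(x\) not exceeding \(a\).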

Example: Rock paper scissors

\[P\left(S>5\right)\]

  • Once we study the full distribution and derive its cdf, \(F_S\left(s\right)\), we can compute the probability quickly.

\[F_\color{red}{S}\left(\color{blue}{s}\right)\]

  • \(\color{red}{S}\): The random variable of interest.
  • \(\color{blue}{s}\): The input value to the function.
  • \[P\left(S>5\right)=1-P\left(S\le5\right)=1-F_S\left(5\right)\]
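Continuing the enumeration sketch from earlier (reusing the vector `s` of 729 equally likely three-round sums), the same probability can be computed via the cdf route:

```r
# Empirical construction of F_S from the enumeration of all 729 outcomes
F_S <- function(a) mean(s <= a)

1 - F_S(5)   # P(S > 5) = 1 - F_S(5) = 17/729
```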


Common discrete distributions

Bernoulli distribution

\(\theta\) is commonly written as \(p\) as it represents a probability. We will use \(\theta\) to avoid confusion with the pmf \(p\).

A discrete random variable \(X\) has a Bernoulli distribution with parameter \(\theta\), \(0\le \theta\le 1\), if its probability mass function is given by

\[p_X(x)=\begin{cases} \theta & \text{when }x=1 \\ 1-\theta & \text{when }x=0.\end{cases}\]

We denote the distribution by \(\text{Ber}(\theta)\) and write \(X\sim \text{Ber}(\theta)\).

Examples

  • \(Y=1\) when Michael answers a multiple choice question correctly and \(Y=0\) otherwise
  • Let \(W=1\) when \(S>5\) and \(0\) otherwise in the rock paper scissors example
  • Success vs. failure
  • True vs. false
  • Exists vs. does not exist

In general, we can model experiments with exactly two possible outcomes with Bernoulli random variables.
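In R, a Bernoulli draw is a binomial draw with a single trial. A quick sketch with an illustrative \(\theta\):

```r
set.seed(237)   # arbitrary seed, for reproducibility
theta <- 1/3    # illustrative parameter value

# Ten independent Ber(theta) draws; size = 1 makes rbinom() a Bernoulli sampler
w <- rbinom(10, size = 1, prob = theta)
w
mean(w)         # sample proportion of 1s; close to theta over many draws
```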

Example: Rock paper scissors

Suppose you want ‘all-or-nothing’,
propose a 5-round game, and
play only scissors.

Assume that your housemate still plays randomly.

What is the probability that you would win more than $0 after the 5 rounds?


[Figure: five rounds, with $3 at stake in each]

Let \(W_i\) be the \(\text{Ber}(\theta)\) random variable representing whether you win round \(i\). Since you always play scissors, you win round \(i\) exactly when your housemate plays paper, so \(\theta=1/3\).

We can denote the number of rounds you win by \(N=W_1 + W_2 + W_3 + W_4 + W_5\). You need to win at least 3 rounds to win more than $0.

\[P(N \ge 3)=1 - P(N < 3) = 1 - P(N \le 2) = 1 - F_N(2)\]

Let’s consider \(F_N(2)=p_N(0) + p_N(1) + p_N(2)\)

  • \(p_N(2) = P(N=2)\)
  • \(\phantom{P_N(2)}= P\left(\left\{\text{Win 2 and lose 3}\right\}\right)\)
  • \(\phantom{P_N(2)}=\binom{5}{2}P\left(W_1 \cap W_2 \cap W_3^c \cap W_4^c \cap W_5^c\right)\)
  • \(\phantom{P_N(2)}=\binom{5}{2}P\left(W_1\right)P\left(W_2\right)P\left(W_3^c\right)P\left(W_4^c\right)P\left(W_5^c\right)\)
  • \(\phantom{P_N(2)}=\binom{5}{2}\theta^2\left(1-\theta\right)^3\)

  • \(p_N(2) = \binom{5}{2}\theta^2\left(1-\theta\right)^3\)
  • \(p_N(1) = \cdots =\binom{5}{1}\theta^1\left(1-\theta\right)^4\)
  • \(p_N(0) = \cdots =\binom{5}{0}\theta^0\left(1-\theta\right)^5\)

\[p_N(x) = \binom{5}{x}\theta^x\left(1-\theta\right)^{5-x}\quad\text{for }x=0,1,\ldots,5\]
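The derived pmf can be checked against R's built-in binomial functions. A sketch assuming \(\theta=1/3\) (you win a round only when your housemate plays paper):

```r
theta <- 1/3
x <- 0:5

# pmf of N via the formula above and via dbinom()
cbind(formula = choose(5, x) * theta^x * (1 - theta)^(5 - x),
      dbinom  = dbinom(x, size = 5, prob = theta))

# Probability of ending ahead: P(N >= 3) = 1 - F_N(2)
1 - pbinom(2, size = 5, prob = theta)
```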

Binomial distribution

It’s important to remember that the total number of objects, often referred to as trials, \(n\), is a fixed parameter, as is \(\theta\) (commonly written \(p\) for the probability of the event occurring, and sometimes \(q=1-p\) for the probability of no event).

A discrete random variable \(X\) has a binomial distribution with parameters \(n\) and \(\theta\), \(n = 1, 2, 3, \ldots\), and \(0\le \theta \le 1\), if its probability mass function is given by

\[p_X(x)=\binom{n}{x}\theta^x(1-\theta)^{n-x} \quad \text{for all }x=0, 1, 2, \ldots, n.\]

We denote the distribution by \(\text{Bin}(n,\theta)\).

Examples

  • \(N\sim\text{Bin}(5, 1/3)\) from the ‘all-or-nothing’ Rock paper scissors example
  • The number of questions Michael answers correctly out of 10 multiple choice questions with a similar level of difficulty
  • Number of successes (failures) out of a fixed number of trials

The distribution describes a sum of \(n\) independent and identical Bernoulli trials.
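A small simulation sketch of this fact, assuming \(n=5\) and \(\theta=1/3\) as in the game: summing five Bernoulli draws has the same distribution as a single binomial draw.

```r
set.seed(237)                 # arbitrary seed
n <- 5; theta <- 1/3

# Simulate N both ways: as a sum of n Bernoulli trials and as one binomial draw
as_sum <- replicate(10000, sum(rbinom(n, size = 1, prob = theta)))
direct <- rbinom(10000, size = n, prob = theta)

rbind(as_sum = table(factor(as_sum, levels = 0:n)) / 10000,
      direct = table(factor(direct, levels = 0:n)) / 10000,
      pmf    = dbinom(0:n, size = n, prob = theta))
```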

Geometric distribution

The number of experiments (trials) is no longer fixed.

A discrete random variable \(X\) has a geometric distribution with parameter \(\theta\), \(0 < \theta \le 1\), if its probability mass function is given by

\[p_X(x)=(1-\theta)^{x-1}\theta\quad\text{for } x=1,2,\ldots.\]

We denote this distribution by \(\text{Geo}(\theta)\).

Examples

  • Number of Rock Paper Scissors rounds you play until your first win
  • Number of Rock Paper Scissors rounds you play until your first loss
  • Number of trials until the first success (failure)

The distribution describes the number of independent and identical Bernoulli trials until the first event.
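A note on the R counterpart (a sketch, with an illustrative \(\theta\)): R's dgeom() counts the number of failures before the first success, so the pmf defined above, which counts trials up to and including the first success, corresponds to dgeom(x - 1, theta).

```r
theta <- 1/3   # e.g., the chance of winning a single round
x <- 1:6

cbind(formula = (1 - theta)^(x - 1) * theta,   # pmf as defined in the lecture
      dgeom   = dgeom(x - 1, prob = theta))    # R's parameterization, shifted by 1
```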

Example: Whale population


Suppose you take a high-resolution satellite picture of the ocean and divide the picture into 9 lots. The chance of capturing one or more whales in each lot is \(4/9\).

Whether you observe whales in a lot is independent across lots, and all lots have the same probability.

Let \(Y_9\) be the number of lots with whales out of 9 lots.

\[ Y_9\sim\text{Bin}\left(9, 4/9\right) \]

Example: Whale population


You are interested in studying the number of whales you capture in each picture, \(X\).

You realize there could be more than one whale in each lot and decide to make the lots smaller and smaller until each lot can contain at most one whale.

Assume the proportion of lots with whales remains constant at \(4/9\).

Q. How many lots do you need?

A. A very large number.

Let’s consider a simpler case: dividing a 1D interval


  • Assume that, each time you divide the segment into \(n\) intervals, each interval experiences a success independently with the same probability \(p_n\).
  • The (expected) rate of success over the whole segment, \(\lambda=n \cdot p_n\), remains the same as you divide the segment into \(n\) intervals.
  • For each \(n\), the number of successful intervals is \(X_n\sim\text{Bin}\left(n,p_n\right)\).

Let’s consider a simpler case: dividing a 1D interval

If we take \(n\) to \(\infty\) (and drop the subscript \(n\) from \(p_n\) to simplify the notation), we get

\[\begin{aligned}
\lim_{n\to\infty}p_{X_n}\left(x\right)&=\lim_{n\to\infty}\binom{n}{x}p^x(1-p)^{n-x}\\
&=\lim_{n\to\infty}\left[\frac{n!}{x!\left(n-x\right)!}\left(\lambda/n\right)^x\left(1-\lambda/n\right)^{n-x}\right]\\
&=\frac{\lambda^x}{x!}\cdot\lim_{n\to\infty}\left[\frac{n!}{\left(n-x\right)!\,n^x}\left(1-\lambda/n\right)^{n}\left(1-\lambda/n\right)^{-x}\right]\\
&=\frac{\lambda^x}{x!}\cdot\left[\lim_{n\to\infty}\frac{n!}{\left(n-x\right)!\,n^x}\right]\cdot\left[\lim_{n\to\infty}\left(1-\lambda/n\right)^{n}\right]\cdot\left[\lim_{n\to\infty}\left(1-\lambda/n\right)^{-x}\right]\\
&\;\;\vdots\\
&=\frac{\lambda^x e^{-\lambda}}{x!}.
\end{aligned}\]

Evaluating each factor of
\[\frac{\lambda^x}{x!}\cdot\left[\lim_{n\to\infty}\frac{n!}{\left(n-x\right)!\,n^x}\right]\cdot\left[\lim_{n\to\infty}\left(1-\lambda/n\right)^{n}\right]\cdot\left[\lim_{n\to\infty}\left(1-\lambda/n\right)^{-x}\right]\]
separately:

\[\lim_{n\to\infty}\frac{n!}{\left(n-x\right)!\,n^x}=\lim_{n\to\infty}\frac{n(n-1)(n-2)\cdots(n-x+1)}{n\cdot n\cdots n}=\lim_{n\to\infty}\frac{n}{n}\cdot\frac{n-1}{n}\cdot\frac{n-2}{n}\cdots\frac{n-x+1}{n}=1.\]

Using \(e=\lim_{x\to\infty}\left(1 + 1/x\right)^x\),

\[\lim_{n\to\infty}\left(1-\lambda/n\right)^{n}=\lim_{n\to\infty}\left[\left(1+\frac{1}{-n/\lambda}\right)^{-n/\lambda}\right]^{-\lambda}=e^{-\lambda}.\]

Finally, since the exponent \(-x\) is fixed,

\[\lim_{n\to\infty}\left(1-\lambda/n\right)^{-x}=1.\]
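A quick numerical sketch of the limit with an arbitrary rate \(\lambda=2\): the \(\text{Bin}(n,\lambda/n)\) pmf settles onto the \(\text{Pois}(\lambda)\) pmf as \(n\) grows.

```r
lambda <- 2    # illustrative rate
x <- 0:6

# Bin(n, lambda / n) pmfs for increasingly large n ...
for (n in c(10, 100, 10000)) {
  print(round(dbinom(x, size = n, prob = lambda / n), 5))
}

# ... approach the Pois(lambda) pmf
round(dpois(x, lambda), 5)
```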

Poisson distribution

The random variable captures the count of events in a fixed interval of a Poisson process.

A discrete random variable \(X\) has a Poisson distribution with parameter \(\lambda\), \(\lambda > 0\), if its probability mass function is given by

\[p_X(x)=\frac{e^{-\lambda}\lambda^x}{x!}\quad\quad\text{for }x=0,1,2,\ldots.\]

We denote the distribution by \(\text{Pois}(\lambda)\).

Poisson process assumptions

  1. The expected rate, \(\lambda\), at which events occur is constant over the interval.
  2. All events are independent of each other.
  3. Events cannot occur simultaneously.

Examples of Poisson random variables

  • Number of calls received at a call centre in an hour
  • Number of dandelion flowers in a 1 square meter patch of lawn

(Dekking et al. Exercise 12.1)

Which of the following examples would
reasonably suit the Poisson process assumptions?

The times of bankruptcy of enterprises in the United States.

No. They tend to occur in clusters.

The times a chicken lays its eggs.

No. A chicken probably can’t lay a new egg immediately after laying one.

The times of airplane crashes in a worldwide registration.

Yes.

The locations of wrongly spelled words in a book.

Yes.

The times of traffic accidents at a crossroad.

Yes if you assume the accidents are minor and don’t affect the future traffic.
No if you assume major accidents occur and authorities block the traffic.

Example: Customer arrival

(Dekking et al. Exercise 12.2)

The number of customers that visit a bank on a day is modeled by a Poisson distribution. It is known that the probability of no customers at all is 0.00001. What is the expected number of customers?

  • Let \(N\) be the number of customers per day.
  • \(p_N(0)=0.00001\)
  • \(\phantom{P_N(0)}=\frac{e^{-\lambda}\lambda^0}{0!}\)
  • \(\phantom{P_N(0)}=e^{-\lambda}\)

\[\implies \lambda = -\log\left(0.00001\right)=11.513\]
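The same calculation in R (log() is the natural logarithm):

```r
lambda <- -log(0.00001)   # solve exp(-lambda) = 0.00001
lambda                    # approximately 11.513
dpois(0, lambda)          # recovers the stated probability of no customers
```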

R worksheet

Install learnr and run R worksheet

  1. Click here to install learnr on r.datatools.utoronto.ca

  2. Follow this link to open the worksheet



If you see an error, try:

  1. Log in to r.datatools.utoronto.ca
  2. Find rlesson03 from Files pane
  3. Click Run Document

Other steps you may try:

  1. Remove any .Rmd and .R files on the home directory of r.datatools.utoronto.ca
  2. In RStudio,
    1. Click Tools > Global Options
    2. Uncheck “Restore most recently opened project at startup”
  3. Run install.packages("learnr") in RStudio after the steps above or click here

Summary

  • Discrete random variables describe countable random outcomes
  • Probability mass function and cumulative distribution function uniquely define the behaviour of a random variable
  • Common discrete random variables include the Bernoulli, binomial, geometric, and Poisson

Practice questions

Chapter 4, Dekking et al.

  • Quick Exercises 4.3, 4.5, 4.6
  • All exercises from the chapter

Chapter 12, Dekking et al.

  • Quick Exercise 12.1

  • Exercises 12.3, 12.4, 12.5, 12.6

  • See a collection of corrections by the author here