Suppose you play a dice game with a friend. You take turns rolling a pair of fair dice.
If the two numbers sum to more than 7, you win \$1. If the sum is less than 7, you lose \$1. The game ends when either of you rolls a sum of exactly 7.
We will first consider a single round. Let \(W\)
represent the event that you win \$1 in a round. The sample space \(\Omega\)
consists of 36 pairs of two numbers between 1 and 6 as shown in Table 1.
| Second Roll \ First Roll | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | (1,1) | (1,2) | (1,3) | (1,4) | (1,5) | (1,6) |
| 2 | (2,1) | (2,2) | (2,3) | (2,4) | (2,5) | (2,6) |
| 3 | (3,1) | (3,2) | (3,3) | (3,4) | (3,5) | (3,6) |
| 4 | (4,1) | (4,2) | (4,3) | (4,4) | (4,5) | (4,6) |
| 5 | (5,1) | (5,2) | (5,3) | (5,4) | (5,5) | (5,6) |
| 6 | (6,1) | (6,2) | (6,3) | (6,4) | (6,5) | (6,6) |
To identify \(W\)
, we only need the sum of the pair in each outcome. Table 2 with the sums reveals that outcomes in the lower-right triangle make up the event \(W\)
.
| Second Roll \ First Roll | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
By counting the number of outcomes that belong to \(W\)
, we can see that \(P(W)=15/36\)
. More importantly, we only needed the sum of each outcome to compute the probability. Let’s represent the sum with \(S\)
. We can represent \(S\) as a function of the outcomes as shown in Equation (1).

$$S\left((\omega_1,\omega_2)\right)=\omega_1+\omega_2. \tag{1}$$

\(S\) assigns each outcome \((\omega_1,\omega_2)\) to a value in \(\left\{2,3,4,\dots,12\right\}\) based on the rule specified in Equation (1).
Having specified a variable of interest, we can represent the event of winning in terms of \(S\)
.
$$P(W)=P\left(S>7\right)=\frac{15}{36}$$
Formally, we should write \(P\left(\left\{S>7\right\}\right)\)
instead of \(P(S>7)\)
. However, we drop the curly brackets when the probability function is used with a random variable.
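We can check this count programmatically. The following sketch enumerates all 36 outcomes and counts those whose sum exceeds 7 (the variable names are illustrative, not from the text):

```python
from itertools import product
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Event W: the two numbers sum to more than 7.
wins = [o for o in outcomes if sum(o) > 7]

p_win = Fraction(len(wins), len(outcomes))
print(p_win)  # 5/12, i.e. 15/36
```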
\(S\)
is an example of a discrete random variable. It is random because the underlying phenomenon the variable represents is random. It is discrete because there is a countable number of possible outcomes for the phenomenon.
We will now formally define the discrete random variable.
Let \(\Omega\)
be a sample space. A random variable \(X\)
is a function that maps the sample space \(\Omega\)
to real numbers \(\mathbb{R}\)
. That is,
$$X:\Omega\to\mathbb{R}.$$
When a random variable \(X\)
maps a countable sample space, it is called a discrete random variable.
Note that a countable sample space may contain an infinite number of outcomes. Being countable means that we can count the outcomes in the sample space, mapping each count to a natural number. An example of an uncountable sample space is the interval \([0,1]\)
of real numbers.
When working with a random variable, you are interested in the possible values it can take and the probabilities associated with each value. This is called the distribution of the random variable. The distribution of a discrete random variable can be defined using either a probability mass function (pmf) or a cumulative distribution function (cdf).
The probability mass function of a discrete random variable \(X\)
, denoted with \(p\)
, is the function \(p:\mathbb{R}\to[0,1]\)
, defined by
$$p(x)=P(X=x)\quad\text{for } -\infty<x<\infty.$$
The pmf of a discrete random variable assigns a probability to each possible value.
For \(S\)
from the dice game example, we can represent the probability mass function in a table as shown in Table 3.
| \(s\) | \(p(s)=P(S=s)\) |
|---|---|
2 | 1/36 |
3 | 2/36 |
4 | 3/36 |
5 | 4/36 |
6 | 5/36 |
7 | 6/36 |
8 | 5/36 |
9 | 4/36 |
10 | 3/36 |
11 | 2/36 |
12 | 1/36 |
When written as an equation, we also highlight that the probability of \(S\) taking any value other than \(\{2,3,\dots,12\}\) is 0:

$$p(s)=\begin{cases}\dfrac{6-|7-s|}{36} & \text{when }s\in\{2,3,\dots,12\} \\ 0 & \text{otherwise.}\end{cases} \tag{2}$$
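As a quick check on Table 3, one can tabulate the pmf of \(S\) by enumerating the sample space and confirm it matches the closed form \((6-|7-s|)/36\) (a sketch; the names are illustrative):

```python
from itertools import product
from fractions import Fraction

# Count the outcomes that produce each possible sum s.
counts = {}
for a, b in product(range(1, 7), repeat=2):
    s = a + b
    counts[s] = counts.get(s, 0) + 1

pmf = {s: Fraction(c, 36) for s, c in counts.items()}

# The closed form (6 - |7 - s|)/36 matches the enumeration.
for s in range(2, 13):
    assert pmf[s] == Fraction(6 - abs(7 - s), 36)

print(pmf[7])  # 1/6, i.e. 6/36
```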
An alternative way of uniquely defining a distribution is through a cumulative distribution function.
The cumulative distribution function of a discrete random variable \(X\)
, denoted with \(F\)
, is the function \(F:\mathbb{R}\to[0,1]\)
, defined by
$$F(a)=P(X\le a)\quad\text{for }-\infty<a<\infty.$$
| \(s\) | \(p(s)=P(S=s)\) | \(F(s)=P(S\le s)\) |
|---|---|---|
2 | 1/36 | 1/36 |
3 | 2/36 | 3/36 |
4 | 3/36 | 6/36 |
5 | 4/36 | 10/36 |
6 | 5/36 | 15/36 |
7 | 6/36 | 21/36 |
8 | 5/36 | 26/36 |
9 | 4/36 | 30/36 |
10 | 3/36 | 33/36 |
11 | 2/36 | 35/36 |
12 | 1/36 | 36/36 |
Table 4 shows the cdf of \(S\) from the dice game example alongside the pmf. Note that we can construct the cdf by adding the pmf values in a cumulative manner:

$$F(a)=P(S\le a)=\sum_{s\le a}p(s). \tag{3}$$
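Accumulating the pmf values of \(S\) in this way reproduces the cdf column of Table 4. A minimal sketch (variable names are illustrative):

```python
from fractions import Fraction

# pmf of S from Table 3: p(s) = (6 - |7 - s|)/36 for s = 2, ..., 12.
pmf = {s: Fraction(6 - abs(7 - s), 36) for s in range(2, 13)}

# Build the cdf by cumulative summation of the pmf.
cdf = {}
total = Fraction(0)
for s in range(2, 13):
    total += pmf[s]
    cdf[s] = total

print(cdf[7])   # 7/12, i.e. 21/36
print(cdf[12])  # 1
```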
Visual inspection of the pmf and cdf of \(S\) in Figure 1 also illustrates the relationship described by Equation (3). At each value of \(S\) with positive pmf, the cdf has a discontinuity that jumps by the magnitude of the corresponding pmf value.
The cdf in Figure 1 B) also reveals three properties that hold true in general.
For any random variable \(X\), the following properties hold.

1. \(F\) is a nondecreasing function. That is, \(F(a)\le F(b)\) for any two values \(a\) and \(b\) such that \(a\le b\).
2. Because \(F(a)\) is a probability for any value \(a\), we have
   $$\lim_{a\to-\infty}F(a)=0,\text{ and }\lim_{a\to\infty}F(a)=1.$$
3. \(F\) is right-continuous. That is,
   $$\lim_{\varepsilon\downarrow0}F(a+\varepsilon)=F(a).$$
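These properties can be observed numerically on the cdf of \(S\) from the dice game. A sketch, checking the tail limits at points far outside the support:

```python
from fractions import Fraction

# pmf of S from the dice game.
pmf = {s: Fraction(6 - abs(7 - s), 36) for s in range(2, 13)}

def F(a):
    """cdf of S evaluated at any real number a."""
    return sum((p for s, p in pmf.items() if s <= a), Fraction(0))

# Property 1: F is nondecreasing.
values = [F(a) for a in range(0, 14)]
assert all(x <= y for x, y in zip(values, values[1:]))

# Property 2: F vanishes below the support and reaches 1 above it.
assert F(-100) == 0 and F(100) == 1

# Property 3 (right-continuity): approaching a from the right leaves F unchanged.
assert F(7 + 1e-9) == F(7)
```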
We now examine some of the common discrete random variables. We will begin with an example.
Suppose Michael takes a test with 10 multiple-choice questions. Unfortunately, he has not studied for the test and randomly picks an answer for each question. Each question has four options to choose from.
By randomly choosing an answer for each question, Michael has a probability of 1/4 of answering each question correctly.
When defining a random variable for a single experiment with two distinct outcomes, we often use a Bernoulli random variable.
A discrete random variable \(X\)
has a Bernoulli distribution with parameter \(p\)
, \(0\le p \le1\)
, if its pmf is given by
$$p_X(x)=\begin{cases} p & \text{when }x = 1 \\ 1-p & \text{when }x = 0. \end{cases}$$
We denote the distribution by \(\text{Ber}(p)\)
.
We use \(p_X\)
and \(F_X\)
to highlight that they are the pmf and cdf of the random variable \(X\)
.
For the multiple choice question example, we can define \(Y_i\)
to be 1 when Michael answers question \(i\)
correctly and 0 otherwise. Then, \(Y_i\)
follows the Bernoulli distribution with parameter \(1/4\)
, denoted by \(Y_i\sim\text{Ber}(1/4)\)
.
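A Bernoulli draw can be simulated with a single uniform random number. The sketch below simulates Michael's ten answers (`bernoulli` is an illustrative helper, not from the text):

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

def bernoulli(p):
    """Draw one Ber(p) sample: 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

# Simulate Michael's ten answers, each correct with probability 1/4.
answers = [bernoulli(1 / 4) for _ in range(10)]
print(sum(answers))  # number of correct answers in this simulated test
```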
To pass the test, Michael needs to answer at least 5 questions correctly. The quantity we are interested in is now the number of correct answers. Let’s represent this quantity with \(X\)
. To compute the pmf \(p_X(x)\)
, we must consider the different ways that Michael can answer \(x\)
questions correctly. For example, there are 10 ways Michael can answer 1 question correctly.
The number of ways to answer \(n\) questions correctly out of \(N\) is equivalent to the number of ways to choose \(n\) items out of \(N\) distinct objects without replacement.
Objects to choose from: \(1,\ 2,\ 3,\ \dots,\ N\)

Slots to fill: \(\mathbf{1},\ 2,\ \dots,\ n\)
We will begin by considering the number of ways to arrange \(n\)
items out of \(N\)
distinct objects in order. We start by choosing one item from \(N\)
objects and placing it in the first place. We have \(N\)
options.
Objects to choose from: \(1,\ 2,\ 3,\ \dots,\ N-1\)

Slots to fill: \(\require{cancel}\cancel{1},\ \mathbf{2},\ \dots,\ n\)
After placing an object, you now have one less object to choose from. Therefore, you now have \(N-1\)
options.
Repeating the process until you place \(n\)
items, you end up with
$$N\cdot(N-1)\cdot(N-2)\cdots(N-n+1)$$
number of ways to arrange the \(n\)
items. Such arrangements are called permutations.
Any ordered sequence of \(n\)
objects taken from a set of \(N\)
distinct objects is called a permutation. The number of possible permutations of size \(n\)
from \(N\)
objects is
$$P_{n,N}=\frac{N!}{(N-n)!}.$$
\(N!\)
denotes \(N\)
factorial and is computed as \(N\cdot(N-1)\cdot(N-2)\cdots2\cdot1\)
for a positive integer \(N\)
and \(0!=1\)
.
For the multiple-choice question example, we do not care about the order in which Michael answers the questions when counting the number of ways he answers \(x\) questions correctly. Counting with the permutation method would therefore result in overcounting. To compensate, we divide by \(n!\), the number of ways to arrange \(n\) items in order.
Any unordered set of \(n\)
objects taken from a set of \(N\)
distinct objects is called a combination. The number of possible combinations of size \(n\)
from \(N\)
objects is
$${N\choose n} =\frac{N!}{(N-n)!\,n!}.$$
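Python's standard library provides both counts directly via `math.perm` and `math.comb`, which we can check against the factorial formulas above:

```python
from math import comb, factorial, perm

N, n = 10, 1

# Permutations: N! / (N - n)!  -- ordered arrangements.
assert perm(N, n) == factorial(N) // factorial(N - n)

# Combinations: N! / ((N - n)! n!)  -- unordered selections.
assert comb(N, n) == factorial(N) // (factorial(N - n) * factorial(n))

# Dividing the permutation count by n! removes the overcounting from order.
assert comb(N, n) == perm(N, n) // factorial(n)

print(comb(10, 1))  # 10 ways for Michael to answer exactly 1 question correctly
```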
Now that we know how to count the number of possible combinations for any number of correct answers, we can define the pmf for the example. The resulting distribution is called a binomial distribution.
A discrete random variable \(X\)
has a binomial distribution with parameters \(n\)
and \(p\)
, \(n=1,2,3,\dots\)
and \(0\le p\le1\)
, if its pmf is given by
$$p_X(x)={n\choose x}p^x(1-p)^{n-x}\quad\forall x=0,1,2,\ldots, n.$$
We denote the distribution by \(\text{Bin}(n,p)\)
.
We often use the binomial distribution to describe the number of successes out of a fixed number of trials; it is the distribution of the sum of \(n\) independent and identically distributed Bernoulli trials.
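Under the \(\text{Bin}(10, 1/4)\) model, the probability that Michael passes (at least 5 correct answers) follows directly from the pmf. A sketch, where `binom_pmf` and `p_pass` are illustrative names:

```python
from math import comb

n, p = 10, 1 / 4

def binom_pmf(x, n, p):
    """pmf of Bin(n, p) evaluated at x."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Sanity check: the pmf sums to 1 over x = 0, ..., n.
assert abs(sum(binom_pmf(x, n, p) for x in range(n + 1)) - 1) < 1e-12

# Probability of at least 5 correct answers out of 10.
p_pass = sum(binom_pmf(x, n, p) for x in range(5, n + 1))
print(round(p_pass, 4))  # 0.0781
```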
Let’s revisit the dice game example and define \(G\)
to be the number of rounds played until the game ends with a roll of 7. To compute the pmf of \(G\)
, \(p_G(x)=P(G=x)\)
, we recognize that a roll of 7 is preceded by \(x-1\) rolls that are not 7. Since rolling a sum of 7 in a round has a probability of \(6/36=1/6\), the pmf has the following form:
$$P(G=x)=\left(1-\frac{1}{6}\right)^{x-1}\frac{1}{6}.$$
\(G\)
is an example of a geometric random variable.
A discrete random variable \(X\)
has a geometric distribution with parameter \(p\)
, \(0<p \le 1\)
, if its pmf is given by
$$p_X(x)=(1-p)^{(x-1)}p\quad\forall x = 1,2,\ldots$$
We denote the distribution by \(\text{Geo}(p)\)
.
As with binomial random variables, the trials underlying a geometric distribution must be independent and identically distributed Bernoulli trials.
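A simulation of the game length \(G\) agrees with the \(\text{Geo}(1/6)\) mean of \(1/p=6\). A sketch, assuming each round consists of a single roll of the pair:

```python
import random

random.seed(42)  # fixed seed for reproducibility

def game_length():
    """Roll a pair of fair dice until the sum is 7; return the number of rounds."""
    rounds = 0
    while True:
        rounds += 1
        if random.randint(1, 6) + random.randint(1, 6) == 7:
            return rounds

n_trials = 100_000
mean_length = sum(game_length() for _ in range(n_trials)) / n_trials
print(round(mean_length, 2))  # close to 1/p = 6
```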
With the binomial distribution, we counted the number of successes (or failures) out of a fixed number of trials. With the Poisson distribution, we are interested in the number of events in a fixed interval. The distribution can be derived by dividing the interval into infinitely many subintervals and assuming an independent Bernoulli trial takes place in each subinterval.
A discrete random variable \(X\)
has a Poisson distribution with parameter \(\lambda\)
, \(\lambda >0\)
, if its pmf is given by
$$p_X(x)=\frac{e^{-\lambda}\lambda^x}{x!}\quad\forall x = 0,1,2,\ldots$$
We denote the distribution by \(\text{Pois}(\lambda)\)
.
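The Poisson pmf sums to 1 over the nonnegative integers, which we can check numerically by truncating the infinite sum. A sketch; \(\lambda=3\) is an arbitrary choice for illustration:

```python
from math import exp, factorial

lam = 3.0  # arbitrary rate parameter for illustration

def poisson_pmf(x, lam):
    """pmf of Pois(lam) evaluated at a nonnegative integer x."""
    return exp(-lam) * lam**x / factorial(x)

# The tail beyond x = 100 is negligible for lam = 3.
total = sum(poisson_pmf(x, lam) for x in range(101))
assert abs(total - 1) < 1e-12

print(round(poisson_pmf(0, lam), 4))  # 0.0498, i.e. e^{-3}
```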
For the pmf to hold true, we must make the following assumptions about the underlying random experiment.