Michael and William are planning a 7-day bikepacking trip. As they each pack their bags, they try to decide how many extra tire tubes to take.
Suppose that, given the trip conditions, a tube in use fails on any given day of riding with probability \(p=1/5\). For simplicity, assume that at most one tube fails per person per day and that the failure events are independent with the same \(p\), regardless of how long a tube has been in use.
Let \(T_i\) be the number of days of riding until the \(i\)th tube fails, counted from the day the tube is put into use. Based on these assumptions, we have
$$T_i\sim \text{Geom}\left(\frac{1}{5}\right).$$
Having studied probability, Michael and William make their decisions based on \(T_i\).
Michael considers \(X_1=T_1 - 1 +T_2\), which describes the day on which the extra tube fails: the tube currently on his bike fails on day \(T_1\), and the extra tube installed on that day fails \(T_2 - 1\) days later. He then computes \(E\left[X_1\right]\).
Based on linearity of expectations, Michael computes
$$E\left[X_1\right]=E\left[T_1-1+T_2\right]=E\left[T_1\right] + E\left[T_2\right] - 1.$$
Note that \(E[T_i]=1/p=5\)
because they are geometric random variables. This leads to
$$E\left[X_1\right]=9.$$
Based on the result, Michael concludes that the extra tube will not fail until the 9th day on average, so one extra tube should be sufficient for a 7-day trip.
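As a quick sanity check on Michael's calculation, here is a minimal simulation sketch, assuming NumPy is available:

```python
import numpy as np

# Simulate X_1 = T_1 - 1 + T_2 with T_1, T_2 independent Geom(1/5).
rng = np.random.default_rng(0)
t1 = rng.geometric(0.2, size=1_000_000)
t2 = rng.geometric(0.2, size=1_000_000)
x1 = t1 - 1 + t2
print(x1.mean())  # close to E[X_1] = 9
```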
| \(T_2 \backslash T_1\) | 1 | 2 | \(\cdots\) |
|---|---|---|---|
| 1 | \(p_1(1)p_2(1)\) | \(p_1(2)p_2(1)\) | \(\cdots\) |
| 2 | \(p_1(1)p_2(2)\) | \(p_1(2)p_2(2)\) | \(\ddots\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | |

Table 1: The joint probability mass function of \(T_1\) and \(T_2\).
William also considers \(X_1\), but he recognizes that \(E\left[X_1\right]\) is an expected value and that a random variable may take values less than its expectation. Instead, William decides to compute the probability that a single extra tube will fail before or on the 7th day, \(P\left(X_1\le 7\right)=P\left(T_1+T_2\le 8\right)\).
To compute the probability, William considers the joint probability mass function of \(T_1\)
and \(T_2\)
shown in Table 1. Note that the joint probability mass function is \(p_1(t_1)p_2(t_2)\)
, where \(p_1\)
and \(p_2\)
are the probability mass functions of \(T_1\)
and \(T_2\)
, respectively, because \(T_1\)
and \(T_2\)
are independent.
Recognizing that the cells of the probability mass table represent disjoint outcomes, we can compute the probability \(P\left(T_1+T_2\le 8\right)\) by adding the probabilities of the outcomes that satisfy \(T_1+T_2\le 8\).
$$P\left(T_1+T_2\le 8\right) = \sum_{(a,b):a + b \le 8} p_1(a)p_2(b)$$
Suppose \(T_1+T_2=c\) for some integer value \(c\) between 2 and 8. Fixing the value of \(T_2=b\) for \(1\le b \le c-1\), we need \(T_1= c -b\).
$$\sum_{(a,b):a + b \le 8} p_1(a)p_2(b) =\sum_{c = 2}^8\left[\sum_{b=1}^{c-1} p_1\left(c-b\right)p_2(b)\right]$$
Using the probability mass function of a geometric random variable with \(p=1/5\)
, we have
$$\begin{align*}&\sum_{c = 2}^8\left[\sum_{b=1}^{c-1} p_1\left(c-b\right)p_2(b)\right] \\ =&\sum_{c = 2}^8\left[\sum_{b=1}^{c-1} \left(\frac{4}{5}\right)^{c-b-1}\frac{1}{5}\cdot\left(\frac{4}{5}\right)^{b-1}\frac{1}{5}\right] \\ =& \sum_{c = 2}^8\left[\sum_{b=1}^{c-1} \left(\frac{4}{5}\right)^{c-2}\left(\frac{1}{5}\right)^2\right] \\ =& \sum_{c = 2}^8 \left(c-1\right) \left(\frac{4}{5}\right)^{c-2}\left(\frac{1}{5}\right)^2 \\ \approx& 0.4967 \end{align*}$$
Based on the result, William recognizes that there is almost a 50% chance that the extra tube fails before the trip ends.
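William's sum is easy to verify numerically. Below is a minimal sketch that adds the joint probabilities directly, assuming the same geometric model:

```python
p = 1 / 5

def geom_pmf(k, p):
    # P(T = k) for a geometric random variable counting days until failure
    return (1 - p) ** (k - 1) * p

# Add p_1(a) p_2(b) over all outcomes with a + b <= 8.
prob = sum(geom_pmf(a, p) * geom_pmf(b, p)
           for a in range(1, 8)
           for b in range(1, 9 - a))
print(round(prob, 4))  # 0.4967
```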
We can generalize the method used to compute the distribution of the sum for any two independent discrete random variables.
Let \(X\)
and \(Y\)
be two independent discrete random variables, with probability mass functions \(p_X\)
and \(p_Y\)
, respectively. Then the probability mass function \(p_Z\)
of \(Z = X+Y\)
satisfies
$$p_Z(z) = \sum_j p_X\left(z-b_j\right)p_Y\left(b_j\right),$$
where the sum runs over all possible values \(b_j\)
of \(Y\)
.
Note that in our previous example, we considered the sum only over \(1 \le b \le c-1\)
since \(p_1(c-b)=0\)
when \(b > c - 1\)
.
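The theorem translates directly into code. The sketch below computes the probability mass function of a sum by iterating over all value pairs; the representation of a PMF as a dictionary is an assumption made for illustration:

```python
from collections import defaultdict

def pmf_of_sum(pmf_x, pmf_y):
    """PMF of Z = X + Y for independent discrete X and Y,
    where each PMF maps a value to its probability."""
    pmf_z = defaultdict(float)
    for a, pa in pmf_x.items():
        for b, pb in pmf_y.items():
            pmf_z[a + b] += pa * pb  # disjoint outcomes (a, b) with a + b = z
    return dict(pmf_z)

# Example: the sum of two fair dice.
die = {k: 1 / 6 for k in range(1, 7)}
print(pmf_of_sum(die, die)[7])  # 6/36 ≈ 0.1667
```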
Suppose \(X\)
and \(Y\)
are two independent random variables with
$$X\sim\text{Bin}\left(n_1,p\right)\quad\text{and}$$
$$Y\sim\text{Bin}\left(n_2,p\right).$$
What is the probability mass function of \(Z=X+Y\)
? Using the theorem, we may consider
$$p_Z(c)=\sum_{b=0}^c p_X\left(c-b\right)p_Y\left(b\right)$$
$$=\sum_{b=0}^c\left\{\binom{n_1}{c-b}p^{c-b}\left(1-p\right)^{n_1-c+b}\cdot\binom{n_2}{b}p^b\left(1-p\right)^{n_2-b}\right\}.$$
On the other hand, recall that a binomial random variable is a sum of independent and identically distributed Bernoulli random variables. In other words, we can rewrite \(X=\sum_{i=1}^{n_1}W_i\) and \(Y=\sum_{i=n_1+1}^{n_1+n_2} W_i\), where the \(W_i\) are independent \(\text{Ber}(p)\) random variables.
$$X + Y = \sum_{i=1}^{n_1}W_i + \sum_{i=n_1+1}^{n_1+n_2}W_i = \sum_{i=1}^{n_1+n_2}W_i.$$
Therefore, \(Z\sim \text{Bin}\left(n_1+n_2,p\right)\)
.
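As a check, we can convolve the two binomial PMFs numerically and compare against \(\text{Bin}\left(n_1+n_2,p\right)\); this sketch assumes NumPy and SciPy, with arbitrary choices of \(n_1\), \(n_2\), and \(p\):

```python
import numpy as np
from scipy.stats import binom

n1, n2, p = 3, 4, 0.3
x = binom.pmf(np.arange(n1 + 1), n1, p)
y = binom.pmf(np.arange(n2 + 1), n2, p)
z = np.convolve(x, y)  # discrete convolution of the two PMFs
print(np.allclose(z, binom.pmf(np.arange(n1 + n2 + 1), n1 + n2, p)))  # True
```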
We can show a similar result for two independent Poisson random variables: if \(M=N_1+N_2\) with \(N_1\sim\text{Pois}\left(\lambda_1\right)\) and \(N_2\sim\text{Pois}\left(\lambda_2\right)\) independent, then \(M\sim\text{Pois}\left(\lambda_1+\lambda_2\right)\).
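The same numerical check works for the Poisson case, except that the support is infinite, so we truncate it where the remaining tail mass is negligible (the truncation point below is an arbitrary choice for these rates):

```python
import numpy as np
from scipy.stats import poisson

lam1, lam2 = 2.0, 3.5
k = np.arange(60)  # truncated support; tail mass beyond 60 is negligible here
z = np.convolve(poisson.pmf(k, lam1), poisson.pmf(k, lam2))[: len(k)]
print(np.allclose(z, poisson.pmf(k, lam1 + lam2)))  # True
```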
This example is adapted from Section 11.2 in Dekking et al. (2005).
Consider \(Z=X+Y\)
, where \(X\)
and \(Y\)
are two independent \(\text{U}(0,1)\)
random variables. We are interested in computing the cumulative distribution function of \(Z\)
.
To compute \(F_Z(a)=P(Z\le a)\)
, we need to consider two disjoint subintervals, \(0\le a <1\)
and \(1\le a <2\)
, where the probability density function \(f_Z\)
is positive. The two cases are shown in Figure 1.
Figure 1 illustrates that the region where the joint density of \((X,Y)\) is positive is the unit square, and the joint density is uniform across that region. Therefore, the area of the shaded subregion in each case equals the probability of the corresponding event.
$$F_Z(a)=\begin{cases} 0 & \text{when } a < 0\\ \frac{a^2}{2} & \text{when } 0\le a < 1\\ 1 - \frac{(2-a)^2}{2} & \text{when } 1 \le a < 2\\ 1 & \text{when } a \ge 2. \end{cases}$$
We can then differentiate the cumulative distribution function to obtain the probability density function.
$$f_Z(z)=\begin{cases} z & \text{when } 0 \le z < 1\\ 2 - z & \text{when } 1 \le z <2 \\ 0 & \text{otherwise.} \end{cases}$$
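A quick simulation confirms the piecewise form of \(F_Z\); this is a minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)

# Empirical CDF vs. the piecewise formula at a few points.
for a in (0.5, 1.0, 1.5):
    f = a**2 / 2 if a < 1 else 1 - (2 - a) ** 2 / 2
    print(a, (z <= a).mean(), f)  # e.g., F_Z(0.5) = 0.125
```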
It is not always possible, or easy, to compute probabilities as areas of subregions. For the general case, we can work analytically from the definition of the cumulative distribution function of a sum of two random variables.
For \(Z=X+Y\)
, where \(X\)
and \(Y\)
are two continuous random variables with joint density function \(f_{X,Y}\)
, we have
$$F_Z(a) = P(Z\le a) = P(X+Y\le a)$$
$$=\int_{-\infty}^\infty \int_{-\infty}^{a-y} f_{X,Y}(x,y)\,dx\,dy.$$
When \(X\)
and \(Y\)
are independent, we can replace \(f_{X,Y}(x,y)\)
with \(f_X(x)f_Y(y)\)
.
$$=\int_{-\infty}^\infty \int_{-\infty}^{a-y} f_X(x) f_Y(y)dxdy$$
$$=\int_{-\infty}^\infty \left(\int_{-\infty}^{a-y} f_X(x) dx\right) f_Y(y)dy$$
$$=\int_{-\infty}^\infty F_X\left(a-y\right)f_Y(y)dy$$
We can then differentiate \(F_Z\)
to obtain the following result.
Let \(X\)
and \(Y\)
be two independent continuous random variables, with probability density functions \(f_X\)
and \(f_Y\)
, respectively. Then the probability density function \(f_Z\)
of \(Z=X+Y\)
is given by
$$f_Z(z) = \int_{-\infty}^\infty f_X(z-y)f_Y(y) dy$$
for \(-\infty < z <\infty\)
.
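The convolution integral can also be evaluated numerically when no closed form is handy. The sketch below uses two independent \(\text{Exp}(1)\) random variables, whose sum is known to have density \(ze^{-z}\), as a test case (the choice of distribution is ours, for illustration only):

```python
import numpy as np
from scipy.integrate import quad

def f_exp(x):
    # density of an Exp(1) random variable
    return np.exp(-x) if x >= 0 else 0.0

def f_sum(z):
    # f_Z(z) = ∫ f_X(z - y) f_Y(y) dy; the integrand vanishes outside 0 <= y <= z
    return quad(lambda y: f_exp(z - y) * f_exp(y), 0, z)[0] if z > 0 else 0.0

for z in (0.5, 1.0, 2.0):
    print(f_sum(z), z * np.exp(-z))  # the two columns agree
```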
Following similar steps for \(Z=XY\)
, where \(X\)
and \(Y\)
are independent continuous random variables, we can obtain the following result.
Let \(X\)
and \(Y\)
be two independent continuous random variables with probability density functions \(f_X\)
and \(f_Y\)
, respectively. Then the probability density function \(f_Z\)
of \(Z=XY\)
satisfies
$$f_Z(z) = \int_{-\infty}^\infty f_X\left(z/y\right)f_Y(y)\frac{1}{\left|y\right|}dy$$
for \(-\infty < z <\infty\)
.
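As an illustration of the product formula, take \(X\) and \(Y\) to be independent \(\text{U}(0,1)\) random variables (again our choice for this sketch); the formula then gives \(f_Z(z)=\int_z^1 \frac{1}{y}\,dy=-\ln z\) on \((0,1)\), which we can verify numerically:

```python
import numpy as np
from scipy.integrate import quad

def f_unif(x):
    # density of a U(0, 1) random variable
    return 1.0 if 0 < x < 1 else 0.0

def f_prod(z):
    # f_Z(z) = ∫ f_X(z / y) f_Y(y) / |y| dy; nonzero only for z < y < 1
    return quad(lambda y: f_unif(z / y) * f_unif(y) / abs(y), z, 1)[0]

for z in (0.1, 0.5, 0.9):
    print(f_prod(z), -np.log(z))  # matches -ln(z) on (0, 1)
```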
Consider \(Z=X+Y\)
where \(X\)
and \(Y\)
are independent \(N\left(\mu,\sigma^2\right)\)
random variables. Applying the formula for the sum of two independent continuous random variables, we have
$$f_{Z}(a)=\int_{-\infty}^\infty f_{X}(a-x)f_{Y}(x)dx$$
$$=\int_{-\infty}^\infty \left(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}\left(a-x-\mu\right)^2}\right)\left(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}\left(x-\mu\right)^2}\right)dx$$
$$=\frac{1}{2\pi\sigma^2}\int_{-\infty}^\infty e^{-\frac{1}{2\sigma^2}\left[2x^2-2ax+\left(a-\mu\right)^2+\mu^2\right]}dx$$
Completing the square in \(x\) gives \(2x^2-2ax=2\left(x-\frac{a}{2}\right)^2-\frac{a^2}{2}\) and \(\left(a-\mu\right)^2+\mu^2-\frac{a^2}{2}=\frac{1}{2}\left(a-2\mu\right)^2\), so
$$=\frac{1}{2\pi\sigma^2}e^{-\frac{1}{2\cdot 2\sigma^2}\left(a-2\mu\right)^2}\int_{-\infty}^\infty e^{-\frac{1}{\sigma^2}\left(x-\frac{a}{2}\right)^2}dx$$
$$=\frac{1}{\sqrt{2\pi\cdot 2\sigma^2}}e^{-\frac{1}{2\cdot 2\sigma^2}\left(a-2\mu\right)^2}\int_{-\infty}^\infty \frac{1}{\sqrt{2\pi\cdot\frac{\sigma^2}{2}}}e^{-\frac{1}{2\cdot\frac{\sigma^2}{2}}\left(x-\frac{a}{2}\right)^2}dx.$$
Note that \(\frac{1}{\sqrt{2\pi\cdot\frac{\sigma^2}{2}}}e^{-\frac{1}{2\cdot\frac{\sigma^2}{2}}\left(x-\frac{a}{2}\right)^2}\) is the probability density function of a \(N\left(\frac{a}{2},\frac{\sigma^2}{2}\right)\) random variable, so integrating it over the real line gives 1. The remaining factor, \(\frac{1}{\sqrt{2\pi\cdot 2\sigma^2}}e^{-\frac{1}{2\cdot 2\sigma^2}\left(a-2\mu\right)^2}\), is also a normal probability density function.
Specifically, we have \(Z\sim N\left(2\mu, 2\sigma^2\right)\)
. We can generalize the result as below.
Let \(X\) and \(Y\) be two jointly normal random variables with means \(\mu_1\) and \(\mu_2\), and variances \(\sigma_1^2\) and \(\sigma_2^2\), respectively. Then \(X+Y\) also follows a normal distribution.
When \(X\)
and \(Y\)
are independent, \(X+Y\)
has the following distribution:
$$N\left(\mu_1+\mu_2, \sigma_1^2 + \sigma_2^2\right).$$
When \(X\)
and \(Y\)
are dependent, \(X+Y\)
has the following distribution:
$$N\left(\mu_1+\mu_2, \sigma_1^2 +\sigma_2^2 + 2\,\text{Cov}(X,Y) \right).$$
The distribution is normal even when the variables are not independent, provided that they are jointly normal.
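A simulation sketch of the independent case (the parameter values below are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 0.5
z = rng.normal(mu1, s1, 1_000_000) + rng.normal(mu2, s2, 1_000_000)

# Empirical CDF vs. the N(mu1 + mu2, s1^2 + s2^2) CDF at a few points.
for a in (-2.0, 0.5, 3.0):
    print((z <= a).mean(), norm.cdf(a, mu1 + mu2, np.sqrt(s1**2 + s2**2)))
```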
Suppose \(X_1\)
, \(X_2\)
, …, \(X_n\)
are independent and identically distributed \(N\left(\mu, \sigma^2\right)\)
.
We are interested in the distribution of
$$\overline{X}=\frac{\sum_{i=1}^nX_i}{n}.$$
Without deriving the cumulative distribution function, we have
$$E\left[\overline{X}\right]=E\left[\frac{X_1}{n} + \frac{X_2}{n} + \cdots +\frac{X_n}{n}\right]$$
$$=\sum_{i=1}^nE\left[\frac{X_i}{n}\right]=n\cdot \frac{\mu}{n} = \mu$$
and
$$\text{Var}\left(\overline{X}\right)=\text{Var}\left(\frac{X_1}{n} + \frac{X_2}{n} + \cdots +\frac{X_n}{n}\right)$$
$$=n\cdot \frac{1}{n^2}\text{Var}\left(X_1\right)=\frac{\sigma^2}{n}$$
based on linearity of expectation and the fact that the variance of a sum of independent random variables is the sum of their variances.
Because the \(X_i/n\) are normal random variables, their sum \(\overline{X}\) is also normal. Therefore, we have
$$\overline{X}\sim N\left(\mu,\frac{\sigma^2}{n}\right).$$
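A short simulation, assuming NumPy, illustrates the result:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 3.0, 2.0, 16
# 200,000 samples of the mean of n i.i.d. N(mu, sigma^2) draws
xbar = rng.normal(mu, sigma, size=(200_000, n)).mean(axis=1)
print(xbar.mean(), xbar.var())  # close to mu = 3 and sigma^2 / n = 0.25
```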
Dekking, Frederik Michel, Cornelis Kraaikamp, Hendrik Paul Lopuhaä, and Ludolf Erwin Meester. 2005. A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Science & Business Media.