Lecture 7: Joint Distribution

STA237: Probability, Statistics, and Data Analysis I

Michael Jongho Moon

PhD Student, DoSS, University of Toronto

June 6, 2022

Example: Coffee shop with muffins

Michael realizes his coffee shop won’t survive selling coffee alone and decides to sell muffins as well.

After a month of selling both coffee and muffins, Michael estimates the probability distribution of the daily coffee and muffin sales as shown on the right.

For example, the probability of selling 5 cups of coffee and 5 muffins on a day would be indicated by …

How about the probability of selling exactly 5 cups of coffee on a day?

How about the probability of selling less than 5 cups of coffee and less than 4 muffins on a day?

This is an example of a joint distribution of two discrete random variables.

The two random variables are defined on the same sample space, and the joint distribution describes the probabilities of all possible pairs of their values.

Joint distribution of discrete random variables

Joint probability mass function

The joint probability mass function \(p\) of two discrete random variables \(X\) and \(Y\) is the function \(p:\mathbb{R}^2\to\left[0,1\right]\), defined by

\[p\left(a,b\right) = P\left(X=a, Y=b\right)\quad\text{for }-\infty<a,b<\infty.\]

  • To emphasize the random variables, we can write \(p_{X,Y}(a,b)\)
  • Note that \(X\) and \(Y\) are defined on the same sample space, \(\Omega\)

Example: Two dice

(From Dekking et al. Section 9.1)

Let \(S\) be the sum of two fair dice rolls and \(M\) be the maximum of the two.

Compute the following probabilities.

\(P(S=7,M=5)\)

\(=2/36=1/18\)

Joint probability mass function \(P(S=s, M=m)\):

  s \ m      1      2      3      4      5      6
      2   1/36      0      0      0      0      0
      3      0   2/36      0      0      0      0
      4      0   1/36   2/36      0      0      0
      5      0      0   2/36   2/36      0      0
      6      0      0   1/36   2/36   2/36      0
      7      0      0      0   2/36   2/36   2/36
      8      0      0      0   1/36   2/36   2/36
      9      0      0      0      0   2/36   2/36
     10      0      0      0      0   1/36   2/36
     11      0      0      0      0      0   2/36
     12      0      0      0      0      0   1/36
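One way to reproduce this table is brute-force enumeration in R. The sketch below is for illustration only; object names such as `rolls` and `joint` are arbitrary.

```r
# Enumerate all 36 equally likely outcomes of two fair dice and
# tabulate the joint pmf of the sum S and the maximum M.
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
s <- rolls$die1 + rolls$die2        # sum of the two dice
m <- pmax(rolls$die1, rolls$die2)   # maximum of the two dice
joint <- table(s, m) / 36           # joint pmf P(S = s, M = m)
joint["7", "5"]                     # P(S = 7, M = 5) = 2/36 = 1/18
```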

Example: Two dice

(From Dekking et al. Section 9.1)

Let \(S\) be the sum of two fair dice rolls and \(M\) be the maximum of the two.

Compute the following probabilities.

\(P(S=7,M=5)=1/18\)


\(P(S=7)\)

\(=\left(2+2+2\right)/36=1/6\)

Joint probability mass function \(P(S=s, M=m)\):

  s \ m      1      2      3      4      5      6   P(S=s)
      2   1/36      0      0      0      0      0     1/36
      3      0   2/36      0      0      0      0     2/36
      4      0   1/36   2/36      0      0      0     3/36
      5      0      0   2/36   2/36      0      0     4/36
      6      0      0   1/36   2/36   2/36      0     5/36
      7      0      0      0   2/36   2/36   2/36     6/36
      8      0      0      0   1/36   2/36   2/36     5/36
      9      0      0      0      0   2/36   2/36     4/36
     10      0      0      0      0   1/36   2/36     3/36
     11      0      0      0      0      0   2/36     2/36
     12      0      0      0      0      0   1/36     1/36

Example: Two dice

(From Dekking et al. Section 9.1)

Let \(S\) be the sum of two fair dice rolls and \(M\) be the maximum of the two.

Compute the following probabilities.

\(P(S=7,M=5)=1/18\)


\(P(S=7)=1/6\)


\(P(M=5)\)

\(=\left(2+2+2+2+1\right)/36=1/4\)

Joint probability mass function \(P(S=s, M=m)\):

  s \ m      1      2      3      4      5      6   P(S=s)
      2   1/36      0      0      0      0      0     1/36
      3      0   2/36      0      0      0      0     2/36
      4      0   1/36   2/36      0      0      0     3/36
      5      0      0   2/36   2/36      0      0     4/36
      6      0      0   1/36   2/36   2/36      0     5/36
      7      0      0      0   2/36   2/36   2/36     6/36
      8      0      0      0   1/36   2/36   2/36     5/36
      9      0      0      0      0   2/36   2/36     4/36
     10      0      0      0      0   1/36   2/36     3/36
     11      0      0      0      0      0   2/36     2/36
     12      0      0      0      0      0   1/36     1/36
 P(M=m)   1/36   3/36   5/36   7/36   9/36  11/36

Marginal probability mass function

Let \(X\) and \(Y\) be two discrete random variables, with joint probability mass function \(p_{X,Y}\). Then, the marginal probability mass function \(p_X\) of \(X\) can be computed as

\[p_X(x)=\sum_{y}p_{X,Y}\left(x,y\right),\quad\text{and}\]

the marginal probability mass function \(p_Y\) of \(Y\) can be computed as

\[p_Y(y)=\sum_{x}p_{X,Y}\left(x,y\right).\]
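When a joint pmf is stored as a matrix or table, the marginals are just row and column sums. A minimal sketch, reusing the illustrative `joint` table from the two-dice code above:

```r
# Marginal pmfs from the joint pmf table: sum over the other variable.
p_S <- rowSums(joint)   # P(S = s), summing over m
p_M <- colSums(joint)   # P(M = m), summing over s
p_S["7"]                # 6/36 = 1/6
p_M["5"]                # 9/36 = 1/4
```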

Joint cumulative distribution function

The joint cumulative distribution function \(F\) of two random variables \(X\) and \(Y\) is the function \(F:\mathbb{R}^2\to[0,1]\) defined by

\[F\left(a,b\right)=P\left(X\le a, Y\le b\right)\quad\text{for }-\infty<a,b<\infty.\]

  • Similar to the case of a single random variable, joint cumulative distribution functions can describe pairs of discrete random variables and pairs of continuous random variables

Example: Two dice

(From Dekking et al. Section 9.1)

Let \(S\) be the sum of two fair dice rolls and \(M\) be the maximum of the two.

\(F_{S,M}\left(6, 2\right)=?\)

Joint probability mass function \(P(S=s, M=m)\):

  s \ m      1      2      3      4      5      6
      2   1/36      0      0      0      0      0
      3      0   2/36      0      0      0      0
      4      0   1/36   2/36      0      0      0
      5      0      0   2/36   2/36      0      0
      6      0      0   1/36   2/36   2/36      0
      7      0      0      0   2/36   2/36   2/36
      8      0      0      0   1/36   2/36   2/36
      9      0      0      0      0   2/36   2/36
     10      0      0      0      0   1/36   2/36
     11      0      0      0      0      0   2/36
     12      0      0      0      0      0   1/36

Example: Two dice

(From Dekking et al. Section 9.1)

Let \(S\) be the sum of two fair dice rolls and \(M\) be the maximum of the two.

\(F_{S,M}\left(6, 2\right)=\left(1+2+1\right)/36=1/9\)

Joint probability mass function \(P(S=s, M=m)\):

  s \ m      1      2      3      4      5      6
      2   1/36      0      0      0      0      0
      3      0   2/36      0      0      0      0
      4      0   1/36   2/36      0      0      0
      5      0      0   2/36   2/36      0      0
      6      0      0   1/36   2/36   2/36      0
      7      0      0      0   2/36   2/36   2/36
      8      0      0      0   1/36   2/36   2/36
      9      0      0      0      0   2/36   2/36
     10      0      0      0      0   1/36   2/36
     11      0      0      0      0      0   2/36
     12      0      0      0      0      0   1/36
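One way to verify this value is to sum the joint pmf over the corner \(s\le 6\), \(m\le 2\). A brief sketch, again reusing the illustrative `joint` table from above:

```r
# F_{S,M}(6, 2) = P(S <= 6, M <= 2): sum the joint pmf over that corner.
s_vals <- as.numeric(rownames(joint))
m_vals <- as.numeric(colnames(joint))
sum(joint[s_vals <= 6, m_vals <= 2])   # 4/36 = 1/9
```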

Joint distribution of continuous random variables

Joint probability density function

Recall …

  • A probability density function is a non-negative function
  • To compute a probability, we integrate it, that is, compute the area under the curve
  • The total area under the curve is 1

Extending to a pair …

  • A joint probability density function is a non-negative function of two variables
  • To compute a probability, we compute a double integral, that is, the volume under the surface
  • The total volume under the surface is 1

Joint probability density function

Random variables \(X\) and \(Y\) have a joint continuous distribution if for some function \(f:\mathbb{R}^2\to\mathbb{R}\) and for all numbers \(a_1\), \(a_2\), \(b_1\), and \(b_2\) with \(a_1\le b_1\) and \(a_2\le b_2\),

\[P\left(a_1 \le X\le b_1, a_2\le Y\le b_2\right)=\int_{a_2}^{b_2}\int_{a_1}^{b_1} f\left(x,y\right)\, dx\, dy.\]

The function \(f\) has to satisfy

  1. \(f\left(x,y\right)\ge 0\) for all \(x\in\mathbb{R}\) and \(y\in\mathbb{R}\); and
  2. \(\int_{-\infty}^\infty\int_{-\infty}^\infty f\left(x,y\right) dxdy = 1\).

We call \(f\) the joint probability density function of \(X\) and \(Y\).

Example: 2.7.8 from Evans and Rosenthal

Suppose \(X\) and \(Y\) have a joint continuous distribution with joint density

\[f_{X,Y}\left(x,y\right)=\begin{cases}120\cdot x^3\cdot y & x\ge 0, y\ge 0, x+y\le1 \\ 0 &\text{otherwise.}\end{cases}\]

  • Check \(f\left(x,y\right) \ge 0\) for all values of \(x\) and \(y\)
  • Check \(\int_{-\infty}^\infty\int_{-\infty}^\infty f\left(x,y\right) dxdy = 1\)
  • Compute \(F_{X,Y}(0.5, 0.5)\) and \(P(X\le 0.5)\).

Example: 2.7.8 from Evans and Rosenthal

\[F_{X,Y}(0.5, 0.5)\]

  • \(F_{X,Y}(0.5, 0.5)=P(X\le 0.5, Y\le 0.5)\)
  • \(=\int_{-\infty}^{1/2}\int_{-\infty}^{1/2}f(x,y)\, dx\, dy\)
  • \(=\int_0^{1/2}\int_0^{1/2}120\cdot x^3\cdot y\, dx\, dy\)
  • \(=\int_0^{1/2} 120 \cdot \frac{(1/2)^4}{4} \cdot y\, dy\)
  • \(=30 \cdot (1/2)^4 \cdot \frac{(1/2)^2}{2}\)
  • \(=15/64\)
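One way to verify 15/64 is nested numerical integration; a minimal sketch, where `f` is simply the density from Example 2.7.8 coded directly:

```r
# Joint density from Example 2.7.8 (0 outside the triangle x + y <= 1).
f <- function(x, y) ifelse(x >= 0 & y >= 0 & x + y <= 1, 120 * x^3 * y, 0)

# F_{X,Y}(0.5, 0.5): integrate over y first, then over x.
inner <- function(x) sapply(x, function(xx)
  integrate(function(y) f(xx, y), lower = 0, upper = 0.5)$value)
integrate(inner, lower = 0, upper = 0.5)$value   # approximately 15/64 = 0.234375
```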

Example: 2.7.8 from Evans and Rosenthal

\[P(X\le 0.5)\]

  • \(P(X\le 0.5)\)
  • \(=\int_{-\infty}^{???}\int_{-\infty}^{1/2}f(x,y) dxdy\)
  • \(=\int_{-\infty}^{\infty}\int_{-\infty}^{1/2}f(x,y) dxdy\)

\[\neq\int_0^{1}\int_{0}^{1/2} 120 \cdot x^3 \cdot y dxdy\]

\(f(x,y)\) is not \(120\cdot x^3 \cdot y\) for all values of \(y\) from 0 to 1 when \(x\) is between 0 and 1/2; the density is 0 wherever \(x+y>1\).

Example: 2.7.8 from Evans and Rosenthal

\[P(X\le 0.5)\]

  • \(P(X\le 0.5)\)
  • \(=\int_{-\infty}^{???}\int_{-\infty}^{1/2}f(x,y) dxdy\)
  • \(=\int_{-\infty}^{\infty}\int_{-\infty}^{1/2}f(x,y) dxdy\)

\[=\int_{0}^{1/2}\int_0^{1-x} 120 \cdot x^3 \cdot y dydx\]

We can switch the order of integration when working with probability density functions; integrating over \(y\) first, its limits are written in terms of \(x\) so that they cover the support \(0\le y\le 1-x\).

\[P(X\le 0.5)=\int_{0}^{1/2}\int_0^{1-x} 120 \cdot x^3 \cdot y dydx\]

\[=\int_0^{1/2} 120 \cdot x^3 \cdot \frac{(1-x)^2}{2} dx\] \[=\int_0^{1/2} 60 \cdot \left(x^5-2x^4+x^3\right) dx\]

\[=60 \cdot\left(\frac{(1/2)^6}{6} - \frac{2(1/2)^5}{5}+\frac{(1/2)^4}{4}\right)\] \[=60\cdot\left(\frac{1}{384}-\frac{1}{80}+\frac{1}{64}\right)=\frac{60}{384}-\frac{60}{80}+\frac{60}{64}\]

\[=\frac{5-24+30}{32}=\frac{11}{32}\]
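The same numerical check works here; the only change in the sketch is that the inner integral over \(y\) runs over the full range, and the zero region of the density takes care of the upper limit \(1-x\):

```r
# P(X <= 0.5): for each x, integrate the density over all y (0 to 1);
# f is 0 for y > 1 - x, so this matches the integral up to 1 - x.
inner_full <- function(x) sapply(x, function(xx)
  integrate(function(y) f(xx, y), lower = 0, upper = 1)$value)
integrate(inner_full, lower = 0, upper = 0.5)$value   # approximately 11/32 = 0.34375
```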

Marginal cumulative distribution functions

Let \(F\) be the joint distribution function of random variables \(X\) and \(Y\). Then the marginal cumulative distribution function of \(X\) is given by

\[F_X\left(a\right)=P\left(X\le a\right)=F\left(a,\infty\right)=\lim_{b\to\infty}F\left(a,b\right)\]

and the marginal cumulative distribution function of \(Y\) is given by

\[F_Y\left(b\right)=P\left(Y\le b\right)=F\left(\infty,b\right)=\lim_{a\to\infty}F\left(a,b\right).\]

  • In both discrete and continuous cases, marginal distributions describe distributions of an individual random variable from a joint distribution
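For example, using the two-dice table, the marginal cumulative distribution function of \(S\) at 6 is

\[F_S(6)=\lim_{b\to\infty}F_{S,M}(6,b)=\sum_{s\le 6}P(S=s)=\frac{1+2+3+4+5}{36}=\frac{15}{36}.\]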

Marginal probability density function

Let \(X\) and \(Y\) have a joint continuous distribution, with joint density function \(f_{X,Y}\). Then the marginal density \(f_X\) of \(X\) satisfies

\[f_X\left(x\right) = \int_{-\infty}^\infty f_{X,Y}\left(x,y\right) dy\]

for all \(x\in\mathbb{R}\) and the marginal density \(f_Y\) of \(Y\) satisfies

\[f_Y\left(y\right)=\int_{-\infty}^\infty f_{X,Y}\left(x,y\right)dx\]

for all \(y\in\mathbb{R}\).
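As a worked example, for the density from Example 2.7.8, integrating out \(y\) over its support gives

\[f_X(x)=\int_0^{1-x}120\, x^3\, y\, dy=60\, x^3\left(1-x\right)^2\quad\text{for }0\le x\le 1,\]

and \(f_X(x)=0\) otherwise. Integrating this marginal density from 0 to 1/2 reproduces \(P(X\le 0.5)=11/32\) from the earlier slide.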

Independence of random variables

Independence

Recall for events \(A\) and \(B\)

… if \(P(A)\cdot P(B)=P(A\cap B)\) then they are independent.


For random variables \(X\) and \(Y\)

  • Let \(I_A\) and \(I_B\) be intervals that satisfy \(A=\left\{X\in I_A\right\}\) and \(B=\left\{Y\in I_B\right\}\)
  • When events \(A\) and \(B\) are independent,

\[P\left(\left\{X\in I_A\right\}\right)\cdot P\left(\left\{Y\in I_B\right\}\right)=P\left(\left\{X\in I_A\right\} \cap \left\{Y\in I_B\right\}\right)\]

  • When the relationship holds for any \(I_A\) and \(I_B\), then \(X\) and \(Y\) are independent random variables

Independent random variables

The random variables \(X\) and \(Y\), with joint distribution function \(F\), are independent if

\[P\left(X\le x, Y\le y\right)=P\left(X\le x\right)\cdot P\left(Y\le y\right),\]

that is,

\[F\left(x,y\right)=F_X\left(x\right)\cdot F_Y\left(y\right)\]

for all possible values \(x\) and \(y\). Random variables that are not independent are called dependent.

Discrete case

\(X\) and \(Y\) are independent when

  • \(P\left(X\le x, Y\le y\right) = P\left(X\le x\right)P\left(Y\le y\right)\) for all possible values of \(x\) and \(y\)
  • \(\implies P(X=x, Y=y)= P(X=x)P(Y=y)\) for all possible values of \(x\) and \(y\)

That is, \(p_{X,Y}(x,y)=p_X(x)p_Y(y)\)
for all possible values of \(x\) and \(y\).
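For example, \(S\) and \(M\) from the two-dice example are dependent: \(p_{S,M}(7,5)=2/36=1/18\), while \(p_S(7)\,p_M(5)=\left(6/36\right)\left(9/36\right)=1/24\neq 1/18\).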

Continuous case

\(X\) and \(Y\) are independent when

  • \(F\left(x, y\right) = F_X\left(x\right)F_Y\left(y\right)\) for all possible values of \(x\) and \(y\)
  • \(\implies \frac{\partial^2}{\partial x\,\partial y} F\left(x, y\right) = \frac{d}{dx} F_X\left(x\right)\cdot\frac{d}{dy}F_Y\left(y\right)\) for all possible values of \(x\) and \(y\)

That is, \(f_{X,Y}(x,y)=f_X(x)f_Y(y)\)
for all possible values of \(x\) and \(y\).
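Similarly, \(X\) and \(Y\) from Example 2.7.8 are dependent: \(f_{X,Y}(0.8,0.8)=0\) because \(0.8+0.8>1\), while both marginal densities are positive at 0.8, so \(f_{X,Y}(0.8,0.8)\neq f_X(0.8)\,f_Y(0.8)\).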

Independence of more than two variables

  • For any number of random variables, \(X_1\), \(X_2\), \(X_3\), …, \(X_n\), they are pairwise independent if \(X_j\) and \(X_k\) are independent for all \(j\neq k\), \(1\le j,k \le n\)

  • For any number of variables, \(X_1\), \(X_2\), \(X_3\), …, \(X_n\), they are independent if \(F\left(x_1,x_2,x_3,\ldots,x_n\right)=\prod_{i=1}^n F_{X_i}\left(x_i\right)\) for all values \(x_1\), \(x_2\), …, \(x_n\)

  • You can also write the definition with the marginal probability mass functions \(p_{X_i}\) for discrete random variables with joint probability mass function \(p\), or with the marginal densities \(f_{X_i}\) for continuous random variables with joint density function \(f\)

  • For further details, you can check Section 2.8 from Evans & Rosenthal

Independence under transformation

Let \(X_1\), \(X_2\), \(X_3\), …, \(X_n\) be independent random variables. For each \(i\in\left\{1,2,\ldots,n\right\}\), let \(h_i:\mathbb{R}\to\mathbb{R}\) be a function and define the random variable

\[Y_i=h_i\left(X_i\right).\]

Then \(Y_1\), \(Y_2\), \(Y_3\), …, \(Y_n\) are also independent.

Example: 2.8.7 from Evans & Rosenthal

Let \(X_1\), \(X_2\), \(X_3\), …, \(X_n\) be independent and identically distributed \(U(0,1)\) random variables. Let \(X_{(n)}\) be the maximum value among them.

What is the cumulative distribution function of \(X_{(n)}\)? How about its probability density function?

We want \(F_{X_{(n)}}(x) = P(X_{(n)}\le x)\).

  • \(X_{(n)}\) is the largest among \(X_1\), \(X_2\), \(X_3\), …, \(X_n\), so \(X_{(n)}\le x\) exactly when all of \(X_1\), \(X_2\), …, \(X_n\) are less than or equal to \(x\)

\[\left\{X_{(n)}\le x \right\}=\left\{X_1 \le x\right\}\cap\left\{X_2 \le x\right\}\cap\cdots\cap\left\{X_n \le x\right\}\]

  • Because they represent the same event, their probabilities are the same

\[P\left(X_{(n)}\le x\right)=P\left(X_1\le x, X_2\le x, \cdots, X_n\le x\right)\]

  • Because \(X_1\), \(X_2\), \(X_3\), …, \(X_n\) are independent, their joint cumulative distribution function is the same as the product of the marginal cumulative distribution functions

\[P\left(X_1\le x, X_2\le x, \cdots, X_n\le x\right)\]

\[=P\left(X_1\le x\right)\cdot P\left(X_2\le x\right)\cdots P\left(X_n\le x\right)\]

Example: 2.8.7 from Evans & Rosenthal

Let \(X_1\), \(X_2\), \(X_3\), …, \(X_n\) be independent and identically distributed \(U(0,1)\) random variables. Let \(X_{(n)}\) be the maximum value among them.

What is the cumulative distribution function of \(X_{(n)}\)? How about its probability density function?

\[P\left(X_1\le x, X_2\le x, \cdots, X_n\le x\right)\]

\[=P\left(X_1\le x\right)\cdot P\left(X_2\le x\right)\cdots P\left(X_n\le x\right)\]

  • Since all \(X_1\), \(X_2\), \(X_3\), …, \(X_n\) have the same distribution

\[P\left(X_1\le x\right)\cdot P\left(X_2\le x\right)\cdots P\left(X_n\le x\right)\]

\[=P\left(X_1\le x\right)\cdot P\left(X_1\le x\right)\cdots P\left(X_1\le x\right)\]

\(\implies F_{X_{(n)}}(x)=\prod_{i=1}^n F_{X_i}(x)=\left[F_{X_1}\left(x\right)\right]^n=\begin{cases}0 & x<0 \\x^n & x\in[0,1]\\1 &x > 1\end{cases}\)

Differentiating the cumulative distribution function gives the probability density function: \(f_{X_{(n)}}(x)=n\,x^{n-1}\) for \(x\in[0,1]\) and \(0\) otherwise.
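A quick simulation check of this cumulative distribution function; a minimal sketch, with an arbitrary seed, \(n\), and evaluation point:

```r
# Simulate the maximum of n iid U(0,1) variables and compare the
# empirical cdf at x = 0.8 with the derived cdf F(x) = x^n.
set.seed(237)
n <- 5
maxima <- replicate(10000, max(runif(n)))
mean(maxima <= 0.8)   # empirical P(X_(n) <= 0.8)
0.8^n                 # theoretical value: 0.8^5 = 0.32768
```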

Practice questions

  • Read Sections 9.3 and 9.5 from Dekking et al. 
  • Exercises from Dekking et al. Chapter 9: All


Simulation in R worksheet