Lecture 4: Continuous Random Variables

STA237: Probability, Statistics, and Data Analysis I

Michael Jongho Moon

PhD Student, DoSS, University of Toronto

Wednesday, May 17, 2023

Example: A broken watch


  • Suppose you find a broken watch with only the hour hand in its place.
  • What is the probability that the watch stopped at exactly 9 o’clock?
  • Assume an equal probability for any position of the hour hand.
  • Let \(H\) be the random variable that represents the position of the hour hand. We want to compute
    \[P\left(H=9\right).\]


  • There are 12 hours on a watch.
  • Is it then \[P\left(H=9\right)=\frac{1}{12}?\]


  • What if the hour hand was off the mark by a very small amount?


  • What if the hour hand was exactly half-way between 9 and 10?
  • The position of the hand is a location on a continuous curve.
  • There are infinitely many locations on a continuous curve.
  • \[P(H=9)=0\]
  • \(H\) is an example of a continuous random variable, a variable whose set of possible values is uncountable.

Other examples

  • The height of a person. There is no next value after 176.33 cm.
  • Waiting time at a restaurant. We can’t count time in general.
  • Continuous variables are also used to model values that can only be discrete in practice such as a person’s annual income in CAD.
  • How do we define probabilities associated with a continuous random variable?

We use intervals.

Continuous random variable

Similar to a probability mass function, a probability density function uniquely defines (the behaviour of) a continuous random variable.

A random variable \(X\) is continuous if for some function \(f:\mathbb{R}\to\mathbb{R}\) and for any numbers \(a\) and \(b\) with \(a\le b\),

\[P\left(a\le X\le b\right)=\int_a^b f(x) dx.\] The function \(f\) has to satisfy

(i) \(f(x)\ge 0\) for all \(x\), and
(ii) \(\int_{-\infty}^\infty f(x) dx = 1\).

We call \(f\) the probability density function of \(X\) and the value \(f(x)\) is the probability density of \(X\) at \(x\).

\[P\left(a\le X\le b\right)=\int_a^b f(x) dx\]

  • \(f(x)\) is NOT a probability
  • Both a pmf and a pdf uniquely define a random variable, but a pmf maps to \([0,1]\) and a pdf to \([0,\infty)\)
  • \(f(x)\) can be interpreted as a relative measure of likelihood around \(x\)
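
A quick way to see that \(f(x)\) is not itself a probability: density values can exceed 1 while the total area stays 1. A minimal R sketch, using the exponential density (introduced later in this lecture) with an arbitrary rate of 2:

dexp(0.1, rate = 2) # density at 0.1 is about 1.64, larger than 1
integrate(dexp, 0, Inf, rate = 2) # yet the density integrates to 1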

Continuous random variable

The definition of a cdf is the same for both discrete and continuous random variables.

The cumulative distribution function \(F\) of a random variable \(X\) is the function

\[F:\mathbb{R}\to [0,1],\]

defined by

\[F(a)=P(X\le a)\quad\text{for }-\infty<a<\infty.\]

Properties of cumulative distribution functions

  • For a continuous random variable \(X\) with pdf \(f\), we have
    \(F_X(a)=P(X\le a)=\int_{-\infty}^a f(x) dx\).
  • For a discrete random variable \(Y\) taking values \(y_i\) with pmf \(p\), we have
    \(F_Y(a)=P(Y\le a)=\sum_{y_i\le a}p(y_i)\).
  • A cdf uniquely defines a distribution for both discrete and continuous random variables.
  • Continuous random variables have continuous cdfs.

Continuous random variable

This property provides an alternative definition.

A random variable is called continuous if its cumulative distribution function \(F\) is continuous everywhere.

More on cumulative distribution functions

  • Every cdf is
    1. non-decreasing,
    2. right-continuous, and
    3. (approaching) 0 at the left end and 1 at the right end
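
These properties can be checked numerically with any cdf built into R; for example, evaluating the exponential cdf pexp (introduced later in this lecture) along an increasing grid:

pexp(c(-1, 0, 1, 10)) # non-decreasing, 0 at the left end, approaching 1 on the right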

Example: Dekking et al. Quick Exercise 5.1

Suppose a random variable \(X\) is defined by the following probability density function.

\[f(x)=\begin{cases}\frac{1}{2\sqrt{x}} & \text{when }0<x<a \\0 &\text{otherwise}\end{cases}\]

What is \(a\)?

  • \(\int_{-\infty}^\infty f(x) dx =1\)

Since \(f(x)=0\) outside \((0,a)\), we know \(F(a)=1\) and \(F(0)=0\). \[\implies \int_{-\infty}^\infty f(x) dx= \int_{0}^a \frac{1}{2\sqrt{x}} dx\]

  • \(\int_{0}^a \frac{1}{2\sqrt{x}} dx=1\)
  • \(\int_{0}^a \frac{x^{-1/2}}{2} dx=1\)
  • \(\left. x^{1/2}\right|_{0}^a =1\)
  • \(a^{1/2}=1\)
  • \(a = 1\)
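
As a sanity check, the integral can also be evaluated numerically in R with the candidate value \(a=1\); integrate handles the integrable singularity at 0:

integrate(function(x) 1 / (2 * sqrt(x)), lower = 0, upper = 1) # evaluates to 1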

Quantile, percentile, and median

Let \(X\) be a continuous random variable and \(p\) a number between 0 and 1. The \(p\)th quantile or \(100\cdot p\)th percentile of the distribution of \(X\) is the smallest number \(q_p\) such that

\[F(q_p)=P(X\le q_p)=p.\]

The median of a distribution is its \(50\)th percentile.

Quantile, percentile, and median

The previous definition is ambiguous for discrete random variables since there may not be a value \(q\) that satisfies \(F(q)=p\).

Let \(X\) be a random variable with cumulative distribution function \(F\). Then the quantile function of \(X\) is the function \(F^{-1}\) defined by

\[F^{-1}(t) = \min \left\{x: F(x) \ge t \right\},\]

for \(0<t<1\).

Quantile functions for continuous vs. discrete
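
As a sketch of the contrast in R: for a continuous distribution the quantile function inverts \(F\) exactly, while for a discrete one it returns the smallest value with \(F(x)\ge t\). The \(\text{Exp}(1)\) and \(\text{Bin}(3, 0.5)\) distributions below are arbitrary illustrative choices.

qexp(0.4) # continuous: q with F(q) = 0.4 exactly
pexp(qexp(0.4)) # recovers 0.4
qbinom(0.4, size = 3, prob = 0.5) # discrete: no x satisfies F(x) = 0.4 ...
pbinom(1, size = 3, prob = 0.5) # ... F jumps from 0.125 to 0.5 at x = 1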

Example: A broken watch

We assumed an equal likelihood for every position of \(H\) between 0 and 12.

Its probability density function will be a constant, say \(k\), over the interval from 0 to 12.

What is \(k\)?

The cumulative distribution function increases from 0 at \(H=0\) at a constant rate, reaching 1 at \(H=12\). \(F\) is continuous on \(\mathbb{R}\).
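
Since the density must integrate to 1 over \([0, 12]\), \(12k=1\), so \(k = 1/12\). A quick numerical check in R, writing the density out directly:

f <- function(h) ifelse(0 <= h & h <= 12, 1 / 12, 0) # constant density k = 1/12
integrate(f, 0, 12)$value # total probability is 1
integrate(f, 0, 9)$value # F(9) = 9/12 = 0.75
integrate(f, 9, 9)$value # P(H = 9) = 0, as argued earlier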

Common continuous distributions

Uniform distribution

We use a uniform distribution to assign equal probabilities across a fixed interval.

It often models completely arbitrary experiments, or complete ignorance about the likelihood of outcomes.

A continuous random variable has a uniform distribution on interval \([\alpha, \beta]\) if its probability density function \(f\) is given by

\[f(x)=\begin{cases}\frac{1}{\beta-\alpha} & \alpha \le x\le \beta\\ 0 &\text{otherwise.}\end{cases}\]

We denote this distribution by \(U(\alpha,\beta)\).
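
R implements this family through dunif, punif, and related functions; for instance, for \(U(0, 12)\) from the watch example:

dunif(6, min = 0, max = 12) # density 1/(beta - alpha) = 1/12
punif(9, min = 0, max = 12) # F(9) = 9/12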

Uniform distribution


\[Y \sim U(\alpha, \beta)\]

Example: Air duct cleaning scam calls

Suppose Michael receives approximately \(r\) air duct cleaning scam calls every year.

Let the random variable \(T\) be the amount of time between two consecutive calls.

To compute the distribution of \(T\), we model the calls as a Poisson process …

  • divide 1 year into \(n\) equal-length intervals
  • make the intervals small enough that Michael may receive at most 1 call per \(1/n\)-year interval
  • assume that whether Michael receives a call in each \(1/n\)-year interval is identically distributed and independent across intervals


Then, \(p_n=r/n\) represents the probability of getting a scam call in any \(1/n\)-year interval.

\[P(T>t\text{ years})\]

  • \(=P\left(\text{no call in each of }t\cdot n\text{ consecutive }1/n\text{-year intervals}\right)\)
  • \(=\left(1-p_n\right)^{t\cdot n}\)
  • \(=\left(1-\frac{r}{n}\right)^{t\cdot n}\)

Let \(n\to\infty\).

  • \(P(T>t\text{ years})\)
    \(= \lim_{n\to\infty}\left(1 - r\cdot\frac{1}{n}\right)^{t\cdot n}\)
  • \(=e^{-t\cdot r}\)

To compute \(F_T(t)\), we can use

\[F_T(t)=P(T\le t)=1-e^{-rt}.\]

Taking its derivative gives its pdf.

\[f_T(x) = \frac{d}{dx} \left(1-e^{-rx}\right) = re^{-rx}\]

\(T\) is an example of an exponential random variable.
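
The limiting step can be checked numerically. A sketch with hypothetical values \(r = 6\) calls per year and \(t = 0.5\) years:

r <- 6; t <- 0.5 # hypothetical rate and time span
n <- 1e6 # a large number of intervals
(1 - r / n)^(t * n) # discrete approximation
exp(-r * t) # limiting value, e^{-rt}; the two agree to several digits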

Exponential distribution

Exponential random variables are often used to model the time until the next event in a Poisson process. \(\lambda\) is the expected rate of events.

A continuous random variable has an exponential distribution with parameter \(\lambda\), \(\lambda>0\), if its probability density function \(f\) is given by

\[f(x) = \begin{cases} \lambda e^{-\lambda x} & x\ge0\\ 0 & \text{otherwise.} \end{cases}\]

We denote this distribution by \(\text{Exp}(\lambda)\).

Exponential distribution

While \(F_Y\) is continuous everywhere, \(f_Y\) is discontinuous at \(0\).

\[Y \sim \text{Exp}(1)\]
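
The discontinuity of the density at \(0\) shows up directly in R, here for \(Y\sim\text{Exp}(1)\):

dexp(c(-0.01, 0, 0.01)) # pdf: 0 below 0, jumps to about 1 at 0
pexp(c(-0.01, 0, 0.01)) # cdf: continuous through 0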

Example: Customer arrivals

(Adapted from Devore & Berk)

Let \(X\) be the time (hr) between two successive arrivals at the drive-up window of a local bank. Suppose \(X\) has an exponential distribution with \(\lambda=\lambda_0\).

What is the probability that no customer showing up for first 2 hours after opening?

Suppose 2 hours have passed since opening without a customer. What is the probability that no customer shows up for the next 2 hours?

The probability that no customer shows up in the first 2 hours is

  • \[P(X>2)\]
  • \(=1-F(2)\)
  • \(=1-\int_0^2 \lambda_0 e^{-\lambda_0x} dx\)
  • \(=1-\left(\left.-e^{-\lambda_0 x}\right|_0^2\right)\)
  • \(=1-\left(-e^{-2\lambda_0}+e^{0}\right)\)
  • \(=e^{-2\lambda_0}\)
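
pexp evaluates this probability directly; a check with a hypothetical rate \(\lambda_0 = 1.5\) (the example leaves \(\lambda_0\) unspecified):

lambda0 <- 1.5 # hypothetical rate
1 - pexp(2, rate = lambda0) # P(X > 2)
exp(-2 * lambda0) # matches e^{-2 lambda_0}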

The probability that no customer shows up for the next 2 hours, given that no customer showed up in the first 2 hours, is

  • \[P(X>4 | X>2)\]
  • \(=\frac{P(\{X>4\}\cap\{X>2\})}{P(X>2)}\)

because \(\{X>4\}\) implies \(\{X>2\}\).

  • \(=\frac{P(X>4)}{P(X>2)}=\frac{e^{-4\lambda_0}}{e^{-2\lambda_0}}=e^{-2\lambda_0}\)

\(P(X>4|X>2)=P(X>2)\)

Whether there was a customer in the past 2 hours does not change the probability of a customer’s arrival in the next 2 hours.

Memoryless property of exponential random variables

For any \(s,t>0\),

\[\begin{align} & P(X>s + t | X>s) \\ = & \frac{P(X>s + t)}{P(X>s)} \\ = & \frac{1-\left(1-e^{-\lambda(s+t)}\right)}{1-\left(1-e^{-\lambda s}\right)} \\ =&\frac{e^{-\lambda s}e^{-\lambda t}}{e^{-\lambda s}} \\ = & P(X>t)\end{align}\]

The timing of a past event does not change the probability of the timing for the next event.
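
A Monte Carlo sketch of the memoryless property, with arbitrary values \(\lambda = 2\), \(s = 1\), and \(t = 0.5\):

set.seed(237)
x <- rexp(1e6, rate = 2) # a large exponential sample
s <- 1; t <- 0.5
mean(x[x > s] > s + t) # estimates P(X > s + t | X > s)
mean(x > t) # estimates P(X > t); both are close to exp(-2 * 0.5)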

Gamma distribution

\(\Gamma(\cdot)\) is called the gamma function and \(\Gamma(n)=(n-1)!\) when \(n\) is a positive integer.

A continuous random variable has a gamma distribution with parameters \(\alpha\) and \(\beta\), \(\alpha>0\) and \(\beta>0\), if its probability density function \(f\) is given by

\[f(x)=\frac{1}{\Gamma(\alpha)}\beta^\alpha x^{\alpha-1}e^{-\beta x}\quad\text{for }x>0.\]

We denote this distribution by \(\text{Gamma}(\alpha, \beta)\).

Gamma distribution

With its two parameters, the gamma distribution is more versatile than the exponential distribution. It is used to model insurance claim amounts, rainfall, etc.

\[G\sim \text{Gamma}(\alpha, \beta)\]
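
R's dgamma uses the same shape–rate parametrization as the definition above, and \(\text{Gamma}(1, \beta)\) reduces to \(\text{Exp}(\beta)\):

x <- seq(0.5, 5, by = 0.5)
all.equal(dgamma(x, shape = 1, rate = 2), dexp(x, rate = 2)) # TRUE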

Normal distribution

The normal distribution, or Gaussian distribution, is central in probability theory and statistics.

It is often used to model observational errors.

A continuous random variable has a normal distribution with parameters \(\mu\) and \(\sigma^2\), \(\sigma^2>0\), if its probability density function \(f\) is given by

\[f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}.\]

We denote the distribution by \(N(\mu,\sigma^2)\).

Normal distribution


A normal distribution is symmetric around its centre.

\(\mu\) controls the centre of the distribution (location) while \(\sigma\) controls the spread of the distribution (scale).

\[X_{\mu,\sigma} \sim N(\mu, \sigma^2)\]

Standard normal distribution

The standard normal distribution is a special case of the normal distribution.

We can transform any normal random variable \(X\sim N(\mu, \sigma^2)\) to \(Z\) by \[Z = \frac{X-\mu}{\sigma}.\]

A normal distribution with \(\mu=0\) and \(\sigma^2=1\) is called the standard normal distribution.

We often denote a standard normal random variable by \(Z\), \(Z\sim N(0,1)\), its pdf with \(\phi\), and its cdf with \(\Phi\).

\[\phi(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}\]

\[\Phi(a) = \int_{-\infty}^a\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}dz\]
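
The standardization identity can be verified in R: evaluating the normal cdf directly, or after transforming to \(Z\), gives the same probability (parameter values here are arbitrary):

mu <- 1; sigma <- 4 # arbitrary parameter choices
x <- 2
pnorm(x, mean = mu, sd = sigma) # P(X <= x) directly
pnorm((x - mu) / sigma) # P(Z <= (x - mu)/sigma), the standardized form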

Standard normal distribution

There is no closed-form expression for \(F(a)=\int_{-\infty}^a f(x) dx\).

To compute probabilities for any normal random variable, we can

  1. transform the variable to \(Z\) and use a look-up table for \(\Phi\) (sometimes \(1-\Phi\)), or
  2. use R or similar.

\[Z = \frac{X-\mu}{\sigma}.\]

Example: Computing probabilities of a normal random variable

Suppose \(X\sim N(1, 4^2)\). Find

  1. \(P(X > 2)\)
  2. \(P(X\le 0)\)
  3. \(q_{0.25}\)


\(X\sim N(1, 4^2)\)

\[P(X > 2)\]

  • \(= P(X\ge2)\)

\(P(X=2)=0\)

  • \(= P\left(\frac{X-1}{4}\ge\frac{2-1}{4}\right)\)
  • \(= P\left(Z\ge\frac{1}{4}\right)\)
  • \(\approx 0.4013\)
1 - pnorm(1/4) # standard normal
[1] 0.4012937
1 - pnorm(2, mean = 1, sd = 4) # normal with mu 1 and sigma 4
[1] 0.4012937


\(X\sim N(1, 4^2)\)

\[P(X \le 0)\]

  • \(= P\left(\frac{X-1}{4}\le\frac{0-1}{4}\right)\)
  • \(= P\left(Z\le-\frac{1}{4}\right)\)
  • \(= P\left(Z\ge\frac{1}{4}\right)\)

\(Z\) is symmetric around \(0\).

  • \(\approx 0.4013\)
pnorm(-1/4) # standard normal
[1] 0.4012937
pnorm(0, mean = 1, sd = 4) # normal with mu 1 and sigma 4
[1] 0.4012937


\(X\sim N(1, 4^2)\)

\[q_{0.25}\]

  • \(0.25= F(q_{0.25})=P(X\le q_{0.25})\)
  • \(0.25= P\left(Z\le \frac{q_{0.25} - 1}{4}\right)\)
  • \(0.25= P\left(Z\ge -\frac{q_{0.25} - 1}{4}\right)\)
  • \(\implies \frac{1-q_{0.25}}{4}\approx0.675\)
  • \(q_{0.25} \approx -1.7\)
1 - qnorm(0.75) * 4 # standard normal
[1] -1.697959
qnorm(0.25, mean = 1, sd = 4) # normal with mu 1 and sigma 4
[1] -1.697959

R worksheet

Install learnr and run R worksheet

  1. Click here to install learnr on r.datatools.utoronto.ca

  2. Follow this link to open the worksheet



If you see an error, try:

  1. Log in to r.datatools.utoronto.ca
  2. Find rlesson04 from Files pane
  3. Click Run Document

Other steps you may try:

  1. Remove any .Rmd and .R files on the home directory of r.datatools.utoronto.ca
  2. In RStudio,
    1. Click Tools > Global Options
    2. Uncheck “Restore most recently opened project at startup”
  3. Run install.packages("learnr") in RStudio after the steps above or click here

Summary

  • Continuous random variables describe uncountable random outcomes using probabilities of intervals
  • Probability density function and cumulative distribution function uniquely define the behaviour of a random variable
  • Common continuous random variables include exponential and normal
  • Standard normal distribution is a special case of normal distribution

Practice questions

Chapter 5, Dekking et al.

  • Read Section 5.4

  • Quick Exercises 5.1, 5.6, 5.7

  • All exercises from the chapter

  • See a collection of corrections by the author here