STA237: Probability, Statistics, and Data Analysis I

Michael Jongho Moon

PhD Student, DoSS, University of Toronto

Wednesday, May 17, 2023

- Suppose you find a broken watch with only the hour hand in its place.
- What is the probability that the watch stopped at exactly 9 o’clock?
- Assume an equal probability for any position of the hour hand.
- Let \(H\) be the random variable that represents the position of the hour hand. We want to compute

\[P\left(H=9\right).\]

- There are 12 hours on a watch.
- Is it then \[P\left(H=9\right)=\frac{1}{12}?\]

- What if the hour hand was off the mark by a very small amount?

- What if the hour hand was exactly half-way between 9 and 10?

- Position of the hand is a location on a continuous curve.
- There are infinitely many locations on a continuous curve.
- \[P(H=9)=0\]
- \(H\) is an example of a continuous random variable - a variable whose possible values are uncountable.

- The height of a person. There is no *next* value after 176.33 cm.
- Waiting time at a restaurant. We can’t count time in general.
- Continuous variables are also used to model values that can only be discrete in practice such as a person’s annual income in CAD.

- How do we define probabilities associated with a continuous random variable?

*We use intervals.*

Similar to a probability mass function, a probability density function uniquely defines (the behaviour of) a continuous random variable.

A random variable \(X\) is **continuous** if for some function \[f:\mathbb{R}\to\mathbb{R}\] and for any numbers \(a\) and \(b\) with \(a\le b\),

\[P\left(a\le X\le b\right)=\int_a^b f(x) dx.\] The function \(f\) has to satisfy

**(i)** \(f(x)\ge 0\) for all \(x\), and

**(ii)** \(\int_{-\infty}^\infty f(x) dx = 1\).

We call \(f\) the **probability density function** of \(X\) and the value \(f(x)\) is the **probability density** of \(X\) at \(x\).

\[P\left(a\le X\le b\right)=\int_a^b f(x) dx\]

- \(f(x)\) is NOT a probability
- Both a pmf and a pdf uniquely define a random variable, but a pmf maps to \([0,1]\) and a pdf to \([0,\infty)\)
- \(f(x)\) can be interpreted as a relative measure of likelihood **around** \(x\)
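The two conditions can be checked numerically for a hypothetical density, here \(f(x)=2x\) on \((0,1)\), using a midpoint Riemann sum (a Python sketch, not part of the original slides):

```python
# Sanity check of pdf conditions (i) and (ii) for a hypothetical
# density f(x) = 2x on (0, 1), zero elsewhere.
def f(x):
    return 2 * x if 0 < x < 1 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0, 1)       # condition (ii): should be 1
prob = integrate(f, 0.25, 0.5)   # P(0.25 <= X <= 0.5) = 0.5**2 - 0.25**2
print(total, prob)
```

Since \(f\) is linear on \((0,1)\), the midpoint rule is exact up to floating-point error: `total` ≈ 1 and `prob` ≈ 0.1875.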

The definition of a cdf is the same for both discrete and continuous random variables.

The **cumulative distribution function** \(F\) of a random variable \(X\) is the function

\[F:\mathbb{R}\to [0,1],\]

defined by

\[F(a)=P(X\le a)\quad\text{for }-\infty<a<\infty.\]

- For a continuous random variable \(X\) with pdf \(f\), we have

\(F_X(a)=P(X\le a)=\int_{-\infty}^a f(x)\, dx\).

- For a discrete random variable \(Y\) taking values \(y_i\) with pmf \(p\), we have

\(F_Y(a)=P(Y\le a)=\sum_{y_i\le a}p(y_i)\).
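The two cdf computations can be contrasted with a short sketch, using the hypothetical pdf \(f(x)=2x\) on \((0,1)\) for the continuous case and a fair die for the discrete case:

```python
# Contrast of the two cdf computations: integrating a pdf versus
# summing a pmf. Both examples are hypothetical: X has pdf
# f(x) = 2x on (0, 1), and Y is a fair six-sided die.
def F_X(a, n=10_000):
    """cdf of X via a midpoint Riemann sum of its pdf."""
    if a <= 0:
        return 0.0
    b = min(a, 1.0)
    h = b / n
    return sum(2 * ((i + 0.5) * h) * h for i in range(n))

def F_Y(a):
    """cdf of Y: running sum of the pmf over values <= a."""
    return sum(1 / 6 for y in range(1, 7) if y <= a)

print(F_X(0.5))  # F(a) = a**2, so about 0.25
print(F_Y(2.5))  # 2/6, about 0.333
```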

- A cdf uniquely defines a distribution for both discrete and continuous random variables.
- Continuous random variables have continuous cdfs.

The continuity property provides an alternative definition.

A random variable is called **continuous** if its cumulative distribution function \(F\) is *continuous everywhere*.

- Any cdf is
  - non-decreasing,
  - right-continuous, and
  - (approaching) 0 on the left end and 1 on the right end

Suppose a random variable \(X\) is defined by the following probability density function.

\[f(x)=\begin{cases}\frac{1}{2\sqrt{x}} & \text{when }0<x<a \\0 &\text{otherwise}\end{cases}\]

What is \(a\)?

- \(\int_{-\infty}^\infty f(x) dx =1\)

We know \(F(a)=1\) and \(F(0)=0\). \[\implies \int_{-\infty}^\infty f(x)\, dx= \int_{0}^a \frac{1}{2\sqrt{x}}\, dx\]

- \(\int_{0}^a \frac{1}{2\sqrt{x}}\, dx=1\)
- \(\int_{0}^a \frac{x^{-1/2}}{2}\, dx=1\)
- \(\left. x^{1/2}\right|_{0}^a =1\)
- \(a^{1/2}=1\)
- \(a = 1\)
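The result can be checked numerically; a plain Riemann sum of \(1/(2\sqrt{x})\) over \((0,1)\) approaches 1, though slowly because of the singularity at 0 (an illustrative Python sketch):

```python
import math

# Riemann-sum check that f(x) = 1/(2*sqrt(x)) integrates to 1 over (0, 1).
# The singularity at x = 0 slows convergence, so expect only ~3 digits.
n = 100_000
h = 1 / n
total = sum(h / (2 * math.sqrt((i + 0.5) * h)) for i in range(n))
print(total)  # slightly below 1
```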

Let \(X\) be a *continuous* random variable and \(p\) a number between 0 and 1. The **\(p\)^{th} quantile** or **\(100p\)^{th} percentile** of the distribution of \(X\) is the number \(q_p\) such that

\[F(q_p)=P(X\le q_p)=p.\]

The **median** of a distribution is its \(50\)^{th} percentile.

The previous definition is ambiguous for discrete random variables since there may not be a value \(q\) that satisfies \(F(q)=p\).

Let \(X\) be a random variable with cumulative distribution function \(F\). Then the **quantile function** of \(X\) is the function \(F^{-1}\) defined by

\[F^{-1}(t) = \min \left\{x: F(x) \ge t \right\},\]

for \(0<t<1\).
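The generalized inverse can be sketched directly from the definition; the pmf below (\(p(1)=0.3\), \(p(2)=0.5\), \(p(3)=0.2\)) is a hypothetical example:

```python
# The quantile function as a generalized inverse,
# F^{-1}(t) = min{x : F(x) >= t}, for a hypothetical pmf
# p(1) = 0.3, p(2) = 0.5, p(3) = 0.2.
values = [1, 2, 3]
pmf = [0.3, 0.5, 0.2]

def cdf(x):
    return sum(p for v, p in zip(values, pmf) if v <= x)

def quantile(t):
    # smallest value whose cdf reaches t
    return min(v for v in values if cdf(v) >= t)

print(quantile(0.5))  # 2, since F(1) = 0.3 < 0.5 but F(2) = 0.8 >= 0.5
print(quantile(0.3))  # 1, since F(1) = 0.3 already reaches 0.3
```

Note that no value \(q\) satisfies \(F(q)=0.5\) exactly here, which is why the definition takes a minimum rather than an inverse.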

We assumed an equal likelihood for every position of \(H\) between 0 and 12.

Its probability density function will be a constant, say \(k\), over the interval from 0 to 12.

What is \(k\)?

The cumulative distribution function will start to increase from 0 at \(H=0\) at a constant rate to reach 1 at \(H=12\). \(F\) is continuous on \(\mathbb{R}\).

We use a uniform distribution to assign equal probabilities across a fixed interval.

It often models *completely arbitrary* experiments, or *complete ignorance* about the likelihood of outcomes.

A continuous random variable has a **uniform distribution** on interval \([\alpha, \beta]\) if its probability density function \(f\) is given by

\[f(x)=\begin{cases}\frac{1}{\beta-\alpha} & \alpha \le x\le \beta\\ 0 &\text{otherwise.}\end{cases}\]

We denote this distribution by \(U(\alpha,\beta)\).
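For the watch example, \(H\sim U(0,12)\), so \(k=1/12\) and probabilities reduce to interval lengths divided by 12. A minimal sketch:

```python
# H ~ U(0, 12): the density is the constant k = 1/12 on [0, 12],
# and probabilities are interval lengths divided by 12.
alpha, beta = 0, 12
k = 1 / (beta - alpha)  # the constant density, 1/12

def cdf(x):
    """F(x) for U(alpha, beta): linear from 0 to 1 across the interval."""
    if x < alpha:
        return 0.0
    if x > beta:
        return 1.0
    return (x - alpha) / (beta - alpha)

# probability the hour hand lies between 9 and 10 o'clock
print(cdf(10) - cdf(9))  # 1/12, about 0.0833
```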


\[Y \sim U(\alpha, \beta)\]

Suppose Michael receives approximately \(r\) air duct cleaning scam calls every year.

Let the random variable \(T\) be the amount of time between two consecutive calls.

To compute the distribution of \(T\), we model the calls as a Poisson process …

- divide 1 year into \(n\) equal-length intervals
- make the intervals **small** enough that Michael may receive at most 1 call per \(1/n\)-year interval
- assume the calls in different \(1/n\)-year intervals are identically distributed and independent of each other


Then, \(p_n=r/n\) represents the probability of getting a scam call in any \(1/n\)-year interval.

\[P(T>t\text{ years})\]

- \(=P(T>t\times n\times1/n\text{-year intervals})\)
- \(=\left(1-p_n\right)^{t\cdot n}\)
- \(=\left(1-\frac{r}{n}\right)^{t\cdot n}\)

Let \(n\to\infty\).

- \(P(T>t\text{ years})\)
- \(= \lim_{n\to\infty}\left(1 - r\cdot\frac{1}{n}\right)^{t\cdot n}\)
- \(=e^{-t\cdot r}\)
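The limit can be checked numerically with illustrative values, say \(r=6\) calls per year and \(t=0.5\) years:

```python
import math

# Numerical check of the limit (1 - r/n)^(t*n) -> e^(-r*t) with
# illustrative values r = 6 calls per year and t = 0.5 years.
r, t = 6, 0.5
for n in (10, 1_000, 100_000):
    print(n, (1 - r / n) ** (t * n))
print("limit:", math.exp(-r * t))  # e^{-3}, about 0.0498
```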

To compute \(F_T(t)\), we can use

\[F_T(t)=P(T\le t)=1-e^{-rt}.\]

Taking its derivative gives its pdf.

\[f_T(x) = \frac{d}{dx} \left(1-e^{-rx}\right) = re^{-rx}\]

\(T\) is an example of an exponential random variable.

Exponential random variables are often used to model the time until the next event in a Poisson process. \(\lambda\) is the expected rate of events.

A continuous random variable has an **exponential distribution** with parameter \(\lambda\), \(\lambda>0\), if its probability density function \(f\) is given by

\[f(x) = \begin{cases} \lambda e^{-\lambda x} & x\ge0\\ 0 & \text{otherwise.} \end{cases}\]

We denote this distribution by \(\text{Exp}(\lambda)\).
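One practical consequence of the cdf \(F(x)=1-e^{-\lambda x}\) is inverse-transform sampling: if \(U\sim U(0,1)\), then \(-\ln(1-U)/\lambda\sim\text{Exp}(\lambda)\). A sketch with an illustrative rate \(\lambda=1\) (not from the slides):

```python
import math
import random

# Inverse-transform sampling sketch: if U ~ U(0, 1), then
# -ln(1 - U)/lam follows Exp(lam), because F^{-1}(u) = -ln(1 - u)/lam.
# lam = 1 is an illustrative choice.
random.seed(1)
lam = 1.0
draws = [-math.log(1 - random.random()) / lam for _ in range(100_000)]

# empirical P(T > 1) should be close to e^{-1}, about 0.368
frac = sum(d > 1 for d in draws) / len(draws)
print(frac, math.exp(-1))
```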

While \(F_Y\) is continuous everywhere, \(f_Y\) is discontinuous at \(0\).

\[Y \sim \text{Exp}(1)\]

*(Adapted from Devore & Berk)*

Let \(X\) be the time (hr) between two successive arrivals at the drive-up window of a local bank. Suppose \(X\) has an exponential distribution with \(\lambda=\lambda_0\).

What is the probability that no customer shows up in the first 2 hours after opening?

Suppose 2 hours have passed since opening without a customer. What is the probability that no customer shows up for the next 2 hours?

The probability that no customer shows up in the first 2 hours is

- \[P(X>2)\]
- \(=1-F(2)\)
- \(=1-\int_0^2 \lambda_0 e^{-\lambda_0 x}\, dx\)
- \(=1+\left.e^{-\lambda_0 x}\right|_0^2\)
- \(=1+e^{-2\lambda_0}-1\)
- \(=e^{-2\lambda_0}\)

The probability that no customer shows up for next 2 hours **after** no customer showed up for the first 2 hours is

- \[P(X>4 | X>2)\]
- \(=\frac{P(\{X>4\}\cap\{X>2\})}{P(X>2)}\)

because \(\{X>4\}\) implies \(\{X>2\}\).

- \(=\frac{P(X>4)}{P(X>2)}=\frac{e^{-4\lambda_0}}{e^{-2\lambda_0}}=e^{-2\lambda_0}\)

\(P(X>4|X>2)=P(X>2)\)

Whether there was a customer in the **past** 2 hours does not change the probability of a customer’s arrival in the **next** 2 hours.

For any \(s,t>0\),

\[\begin{align} & P(X>s + t | X>s) \\ = & \frac{P(X>s + t)}{P(X>s)} \\ = & \frac{1-\left(1-e^{-\lambda(s+t)}\right)}{1-\left(1-e^{-\lambda s}\right)} \\ =&\frac{e^{-\lambda s}e^{-\lambda t}}{e^{-\lambda s}} \\ = & P(X>t)\end{align}\]

The timing of a past event does not change the probability of the timing for the next event.
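The algebra above can be confirmed numerically for an illustrative rate, say \(\lambda=2\), and arbitrary \(s\) and \(t\):

```python
import math

# Memorylessness check for Exp(lam) with an illustrative rate lam = 2:
# P(X > s + t | X > s) should equal P(X > t) for any s, t > 0.
lam = 2.0
surv = lambda x: math.exp(-lam * x)  # P(X > x) = 1 - F(x)

s, t = 1.5, 0.7
conditional = surv(s + t) / surv(s)
print(conditional, surv(t))  # both equal e^{-lam*t}, about 0.2466
```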

\(\Gamma(\cdot)\) is called the *gamma function* and \(\Gamma(n)=(n-1)!\) when \(n\) is a positive integer.

A continuous random variable has a **gamma distribution** with parameter \(\alpha\) and \(\beta\), \(\alpha>0\) and \(\beta>0\), if its probability density function \(f\) is given by

\[f(x)=\frac{1}{\Gamma(\alpha)}\beta^\alpha x^{\alpha-1}e^{-\beta x}\quad\text{for }x>0.\]

We denote this distribution by \(\text{Gamma}(\alpha, \beta)\).
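The identity \(\Gamma(n)=(n-1)!\) and the reduction of \(\text{Gamma}(1,\beta)\) to \(\text{Exp}(\beta)\) can both be checked with the standard library's `math.gamma` (a Python sketch with illustrative values):

```python
import math

# Two checks with the standard library's math.gamma:
# (1) Gamma(n) = (n-1)! for positive integers n, and
# (2) the Gamma(alpha, beta) density with alpha = 1 reduces to Exp(beta).
for n in range(1, 6):
    assert math.gamma(n) == math.factorial(n - 1)

def gamma_pdf(x, alpha, beta):
    # f(x) = beta^alpha / Gamma(alpha) * x^(alpha-1) * e^(-beta*x) for x > 0
    return beta**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)

print(gamma_pdf(0.5, 1, 2), 2 * math.exp(-2 * 0.5))  # both about 0.7358
```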


With two parameters, the gamma distribution is more versatile than the exponential distribution. It is used to model insurance claim amounts, rainfall, etc.

\[G\sim \text{Gamma}(\alpha, \beta)\]

Normal distribution, or Gaussian distribution, is central in probability theory and statistics.

It is often used to model observational errors.

A continuous random variable has a **normal distribution** with parameter \(\mu\) and \(\sigma^2\), \(\sigma^2>0\), if its probability density function \(f\) is given by

\[f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}.\]

We denote the distribution by \(N(\mu,\sigma^2)\).


A normal distribution has a symmetric shape around its centre.

\(\mu\) controls the centre of the distribution (location) while \(\sigma\) controls the spread of the distribution (scale).

\[X_{\mu,\sigma} \sim N(\mu, \sigma^2)\]

The standard normal distribution is a special case of the normal distribution.

We can transform any normal random variable \(X\sim N(\mu, \sigma^2)\) to \(Z\) by \[Z = \frac{X-\mu}{\sigma}.\]

A normal distribution with \(\mu=0\) and \(\sigma^2=1\) is called the **standard normal distribution**.

We often denote a standard normal random variable by \(Z\), \(Z\sim N(0,1)\), its pdf with \(\phi\), and its cdf with \(\Phi\).

\[\phi(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}\]

\[\Phi(a) = \int_{-\infty}^a\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}dz\]

There is no closed-form expression for \(F(a)=\int_{-\infty}^a f(x)\, dx\).

To compute probabilities for any normal random variable, we can

- transform the variable to \(Z\) and use a look-up table for \(\Phi\) (sometimes \(1-\Phi\)), or
- use R or similar.

\[Z = \frac{X-\mu}{\sigma}.\]

Suppose \(X\sim N(1, 4^2)\). Find

- \(P(X > 2)\)
- \(P(X\le 0)\)
- \(q_{0.25}\)

\[P(X > 2)\]

- \(= P(X\ge2)\)

\(P(X=2)=0\)

- \(= P\left(\frac{X-1}{4}\ge\frac{2-1}{4}\right)\)
- \(= P\left(Z\ge\frac{1}{4}\right)\)
- \(\approx 0.4013\)

\[P(X \le 0)\]

- \(= P\left(\frac{X-1}{4}\le\frac{0-1}{4}\right)\)
- \(= P\left(Z\le-\frac{1}{4}\right)\)
- \(= P\left(Z\ge\frac{1}{4}\right)\)

\(Z\) is symmetric around \(0\).

- \(\approx 0.4013\)

\[q_{0.25}\]

- \(0.25= F(q_{0.25})=P(X\le q_{0.25})\)
- \(0.25= P\left(Z\le \frac{q_{0.25} - 1}{4}\right)\)
- \(0.25= P\left(Z\ge -\frac{q_{0.25} - 1}{4}\right)\)
- \(\implies \frac{1-q_{0.25}}{4}\approx0.675\)
- \(q_{0.25} \approx -1.7\)

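The three answers can be verified without a look-up table; in R one would use `pnorm` and `qnorm`, and Python's standard library offers `statistics.NormalDist` (used here as a sketch):

```python
from statistics import NormalDist

# Verify the worked answers for X ~ N(1, 4^2) with the standard
# library; NormalDist takes the standard deviation, not the variance.
X = NormalDist(mu=1, sigma=4)

print(1 - X.cdf(2))      # P(X > 2), about 0.4013
print(X.cdf(0))          # P(X <= 0), about 0.4013
print(X.inv_cdf(0.25))   # q_{0.25}, about -1.70
```

In R, the same numbers come from `1 - pnorm(2, 1, 4)`, `pnorm(0, 1, 4)`, and `qnorm(0.25, 1, 4)`.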

Click here to install `learnr` and run the R worksheet.

Follow this link to open the worksheet on r.datatools.utoronto.ca.

If you see an error, try:

- Log in to r.datatools.utoronto.ca
- Find `rlesson04` in the *Files* pane
- Click *Run Document*

Other steps you may try:

- Remove any `.Rmd` and `.R` files in the home directory of r.datatools.utoronto.ca
- In RStudio,
  - Click `Tools` > `Global Options`
  - Uncheck *“Restore most recently opened project at startup”*
  - Click *OK*
- Run `install.packages("learnr")` in RStudio after the steps above, or click here

- Continuous random variables describe uncountable random outcomes using probabilities of intervals
- A probability density function or a cumulative distribution function uniquely defines the behaviour of a random variable
- Common continuous distributions include the uniform, exponential, gamma, and normal distributions
- The standard normal distribution is a special case of the normal distribution

Chapter 5, Dekking et al.

Read Section 5.4

Quick Exercises 5.1, 5.6, 5.7

All exercises from the chapter

See a collection of corrections by the author here

© 2023. Michael J. Moon. University of Toronto.

Sharing, posting, selling, or using this material outside of your personal use in this course is **NOT** permitted under any circumstances.