STA237: Probability, Statistics, and Data Analysis I
PhD Student, DoSS, University of Toronto
Wednesday, May 17, 2023
For continuous random variables, we assign probabilities to intervals of values rather than to individual points.
Similar to a probability mass function, a probability density function uniquely defines (the behaviour of) a continuous random variable.
A random variable \(X\) is continuous if for some function \[f:\mathbb{R}\to\mathbb{R}\] and for any numbers \(a\) and \(b\) with \(a\le b\),
\[P\left(a\le X\le b\right)=\int_a^b f(x) dx.\] The function \(f\) has to satisfy
(i) \(f(x)\ge 0\) for all \(x\), and
(ii) \(\int_{-\infty}^\infty f(x) dx = 1\).
We call \(f\) the probability density function of \(X\) and the value \(f(x)\) is the probability density of \(X\) at \(x\).
\[P\left(a\le X\le b\right)=\int_a^b f(x) dx\]
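As a quick numerical check of this definition in R (a minimal sketch; the density \(f(x)=2x\) on \((0,1)\) is an illustrative choice, not from the slides), integrate() approximates such probabilities as areas under \(f\):

```r
# Illustrative density: f(x) = 2x on (0, 1) and 0 otherwise
f <- function(x) ifelse(x > 0 & x < 1, 2 * x, 0)

# (i) non-negativity, checked on a grid of values
all(f(seq(-1, 2, by = 0.01)) >= 0)              # TRUE

# (ii) the density integrates to 1 (f is zero outside (0, 1))
integrate(f, lower = 0, upper = 1)$value         # ~1

# P(0.25 <= X <= 0.5) as the area under f
integrate(f, lower = 0.25, upper = 0.5)$value    # ~0.1875
```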
The definition of a cdf is the same for both discrete and continuous random variables.
The cumulative distribution function \(F\) of a random variable \(X\) is the function
\[F:\mathbb{R}\to [0,1],\]
defined by
\[F(a)=P(X\le a)\quad\text{for }-\infty<a<\infty.\]
The continuity of the cumulative distribution function provides an alternative definition.
A random variable is called continuous if its cumulative distribution function \(F\) is continuous everywhere.
Suppose a random variable \(X\) is defined by the following probability density function.
\[f(x)=\begin{cases}\frac{1}{2\sqrt{x}} & \text{when }0<x<a \\0 &\text{otherwise}\end{cases}\]
What is \(a\)?
We know \(F(0)=0\) and \(F(a)=1\), so \[1=\int_{-\infty}^\infty f(x)\, dx= \int_{0}^a \frac{1}{2\sqrt{x}}\, dx = \sqrt{a},\] which gives \(a=1\).
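As a sanity check, the answer \(a=1\) can be verified numerically with R's integrate():

```r
# f(x) = 1 / (2 * sqrt(x)) on (0, a); with a = 1 it integrates to 1
f <- function(x) 1 / (2 * sqrt(x))
integrate(f, lower = 0, upper = 1)$value   # ~1
```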
Let \(X\) be a continuous random variable and \(p\) a number between 0 and 1. The \(p\)th quantile or \(100\cdot p\)th percentile of the distribution of \(X\) is the smallest number \(q_p\) such that
\[F(q_p)=P(X\le q_p)=p.\]
The median of a distribution is its \(50\)th percentile.
The previous definition is ambiguous for discrete random variables since there may not be a value \(q\) that satisfies \(F(q)=p\).
Let \(X\) be a random variable with cumulative distribution function \(F\). Then the quantile function of \(X\) is the function \(F^{-1}\) defined by
\[F^{-1}(t) = \min \left\{x: F(x) \ge t \right\},\]
for \(0<t<1\).
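A minimal sketch of this definition in R: the helper quantile_from_cdf() below (a made-up name) inverts a continuous cdf numerically with uniroot() and agrees with the built-in quantile function. The Exp(1) distribution, introduced later in this lecture, is used only as an illustration.

```r
# Numerically invert a continuous, strictly increasing cdf:
# F^{-1}(t) is the x where F(x) = t, which equals min{ x : F(x) >= t }
quantile_from_cdf <- function(cdf, t, lower = 0, upper = 100) {
  uniroot(function(x) cdf(x) - t, lower = lower, upper = upper)$root
}

quantile_from_cdf(function(x) pexp(x, rate = 1), t = 0.5)  # ~0.6931 (median)
qexp(0.5, rate = 1)                                        # built-in quantile function
```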
We assumed \(H\) is equally likely to take any value between 0 and 12.
Its probability density function will be a constant, say \(k\), over the interval from 0 to 12.
What is \(k\)? Since the density must integrate to 1, \(12k=1\), i.e., \(k=1/12\).
The cumulative distribution function increases at a constant rate from 0 at \(H=0\) to 1 at \(H=12\), and \(F\) is continuous on \(\mathbb{R}\).
We use a uniform distribution to assign equal probabilities across a fixed interval.
It often models completely arbitrary experiments, or complete ignorance about the likelihood of outcomes.
A continuous random variable has a uniform distribution on interval \([\alpha, \beta]\) if its probability density function \(f\) is given by
\[f(x)=\begin{cases}\frac{1}{\beta-\alpha} & \alpha \le x\le \beta\\ 0 &\text{otherwise.}\end{cases}\]
We denote this distribution by \(U(\alpha,\beta)\).
\[Y \sim U(\alpha, \beta)\]
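In R, the uniform distribution is available through dunif(), punif(), qunif(), and runif(). A brief sketch for the earlier example \(H\sim U(0,12)\):

```r
# H ~ U(0, 12): constant density k = 1/12 on [0, 12]
dunif(6, min = 0, max = 12)      # 1/12 ~ 0.0833
punif(3, min = 0, max = 12)      # P(H <= 3) = 3/12 = 0.25
qunif(0.5, min = 0, max = 12)    # median of H = 6
runif(3, min = 0, max = 12)      # three simulated values of H
```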
Suppose Michael receives approximately \(r\) air duct cleaning scam calls every year.
Let the random variable \(T\) be the amount of time between two consecutive calls.
To compute the distribution of \(T\), we model the calls as a Poisson process: divide each year into \(n\) short intervals of length \(1/n\) year, and suppose a scam call arrives in each interval independently with probability \(p_n=r/n\). Then, for \(t>0\),
\[P(T>t\text{ years}) = P\left(\text{no call in }nt\text{ consecutive intervals}\right) = \left(1-\frac{r}{n}\right)^{nt}.\]
Letting \(n\to\infty\),
\[P(T>t) = \lim_{n\to\infty}\left(1-\frac{r}{n}\right)^{nt} = e^{-rt},\]
so the cumulative distribution function of \(T\) is
\[F_T(t)=P(T\le t)=1-e^{-rt}.\]
Differentiating \(F_T\) gives the pdf of \(T\).
\[f_T(x) = \frac{d}{dx} \left(1-e^{-rx}\right) = re^{-rx}\]
\(T\) is an example of an exponential random variable.
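The limiting step above can be checked numerically. The sketch below assumes an illustrative rate of \(r=5\) calls per year (not a value from the slides):

```r
r <- 5      # assumed rate: 5 scam calls per year (illustrative)
t <- 0.4    # years
n <- 1e6    # number of intervals per year

(1 - r / n)^(n * t)   # discrete approximation of P(T > t)
exp(-r * t)           # its limit as n grows
1 - exp(-r * t)       # F_T(t) = P(T <= t)
```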
Exponential random variables are often used to model the time until the next event in a Poisson process; \(\lambda\) is the expected rate of events.
A continuous random variable has an exponential distribution with parameter \(\lambda\), \(\lambda>0\), if its probability density function \(f\) is given by
\[f(x) = \begin{cases} \lambda e^{-\lambda x} & x\ge0\\ 0 & \text{otherwise.} \end{cases}\]
We denote this distribution by \(\text{Exp}(\lambda)\).
While \(F_Y\) is continuous everywhere, \(f_Y\) is discontinuous at \(0\).
\[Y \sim \text{Exp}(1)\]
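R provides the exponential distribution through dexp() and pexp(). A small check against the formulas above, using an assumed rate \(\lambda = 2\):

```r
lambda <- 2; t <- 1.5   # assumed values for illustration

pexp(t, rate = lambda)      # F(t)
1 - exp(-lambda * t)        # matches 1 - e^{-lambda t}

dexp(t, rate = lambda)      # f(t)
lambda * exp(-lambda * t)   # matches lambda e^{-lambda t}

dexp(0, rate = lambda)      # = lambda > 0, while f(x) = 0 for x < 0,
                            # so the pdf jumps at 0 even though the cdf is continuous
```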
(Adapted from Devore & Berk)
Let \(X\) be the time (hr) between two successive arrivals at the drive-up window of a local bank. Suppose \(X\) has an exponential distribution with \(\lambda=\lambda_0\).
What is the probability that no customer shows up in the first 2 hours after opening?
Suppose 2 hours have passed since opening without a customer. What is the probability that no customer shows up for the next 2 hours?
The probability that no customer shows up in the first 2 hours is
\[P(X>2)=1-F(2)=e^{-2\lambda_0}.\]
The probability that no customer shows up for the next 2 hours, given that none showed up in the first 2 hours, is
\[P(X>4 | X>2)=\frac{P(X>4,\, X>2)}{P(X>2)}=\frac{P(X>4)}{P(X>2)}=\frac{e^{-4\lambda_0}}{e^{-2\lambda_0}}=e^{-2\lambda_0},\]
because \(\{X>4\}\) implies \(\{X>2\}\). That is,
\[P(X>4|X>2)=P(X>2).\]
Whether there was a customer in the past 2 hours does not change the probability of a customer’s arrival in the next 2 hours.
For any \(s,t>0\),
\[\begin{align} & P(X>s + t | X>s) \\ = & \frac{P(X>s + t)}{P(X>s)} \\ = & \frac{1-\left(1-e^{-\lambda(s+t)}\right)}{1-\left(1-e^{-\lambda s}\right)} \\ =&\frac{e^{-\lambda s}e^{-\lambda t}}{e^{-\lambda s}} \\ = & e^{-\lambda t} \\ = & P(X>t)\end{align}\]
The timing of a past event does not change the probability of the timing for the next event.
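A quick numerical check of the memoryless property with pexp(), using assumed values \(\lambda = 2\), \(s = 1\), \(t = 3\):

```r
lambda <- 2; s <- 1; t <- 3   # assumed values for illustration

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
pexp(s + t, rate = lambda, lower.tail = FALSE) /
  pexp(s, rate = lambda, lower.tail = FALSE)

# P(X > t) -- the same value, exp(-lambda * t)
pexp(t, rate = lambda, lower.tail = FALSE)
```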
\(\Gamma(\cdot)\) is called the gamma function and \(\Gamma(n)=(n-1)!\) when \(n\) is a positive integer.
A continuous random variable has a gamma distribution with parameters \(\alpha\) and \(\beta\), \(\alpha>0\) and \(\beta>0\), if its probability density function \(f\) is given by
\[f(x)=\frac{1}{\Gamma(\alpha)}\beta^\alpha x^{\alpha-1}e^{-\beta x}\quad\text{for }x>0.\]
We denote this distribution by \(\text{Gamma}(\alpha, \beta)\).
With its two parameters, the gamma distribution is more versatile than the exponential distribution. It is used to model insurance claim amounts, rainfall, etc.
\[G\sim \text{Gamma}(\alpha, \beta)\]
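In R, dgamma() takes shape and rate arguments that correspond to \(\alpha\) and \(\beta\) in the definition above; with \(\alpha=1\) the density reduces to the exponential density. A sketch with assumed values:

```r
alpha <- 3; beta <- 2; x <- 1.5   # assumed values for illustration

dgamma(x, shape = alpha, rate = beta)
beta^alpha * x^(alpha - 1) * exp(-beta * x) / gamma(alpha)   # same value

# Gamma(1, beta) is the Exp(beta) distribution
dgamma(x, shape = 1, rate = beta)
dexp(x, rate = beta)
```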
Normal distribution, or Gaussian distribution, is central in probability theory and statistics.
It is often used to model observational errors.
A continuous random variable has a normal distribution with parameters \(\mu\) and \(\sigma^2\), \(\sigma^2>0\), if its probability density function \(f\) is given by
\[f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}.\]
We denote the distribution by \(N(\mu,\sigma^2)\).
Normal distributions have a symmetric shape around their centres.
\(\mu\) controls the centre of the distribution (location), while \(\sigma\) controls the spread of the distribution (scale).
\[X_{\mu,\sigma} \sim N(\mu, \sigma^2)\]
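A short R sketch (with arbitrary parameter values) of how \(\mu\) shifts the normal density and \(\sigma\) spreads it, using dnorm():

```r
x <- seq(-6, 6, by = 0.01)

plot(x, dnorm(x, mean = 0, sd = 1), type = "l", ylab = "density")
lines(x, dnorm(x, mean = 2, sd = 1), lty = 2)   # mu = 2: shifted centre
lines(x, dnorm(x, mean = 0, sd = 2), lty = 3)   # sigma = 2: larger spread
```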
The standard normal distribution is a special case of the normal distribution.
We can transform any normal random variable \(X\sim N(\mu, \sigma^2)\) to \(Z\) by \[Z = \frac{X-\mu}{\sigma}.\]
A normal distribution with \(\mu=0\) and \(\sigma^2=1\) is called the standard normal distribution.
We often denote a standard normal random variable by \(Z\), \(Z\sim N(0,1)\), its pdf with \(\phi\), and its cdf with \(\Phi\).
\[\phi(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}\]
\[\Phi(a) = \int_{-\infty}^a\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}dz\]
There is no closed-form expression for \(F(a)=\int_{-\infty}^a f(x)\, dx\) when \(f\) is a normal density.
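In practice \(\Phi\) is evaluated numerically; for example, R's pnorm() agrees with numerical integration of \(\phi\):

```r
pnorm(1.96)                                          # Phi(1.96), ~0.975
integrate(dnorm, lower = -Inf, upper = 1.96)$value   # numerical integration of phi
```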
To compute probabilities for any normal random variable \(X\sim N(\mu,\sigma^2)\), we can standardize it using \[Z = \frac{X-\mu}{\sigma} \sim N(0,1)\] and evaluate the resulting probabilities with \(\Phi\).
Suppose \(X\sim N(1, 4^2)\). Find
(a) \(P(X > 2)\), noting that \(P(X=2)=0\);
(b) \(P(X \le 0)\), noting that \(Z\) is symmetric around \(0\);
(c) \(q_{0.25}\), the first quartile.
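One way to check these answers numerically is with pnorm() and qnorm(), either by standardizing or by passing the mean and standard deviation directly (a sketch, not part of the original exercise):

```r
mu <- 1; sigma <- 4   # X ~ N(1, 4^2)

# (a) P(X > 2)
1 - pnorm((2 - mu) / sigma)                           # via Z
pnorm(2, mean = mu, sd = sigma, lower.tail = FALSE)   # directly

# (b) P(X <= 0)
pnorm((0 - mu) / sigma)
pnorm(0, mean = mu, sd = sigma)

# (c) the 0.25th quantile
mu + sigma * qnorm(0.25)
qnorm(0.25, mean = mu, sd = sigma)
```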
Install learnr and run the R worksheet.
Click here to install learnr on r.datatools.utoronto.ca.
Follow this link to open the worksheet.
If you see an error, try removing rlesson04 from the Files pane.
Other steps you may try:
Remove the .Rmd and .R files in the home directory of r.datatools.utoronto.ca.
Tools > Global Options
Run install.packages("learnr") in RStudio after the steps above, or click here.
Chapter 5, Dekking et al.
Read Section 5.4
Quick Exercises 5.1, 5.6, 5.7
All exercises from the chapter
See a collection of corrections by the author here
© 2023. Michael J. Moon. University of Toronto.
Sharing, posting, selling, or using this material outside of your personal use in this course is NOT permitted under any circumstances.