STA237: Probability, Statistics, and Data Analysis I
PhD Student, DoSS, University of Toronto
Wednesday, May 31, 2023
\(\sum x = \sum (x-1) + \sum 1\)
Now suppose the coffee shop’s profit is $\(R\) per day, given by
\[R=g\left(D\right)=\begin{cases} 2D - 10 & 0 \le D < 10 \\ 4D - 30 & D \ge 10 \end{cases}\]
Michael is interested in the distribution of \(R\).
\(\sum_{u\in\mathbb{R}} u = \sum_{u<a} u + \sum_{u\ge a} u\)
\(\sum_{u<a} u = \sum_{u<a} u +\sum_{u\ge a} u-\sum_{u\ge a} u = \sum_{u\in\mathbb{R}} u - \sum_{u \ge a} u\)
\[P(R\ge 20)=?\]
To compute the probability, we need the full distribution.
How transforming a variable changes the distribution.
Recall that for \(X\sim N(\mu, \sigma^2)\) and \(Z\sim N(0,1)\), \[P(X\le x)=P\left(Z\le \frac{x-\mu}{\sigma}\right)\]
… because the events \(\{X\le x\}\) and \(\{Z \le (x-\mu)/\sigma\}\) are equivalent; that is, we only changed the units in this case.
When computing the distribution of a transformed random variable, we can start by considering the events, e.g., \(\{R\ge 20\}=\{D\ge 12.5\}\).
\[F_{Y}(y)=P(g(X)\le y),\quad Y = g(X)\]
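As a rough illustration of the event-based approach, the sketch below computes \(P(R\ge 20)\) two ways: by transforming every value of \(D\), and through the equivalent event \(\{D\ge 12.5\}\). The distribution of \(D\) used here (uniform on the integers 0 to 20) is a made-up assumption for illustration only, not the distribution from the example.

```python
from fractions import Fraction

def g(d):
    """Daily profit R = g(D) from the coffee shop example."""
    return 2 * d - 10 if d < 10 else 4 * d - 30

# ASSUMPTION: D is uniform on {0, 1, ..., 20}; this pmf is hypothetical.
p_D = {d: Fraction(1, 21) for d in range(21)}

# Route 1: add up the mass of every d whose profit is at least 20.
p_via_R = sum(p for d, p in p_D.items() if g(d) >= 20)

# Route 2: use the equivalent event {D >= 12.5}.
p_via_D = sum(p for d, p in p_D.items() if d >= 12.5)

print(p_via_R, p_via_D)
```

Both routes add up the same probability masses, which is the sense in which the two events are equivalent.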
Let \(X\) be a discrete random variable with probability mass function \(p_X\) and \(Y=g\left(X\right)\), where \(g:\mathbb{R}\to\mathbb{R}\) is a function.
Then, \(Y\) is also discrete and its probability mass function \(p_Y\) is defined by
\[p_Y(y)=\sum_{x\in g^{-1}\left\{y\right\}} p_X\left(x\right)\]
where \(g^{-1}\left\{y\right\}\) is the set of all values \(x\) that satisfy \(g\left(x\right)=y\).
Let \(X\) be the outcome of a fair six-sided die roll and \(Y=X^2-3X+2\).
Compute \(P(Y=0)\).
Note that \(Y=0\) when \(X\in\left\{1, 2\right\}\).
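The formula for \(p_Y\) can be sketched in a few lines: group the values of \(X\) by their image under \(g\) and add up the masses. The helper name `pmf_of_transform` is hypothetical and used only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_transform(p_X, g):
    """Return the pmf of Y = g(X) by summing p_X over each preimage g^{-1}{y}."""
    p_Y = defaultdict(Fraction)
    for x, p in p_X.items():
        p_Y[g(x)] += p
    return dict(p_Y)

# Fair six-sided die.
p_X = {x: Fraction(1, 6) for x in range(1, 7)}
p_Y = pmf_of_transform(p_X, lambda x: x**2 - 3 * x + 2)

print(p_Y[0])  # 1/3, the combined mass of x = 1 and x = 2
```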
Let \(X\sim \text{U}(0,1)\) and \(Y = g\left(X\right)\), where
\[g(x) = \begin{cases}7 & x\le \frac{3}{4} \\ 5 & x > \frac{3}{4}.\end{cases}\]
\(Y\) is discrete with only 2 possible values whereas \(X\) is continuous.
We can compute the full distribution of \(Y\) by computing the probability masses associated with the 2 values.
\[p_Y(y) = \begin{cases} \frac{1}{4} & y = 5\\ \frac{3}{4} & y = 7 \\ 0 & \text{otherwise}\end{cases}\]
(Example 4.38 from Devore & Berk)
Let \(X\sim \text{Exp}\left(1/2\right)\) and \(Y=g\left(X\right)=60X\). Determine the distribution of \(Y\).
Both \(Y\) and \(X\) are continuous.
\(f_Y(g(x)) \neq f_X(x)\)
Equivalent events share the same probability not density.
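One way to obtain the cumulative distribution function is through the equivalent events \(\{Y\le y\}=\{X\le y/60\}\). A short derivation for \(y>0\), assuming the rate parametrization \(F_X(x)=1-e^{-x/2}\) for \(x>0\):

\[\begin{align*} F_Y(y) &= P\left(60X\le y\right) \\ &= P\left(X\le \frac{y}{60}\right) \\ &= F_X\left(\frac{y}{60}\right) \\ &= 1-e^{-y/120} \end{align*}\]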
\[F_Y(y)=\begin{cases} 1-e^{-y/120} & y > 0 \\ 0 & y \le 0 \end{cases}\]
When \(F_Y\) is differentiable, we can differentiate \(F_Y\) to get \(f_Y\).
\(f_Y(y)=\begin{cases} \frac{1}{120}e^{-y/120} & y > 0 \\ 0 & y \le 0 \end{cases}\)
\(Y\sim \text{Exp}(1/120)\)
\(Y=60X\) rescales the unit of measurement by a factor of \(60\); thus, the rate is reduced by a factor of \(60\).
It is NOT always possible to get a closed form of the transformed \(F_Y\).
Under certain conditions, we may derive \(f_Y\) directly from \(f_X\) based on the Fundamental Theorem of Calculus and the chain rule.
\[\begin{align*} f_Y(y) =& \frac{d}{dy}F_Y(y) \\ =& \left.\frac{d}{dy}F_X\left(x\right)\right|_{x=g^{-1}\left(y\right)} \\ =& \left.\frac{dx}{dy}\frac{d}{dx}F_X\left(x\right)\right|_{x=g^{-1}\left(y\right)} \\ =& \left.\frac{dx}{dy}f_X(x)\right|_{x=g^{-1}\left(y\right)} \end{align*}\]
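The derivation above assumes an increasing \(g\). For a decreasing \(g\), \(\frac{dx}{dy}\) is negative, so the absolute value \(\left|\frac{dx}{dy}\right|\) is needed.
Where \(f_X(x)=0\), \(f_Y(y)=0\) and the properties of \(g\) don’t matter.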
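For a strictly decreasing \(g\), the event \(\{g(X)\le y\}\) is equivalent to \(\{X\ge g^{-1}(y)\}\), which flips the sign of the derivative. A sketch of the corresponding steps, assuming \(g\) is differentiable with inverse \(g^{-1}\):

\[\begin{align*} F_Y(y) &= P\left(X\ge g^{-1}(y)\right) = 1-F_X\left(g^{-1}(y)\right) \\ f_Y(y) &= \left.-\frac{dx}{dy}f_X(x)\right|_{x=g^{-1}\left(y\right)} = \left.\left|\frac{dx}{dy}\right|f_X(x)\right|_{x=g^{-1}\left(y\right)} \end{align*}\]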
Let \(X\) be a continuous random variable with probability density function \(f_X\) and \(Y=g\left(X\right)\), where \(g:\mathbb{R}\to\mathbb{R}\) is a function that is differentiable, and strictly increasing or strictly decreasing wherever \(f_X(x)>0\).
Then, \(Y\) is also continuous, and its density function \(f_Y\) is given by
\[f_Y(y)=\left|\frac{d}{dy}h\left(y\right)\right|\cdot f_X\left(h\left(y\right)\right),\]
where \(h=g^{-1}\), so that \(X=h(Y)\).
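The formula translates almost directly into code. The sketch below is only an illustration: the function name `transformed_density` and its arguments are hypothetical, and the caller is assumed to supply the inverse \(h\) and its derivative by hand. It is applied to the earlier example \(X\sim\text{Exp}(1/2)\), \(Y=60X\).

```python
import math

def transformed_density(f_X, h, dh_dy, y):
    """Density of Y = g(X) at y, where h is the inverse of g and dh_dy its derivative."""
    return abs(dh_dy(y)) * f_X(h(y))

# Example from earlier: X ~ Exp(1/2) (rate parametrization) and Y = 60X.
def f_X(x):
    return 0.5 * math.exp(-0.5 * x) if x > 0 else 0.0

f_Y_at_30 = transformed_density(f_X, h=lambda y: y / 60, dh_dy=lambda y: 1 / 60, y=30.0)

# Compare with the Exp(1/120) density derived by differentiating the CDF.
print(f_Y_at_30, math.exp(-30 / 120) / 120)
```

The two printed values should agree up to floating-point error, matching the result obtained by differentiating \(F_Y\).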
Suppose \(X\) is a continuous random variable with pdf \(f\) for some \(\alpha>0\).
\[f(x)=\begin{cases} \frac{\alpha}{x^{\alpha+1}} & x\ge 1 \\ 0 & \text{otherwise}\end{cases}\]
What is the distribution of \(Y=\log\left(X\right)\)?
\(\log(x)\) is strictly increasing for \(x>0\).
\(Y\ge 0\) when \(X\ge 1\).
\[f_Y(y)=\begin{cases}\alpha e^{-\alpha y} & y\ge 0\\ 0 & \text{otherwise}\end{cases}\]
\[\implies Y \sim \text{Exp}\left(\alpha\right)\]
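A quick simulation can serve as a sanity check on this result. The sketch below samples \(X\) by inverse-CDF sampling, using \(F_X(x)=1-x^{-\alpha}\) for \(x\ge 1\), and compares the sample mean of \(\log(X)\) with \(1/\alpha\), the mean of \(\text{Exp}(\alpha)\). The value \(\alpha=2\), the sample size, and the function name `sample_log_pareto` are arbitrary choices for illustration.

```python
import math
import random

def sample_log_pareto(alpha, n, seed=0):
    """Draw n values of Y = log(X), where X has pdf alpha / x**(alpha + 1) on x >= 1."""
    rng = random.Random(seed)
    # Inverse-CDF sampling: F_X(x) = 1 - x**(-alpha), so X = (1 - U)**(-1 / alpha).
    return [math.log((1.0 - rng.random()) ** (-1.0 / alpha)) for _ in range(n)]

alpha = 2.0  # arbitrary choice for illustration
ys = sample_log_pareto(alpha, n=100_000)
sample_mean = sum(ys) / len(ys)

# If Y ~ Exp(alpha), the sample mean should be close to 1 / alpha.
print(sample_mean, 1 / alpha)
```

Agreement of the means is consistent with the derived distribution but does not prove it; the density formula above is what establishes the result.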
Jensen’s inequality is a useful tool when you want to compare the means of two related random variables without computing their distributions.
Recall …
\[E\left(rX+s\right)=r E\left(X\right) + s,\]
where \(r\) and \(s\) are constants.
When the transformation is NOT linear, we cannot directly calculate the expectation.
We may want to gauge the relative value of the transformed expectation, \(E\left[g\left(X\right)\right]\), compared to the original expectation \(E(X)\).
For a convex function, \(g\), you can gauge the value without computing the distributions or the exact expectation.
A function \(g\) is called convex if for every \(a<b\), the line segment from \((a, g(a))\) to \((b, g(b))\) is on or above the graph of \(g\) on the interval \((a, b)\).
In other words, for \(a<b\), and \(\lambda \in (0,1)\), \[\lambda g(a) + (1-\lambda)g(b) \ge g\left(\lambda a + \left(1-\lambda\right)b\right)\]
When the line segment is strictly above the graph of \(g\), \(g\) is strictly convex on the interval \((a, b)\).
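The inequality in the definition can be spot-checked numerically. The sketch below tests it on a finite grid of points and weights, so it can refute convexity but cannot prove it. The function names `looks_convex` and `profit` are hypothetical.

```python
def looks_convex(g, a, b, steps=50, tol=1e-9):
    """Numerically test the convexity inequality for g on a grid over [a, b].

    This is a spot check on finitely many points, not a proof.
    """
    grid = [a + (b - a) * i / steps for i in range(steps + 1)]
    lambdas = [j / 10 for j in range(1, 10)]
    for i, u in enumerate(grid):
        for v in grid[i + 1:]:
            for lam in lambdas:
                chord = lam * g(u) + (1 - lam) * g(v)
                if chord < g(lam * u + (1 - lam) * v) - tol:
                    return False
    return True

def profit(d):
    """Piecewise-linear profit function from the coffee shop example."""
    return 2 * d - 10 if d < 10 else 4 * d - 30

print(looks_convex(profit, 0, 20))                   # kinked at d = 10, still convex
print(looks_convex(lambda x: -(x - 5) ** 2, 0, 20))  # concave, so the check fails
```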
Let \(g\) be a convex function on interval \(I\), and let \(X\) be a random variable taking values from \(I\). Then Jensen’s inequality states that
\[g\left(E\left[X\right]\right) \le E\left[g\left(X\right)\right].\]
When \(g\) is strictly convex on interval \(I\) and \(X\) is a random variable taking values from \(I\), \(g\left(E\left[X\right]\right) < E\left[g\left(X\right)\right]\) unless \(\text{Var}\left(X\right)=0\).
Recall
\[R=g\left(D\right)=\begin{cases}2D - 10 & \text{when } 0\le D < 10 \\ 4D - 30 &\text{when } D\ge 10\end{cases}\]
\(g(x)\) is convex on \(x \ge 0\).
By Jensen’s inequality, we can deduce
\[E[R]\ge g(E[D]).\]
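To see the inequality with concrete numbers, the sketch below computes both sides exactly for a made-up distribution of \(D\). The uniform distribution on the integers 0 to 20 is an assumption for illustration only, not the distribution from the example.

```python
from fractions import Fraction

def g(d):
    """Daily profit R = g(D) from the coffee shop example."""
    return 2 * d - 10 if d < 10 else 4 * d - 30

# ASSUMPTION: D is uniform on {0, 1, ..., 20}; this pmf is hypothetical.
p_D = {d: Fraction(1, 21) for d in range(21)}

mean_D = sum(d * p for d, p in p_D.items())     # E[D]
mean_R = sum(g(d) * p for d, p in p_D.items())  # E[R] = E[g(D)]

print(mean_R, g(mean_D))  # E[R] should be at least g(E[D])
```

Under this assumed distribution, \(E[D]=10\) sits exactly at the kink of \(g\), and the gap between the two sides is strict.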
Thanks to convexity of \(g\), I save almost a cent per day…
Let \(X\) be a random variable with \(\text{Var}(X)>0\). Which of the following two quantities is larger?
\[E\left[e^{-X}\right]\quad\text{vs.}\quad e^{-E\left[X\right]}\]
To check convexity of a twice-differentiable function, check whether its second derivative is nonnegative; a strictly positive second derivative implies strict convexity.
Since \(g(x)=e^{-x}\) has second derivative \(g''(x)=e^{-x}>0\), \(g\) is strictly convex. By Jensen’s inequality, \(E\left[e^{-X}\right] > e^{-E\left[X\right]}\).
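A small numerical check of this comparison, under the assumption that \(X\) is a fair six-sided die roll. Any distribution with positive variance would do; the die is chosen only for convenience.

```python
import math

# ASSUMPTION: X is a fair six-sided die roll, chosen only as an example with Var(X) > 0.
values = range(1, 7)
p = 1 / 6

mean_of_transform = sum(math.exp(-x) * p for x in values)  # E[exp(-X)]
transform_of_mean = math.exp(-sum(x * p for x in values))  # exp(-E[X])

print(mean_of_transform > transform_of_mean)
```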
Histograms are used to visualize the distribution of univariate data.
Steps:
Divide the range of the data into (equal-length) intervals, or bins; the length of each interval is called the bin width.
Set each bin’s height to
\[\frac{\text{the number of data points that fall in the interval}}{\text{the total number of data points}\times\text{bin width}}\]
The heights reflect the relative number of data points that belong to each interval.
In a regular histogram, we often display the counts along the y-axis. When we display the relative proportion as computed above, we call the plot a density histogram.
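The height computation for a density histogram can be sketched as follows. The function name `density_histogram`, the choice of half-open bins \([\text{left}, \text{left}+\text{width})\), and the data are all assumptions made for illustration; every data point is assumed to be at least `start`.

```python
def density_histogram(data, bin_width, start):
    """Return (left_edges, heights) for equal-width bins beginning at `start`.

    height = (count in bin) / (total number of points * bin width)
    """
    n_bins = int((max(data) - start) // bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    left_edges = [start + i * bin_width for i in range(n_bins)]
    heights = [c / (len(data) * bin_width) for c in counts]
    return left_edges, heights

# Made-up data for illustration.
data = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6, 5.0, 5.7]
edges, heights = density_histogram(data, bin_width=2.0, start=0.0)

# The bar areas (height * bin width) add up to 1.
print(heights, sum(h * 2.0 for h in heights))
```

Because each height is divided by the bin width, the total area of the bars is 1, which is what makes the plot comparable to a density function.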
learnr and run R worksheet
Click here to install learnr on r.datatools.utoronto.ca
Follow this link to open the worksheet
If you see an error, try:
rlesson06 from Files pane
Other steps you may try:
.Rmd and .R files on the home directory of r.datatools.utoronto.ca
Tools > Global Options
install.packages("learnr") in RStudio after the steps above or click here
Chapter 8, Dekking et al.
For further reference on histograms, you can read Sections 15.1 and 15.2 of Dekking et al.
© 2023. Michael J. Moon. University of Toronto.
Sharing, posting, selling, or using this material outside of your personal use in this course is NOT permitted under any circumstances.