Consider a spinning wheel with equally sized slots. A ball spins around the wheel at a constant speed and stops. Suppose you win an award if the ball stops at your slot, denoted with \(x\). What happens to the probability of winning as more slots are added?
As you add more slots, the size of each slot decreases and so does the probability of winning. Eventually, it approaches 0.
Realizing the pattern, your friend joined the game not with a single slot but with a fixed area of the circle. The probability of your friend winning does not change as the number of slots increases.
Suppose the number of slots approached infinity. We cannot add the probabilities of individual slots, since the probability associated with each slot would be 0.
This is the situation we face with continuous random variables, whose sample spaces are uncountably infinite. Just as sums converge to integrals, we define probabilities of continuous random variables with integrals.
A random variable \(X\) is continuous if for some function \(f:\mathbb{R}\to\mathbb{R}\) and for any numbers \(a\) and \(b\) with \(a\le b\),
$$P(a\le X\le b)=\int_a^b f(x) dx.$$
The function \(f\) has to satisfy \(f(x)\ge 0\) for all \(x\), and \(\int_{-\infty}^\infty f(x) dx = 1\). We call \(f\) the probability density function of \(X\) and the value \(f(x)\) the probability density of \(X\) at \(x\).
Figure 3 displays an example of a probability density function, or pdf. \(P(a\le X\le b)\) is the area under the curve, highlighted in gray in the figure.
Note that \(f(x)\neq P(X=x)\). Consider \(P(x - \varepsilon \le X\le x + \varepsilon)\) for any \(x\) where \(\varepsilon > 0\). If we take \(\varepsilon\) to 0, we have
\begin{equation} \lim_{\varepsilon\to0}P(x - \varepsilon \le X\le x + \varepsilon) = \lim_{\varepsilon\to0}\int_{x - \varepsilon}^{x + \varepsilon} f(u) du = 0. \tag{1} \end{equation}
In other words, \(P(X=x)=0\) for every value \(x\) that a continuous random variable \(X\) can take.
While a pmf represents a probability, a pdf does not. A pdf maps \(\mathbb{R}\) to \(\mathbb{R}\) while a pmf maps \(\mathbb{R}\) to \([0,1]\). This means a pdf can take values that are arbitrarily large, which does not fit the definition of a probability. Instead, a probability density can be interpreted as a relative measure of likelihood around a given value compared to other values that the corresponding random variable can take.
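To see this concretely, take a uniform distribution on a short interval: its density exceeds 1 everywhere on its support, yet every probability computed from it stays within \([0,1]\). A minimal sketch in R (the interval \([0, 0.5]\) is an arbitrary choice for illustration):

```r
# The pdf of U(0, 0.5) is 1 / (0.5 - 0) = 2 on its support:
# a density value above 1, which no probability could be.
dunif(0.25, min = 0, max = 0.5)
#> [1] 2

# Probabilities are areas under the curve and never exceed 1.
integrate(dunif, lower = 0, upper = 0.5, min = 0, max = 0.5)$value
#> [1] 1
```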
Another result of Equation (1) is that
$$P(a\le X\le b)=P(a< X\le b)=P(a\le X< b)=P(a< X <b)$$
for any continuous random variable \(X\) and two values \(a\) and \(b\) such that \(a\le b\).
The definition of a cdf for a continuous random variable is the same as the definition for a discrete random variable.
The cumulative distribution function \(F\) of a random variable \(X\) is the function \(F:\mathbb{R}\to[0,1]\), defined by
$$F(a)=P(X\le a) \quad\text{for }-\infty<a<\infty.$$
For a continuous random variable with a pdf \(f\), this implies
$$F(a)=\int_{-\infty}^a f(x) dx.$$
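As a quick numerical check of this relationship, we can integrate a pdf directly. The sketch below assumes R and a made-up density \(f(x)=2x\) on \([0,1]\), whose exact cdf is \(F(a)=a^2\):

```r
# A simple pdf: f(x) = 2x on [0, 1] and 0 elsewhere; its total area is 1.
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)

# The cdf as an integral of f; the exact value at a = 0.5 is 0.5^2 = 0.25.
cdf <- function(a) integrate(f, lower = -Inf, upper = a)$value
cdf(0.5)
#> [1] 0.25
```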
A cdf uniquely identifies a distribution for both discrete and continuous random variables. Thus, we can alternatively define a continuous random variable based on \(F\).
A random variable is called continuous if its cumulative distribution function \(F\) is continuous everywhere.
A uniform distribution is an example of a continuous distribution and perhaps what we most associate with randomness in plain language.
A continuous random variable has a uniform distribution on interval \([\alpha,\beta]\) if its probability density function \(f\) is given by
$$f(x)=\begin{cases} \frac{1}{\beta-\alpha} & \text{when }\alpha \le x \le \beta \\ 0 & \text{otherwise.} \end{cases}$$
We denote this distribution by \(\text{U}(\alpha, \beta)\).
The distribution is used for assigning equal likelihoods across a fixed interval. It often describes a completely arbitrary experiment, or complete ignorance about the phenomenon.
Figure 4 displays the pdf and cdf of a uniform random variable. We can recognize that it is uniformly distributed from the constant density value across an interval. Furthermore, since a uniform random variable is fully specified by its range, we know that it follows \(\text{U}(0,1.5)\).
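In R, the uniform pdf, cdf, and sampler are `dunif()`, `punif()`, and `runif()`. A short sketch for the \(\text{U}(0, 1.5)\) distribution shown in Figure 4:

```r
# pdf of U(0, 1.5): constant 1/1.5 on [0, 1.5] and 0 outside.
dunif(c(-1, 0.5, 2), min = 0, max = 1.5)
#> [1] 0.0000000 0.6666667 0.0000000

# cdf: P(X <= 0.75) = 0.75 / 1.5 = 0.5.
punif(0.75, min = 0, max = 1.5)
#> [1] 0.5

# Three random draws from U(0, 1.5).
runif(3, min = 0, max = 1.5)
```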
A continuous random variable has an exponential distribution with parameter \(\lambda\), \(\lambda>0\), if its probability density function \(f\) is given by
$$f(x)=\lambda e^{-\lambda x}\quad \text{for }x\ge 0.$$
We denote the distribution by \(\text{Exp}(\lambda)\).
The exponential distribution is another common distribution used in statistics. The distribution often represents the time until the next event in a sequence of events, where the events occur under the same assumptions required for constructing a Poisson distribution. The parameter \(\lambda\) is called the rate parameter and represents the average number of event occurrences per unit interval.
We can easily derive that \(F(a)=1-e^{-\lambda a}\) for a random variable \(X\sim \text{Exp}(\lambda)\) by evaluating the integral:
$$F(a)=\int_0^a \lambda e^{-\lambda x} dx = \left[-e^{-\lambda x}\right]_0^a = 1-e^{-\lambda a}.$$
Suppose an exponential random variable \(X\) with the rate parameter \(\lambda\) represents the number of hours until a certain event. The probability that the event does not happen for the next 2 hours is
$$P(X>2)=1-P(X\le2)=1-F(2)=e^{-2\lambda}.$$
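This survival probability matches R's `pexp()` with `lower.tail = FALSE`; the rate below is an arbitrary choice for illustration:

```r
lambda <- 0.5  # arbitrary rate for illustration
pexp(2, rate = lambda, lower.tail = FALSE)  # P(X > 2)
#> [1] 0.3678794
exp(-2 * lambda)                            # e^(-2 lambda), the same value
#> [1] 0.3678794
```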
Now suppose 2 hours have passed without any event since you last computed the probability. What is the probability that the event does not happen for the next 2 hours?
We can compute the probability using the definition of a conditional probability.
$$P(X>(2+2) | X>2)=\frac{P(\{X>4\}\cap \{X>2\})}{P(X>2)}$$
Note that \(\{X>4\}\) is a subset of \(\{X>2\}\), so the intersection of the two events is \(\{X>4\}\).
$$\frac{P(\{X>4\}\cap \{X>2\})}{P(X>2)}=\frac{P(X>4)}{P(X>2)}=\frac{e^{-4\lambda}}{e^{-2\lambda}}=e^{-2\lambda}=P(X>2).$$
This is a result of the memoryless property of an exponential random variable. In general, we have
$$P(X>s+t|X>t)=P(X>s)$$
for any exponential random variable \(X\) and two positive values \(s\) and \(t\).
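The property is easy to verify numerically with `pexp()`; the rate and the values of \(s\) and \(t\) below are arbitrary choices for illustration:

```r
lambda <- 0.5  # arbitrary rate, chosen only for illustration
s <- 2
t <- 2

# P(X > s + t | X > t) = P(X > s + t) / P(X > t),
# since {X > s + t} is a subset of {X > t}.
conditional <- pexp(s + t, rate = lambda, lower.tail = FALSE) /
  pexp(t, rate = lambda, lower.tail = FALSE)

# The memoryless property says this equals the unconditional P(X > s).
unconditional <- pexp(s, rate = lambda, lower.tail = FALSE)
all.equal(conditional, unconditional)
#> [1] TRUE
```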
A continuous random variable has a gamma distribution with parameters \(\alpha\) and \(\beta\), \(\alpha>0\) and \(\beta>0\), if its probability density function \(f\) is given by
$$f(x)=\frac{1}{\Gamma(\alpha)}\beta^\alpha x^{\alpha-1}e^{-\beta x} \quad\text{for }x>0.$$
We denote this distribution by \(\text{Gamma}(\alpha, \beta)\).
\(\Gamma\) is called the gamma function and, for a positive integer \(n\), \(\Gamma(n)=(n-1)!\). With \(\alpha=1\), the distribution is equivalent to \(\text{Exp}(\beta)\).
The gamma distribution generalizes the exponential distribution with an additional parameter that allows more flexibility as demonstrated in Figure 6.
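The special case \(\alpha=1\) is easy to check in R, where `dgamma()` takes `shape` (\(\alpha\)) and `rate` (\(\beta\)) arguments:

```r
x <- c(0.5, 1, 2)

# With shape alpha = 1, Gamma(1, beta) reduces to Exp(beta):
# both lines return the same densities.
dgamma(x, shape = 1, rate = 2)
dexp(x, rate = 2)
```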
The normal distribution is central to probability theory and statistics. It is often used to model observational errors, among other phenomena.
A continuous random variable has a normal distribution with parameters \(\mu\) and \(\sigma^2\), \(\sigma^2>0\), if its probability density function \(f\) is given by
$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}.$$
We denote the distribution by \(N(\mu,\sigma^2)\).
The parameter \(\mu\) represents the center of the distribution and \(\sigma^2\) represents the spread of the distribution.
There is no closed-form expression for the integral \(F(a)=\int_{-\infty}^a f(x) dx\) for a normal random variable. To compute probabilities associated with a normal random variable, we instead rely on a transformation to the standard normal variable.
A normal distribution with \(\mu=0\) and \(\sigma^2=1\) is called the standard normal distribution. We often denote the standard normal variable by \(Z\), \(Z\sim N(0,1)\), its pdf by \(\phi\), and its cdf by \(\Phi\):
$$\phi(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}$$
$$\Phi(a) = \int_{-\infty}^a\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}dz$$
Figure 7 shows the pdfs and cdfs of \(X\sim N(1, 1.5^2)\) and \(Z\sim N(0,1)\). We can see that shifting the distribution of \(X\) by its \(\mu\) parameter value 1 would align its location with the distribution of \(Z\). Dividing by the \(\sigma\) parameter aligns the width of the distribution with that of \(Z\).
Therefore, we can use
$$P(X\le x)=P\left(\frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right)=\Phi\left(\frac{x-\mu}{\sigma}\right)$$
for any normal random variable \(X\sim N(\mu,\sigma^2)\). The probability can then be computed using a z-table or any statistical software such as R.
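For example, with \(X\sim N(1, 1.5^2)\) as in Figure 7, standardizing and computing directly with `pnorm()` agree:

```r
mu <- 1
sigma <- 1.5

# P(X <= 2.5) via standardization: Phi((2.5 - 1) / 1.5) = Phi(1).
pnorm((2.5 - mu) / sigma)
#> [1] 0.8413447

# The same probability computed directly from N(1, 1.5^2).
pnorm(2.5, mean = mu, sd = sigma)
#> [1] 0.8413447
```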
Let \(X\) be a random variable and \(p\) a number between 0 and 1. The \(p\)th quantile or \(100 \cdot p\)th percentile of the distribution of \(X\) is the smallest number \(q_p\) such that
$$F(q_p)=P(X\le q_p)=p.$$
The median of a distribution is its 50th percentile.
For continuous random variables, if \(F\) is strictly increasing then it is invertible and we have
$$q_p=F^{-1}(p)$$
where \(F^{-1}\) is the inverse of \(F\).
Note that a cdf is not always invertible. The cdf of a discrete random variable is never invertible. While \(F^{-1}\) does not exist for such random variables, the quantiles are still well defined: when \(F\) maps multiple values to a single \(p\), \(q_p\) takes the smallest of them.
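In R, quantile functions are the `q`-prefixed counterparts of the `p`-prefixed cdfs. A brief sketch of both cases (the binomial example is an arbitrary choice of a discrete distribution):

```r
# Continuous case: the quantile function inverts the cdf.
qnorm(0.5)           # median of N(0, 1): 0
qexp(0.5, rate = 2)  # median of Exp(2): log(2)/2, about 0.347

# Discrete case: the cdf is a step function with no inverse, so R returns
# the smallest q such that F(q) >= p.
qbinom(0.5, size = 3, prob = 0.5)
#> [1] 1
```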