STA237: Probability, Statistics, and Data Analysis I
Michael Jongho Moon
PhD Student, DoSS, University of Toronto
June 15, 2022
From the Government of Canada website at https://www.canada.ca/en/health-canada/services/canadian-student-tobacco-alcohol-drugs-survey/2018-2019-summary.html
A total sample of 65,850 students in grades 7 to 12 (secondary I through V in Quebec) completed the survey … The weighted results represent over 2 million Canadian students …
In 2018-19, 3% of students in grades 7 to 12 were current cigarette smokers …
For simplicity, assume each student was equally likely to be selected.
Note that, along with equal selection probabilities, we also assume that each student's selection is an independent event.
Suppose we estimate the total prevalence, or proportion, of student smoking in Canada, \(\theta\), using the proportion among the selected sample:
\[\frac{\text{# students who currently smoke in the sample}}{\text{total sample size}}\]
The population of interest is all Canadian students between grade 7 and 12 at the time of the survey.
The sample is those students who participated in the survey.
The parameter of interest is the prevalence of smoking among the population.
The estimator is the function that computes the proportion of smokers in the sample.
A population is the entire group of interest; it can consist of people, things, events, etc.
A sample is a subgroup of a population used for estimation. In particular, a (simple) random sample consists of observations that are independent and identically distributed.
A parameter is a quantity of interest of a population.
An estimator is a function of a sample that provides an estimate of a parameter.
For example, from a population with true prevalence \(\theta=0.0608\), one random sample of 100 students gives the estimate \(T_{100}=0.06\).
Note that the estimator is a random variable since the sampling process is random.
The distribution of the random variable is an example of a sampling distribution.
Let \(T=h\left(X_1,X_2,\ldots,X_n\right)\) be an estimator based on a random sample \(X_1\), \(X_2\), …, \(X_n\). The probability distribution of \(T\) is called the sampling distribution of \(T\).
Indicator function
\[X_i=\begin{cases}1 & \text{when }i\text{th student is a smoker}\\0 & \text{otherwise.}\end{cases}\]
Estimator
\[T_n=\overline{X}_n=\frac{\sum_{i=1}^nX_i}{n}=\frac{\text{# smokers in the sample}}{\text{total sample size}}\]
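As an illustration (a minimal Python sketch, not part of the original slides), we can approximate the sampling distribution of \(T_n\) by repeatedly drawing samples of size 100, assuming the true prevalence is \(\theta=0.0608\) as in the example above:

```python
import numpy as np

rng = np.random.default_rng(237)
theta = 0.0608   # assumed true prevalence, taken from the illustration above
n = 100          # sample size
reps = 10_000    # number of simulated samples

# Each row holds one sample of n indicators X_i ~ Ber(theta);
# each row mean is one realization of the estimator T_n.
samples = rng.binomial(1, theta, size=(reps, n))
t_n = samples.mean(axis=1)

# The simulated values of T_n trace out its sampling distribution.
print(f"mean of T_n:     {t_n.mean():.4f}  (theta = {theta})")
print(f"variance of T_n: {t_n.var():.6f} (theta*(1-theta)/n = {theta * (1 - theta) / n:.6f})")
```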
Recall …
\[E\left(T_n\right)=E\left(\overline{X}_n\right)=E\left(X_1\right)\]
where \(\overline{X}_n=\left.\sum_{i=1}^nX_i\right/n\).
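As a reminder of why this holds, expand the mean using linearity of expectation and the identical distribution of the \(X_i\):
\[E\left(\overline{X}_n\right)=\frac{1}{n}\sum_{i=1}^nE\left(X_i\right)=\frac{1}{n}\cdot n\,E\left(X_1\right)=E\left(X_1\right).\]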
\(E\left(T_n\right)-E\left(X_1\right)\) is an example of a bias: the difference between the expected value of the estimator and the parameter of interest.
When the bias is 0, we say the estimator is an unbiased estimator.
Recall …
\[\text{Var}\left(T_n\right)=\text{Var}\left(\overline{X}_n\right)=\frac{\text{Var}\left(X_1\right)}{n}\]
where \(\overline{X}_n=\left.\sum_{i=1}^nX_i\right/n\).
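Similarly, since the \(X_i\) are independent (so the variance of the sum is the sum of the variances) and identically distributed:
\[\text{Var}\left(\overline{X}_n\right)=\frac{1}{n^2}\sum_{i=1}^n\text{Var}\left(X_i\right)=\frac{1}{n^2}\cdot n\,\text{Var}\left(X_1\right)=\frac{\text{Var}\left(X_1\right)}{n}.\]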
Let \(Y_1\), \(Y_2\), \(Y_3\), … be an infinite sequence of random variables, and let \(W\) be another random variable. We say the sequence \(\left\{Y_n\right\}\) converges in probability to \(W\) if
\[\lim_{n\to\infty}P\left(\left|Y_n-W\right|\ge\varepsilon\right)=0\]
for all \(\varepsilon >0\), and we write
\[Y_n\overset{p}{\to}W.\]
Suppose \(Z_n\sim\text{Exp}\left(n\right)\) and \(y=0\). Let \(\varepsilon\) be any positive value (\(\varepsilon>0\)). Since \(Z_n\ge0\),
\[P\left(\left|Z_n-y\right|\ge\varepsilon\right)=P\left(Z_n\ge\varepsilon\right)=e^{-n\varepsilon}\to0\quad\text{as }n\to\infty\]
\[\implies Z_n\overset{p}{\to}y\]
Any random variable \(Y\) with a finite mean and variance and any \(a>0\) satisfy
\[P\left(\left|Y-E\left(Y\right)\right|\ge a\right)\le \frac{\text{Var}\left(Y\right)}{a^2}.\]
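One standard way to see why the bound holds (assuming Markov's inequality is available) is to apply Markov's inequality to the nonnegative random variable \(\left(Y-E\left(Y\right)\right)^2\):
\[P\left(\left|Y-E\left(Y\right)\right|\ge a\right)=P\left(\left(Y-E\left(Y\right)\right)^2\ge a^2\right)\le\frac{E\left(\left(Y-E\left(Y\right)\right)^2\right)}{a^2}=\frac{\text{Var}\left(Y\right)}{a^2}.\]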
Calculate \(P\left(\left|Y-\mu\right|<k\sigma\right)\) for \(k=1,2,3\) when \(Y\sim\text{Exp}(1)\), \(\mu=E\left(Y\right)\), and \(\sigma^2=\text{Var}\left(Y\right)\).
Compare the computed values with the bounds from Chebyshev's inequality.
\[P\left(\left|Y-\mu\right|<k\sigma\right)=1-P\left(\left|Y-\mu\right|\ge k\sigma\right)\ge1-\frac{\sigma^2}{k^2\sigma^2}=1-\frac{1}{k^2}\]
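A quick numerical check of this exercise (a Python sketch; for \(Y\sim\text{Exp}(1)\), \(\mu=\sigma=1\) and \(P\left(a<Y<b\right)=e^{-a}-e^{-b}\) for \(0\le a<b\)):

```python
import math

# For Y ~ Exp(1), mu = sigma = 1, and P(a < Y < b) = exp(-a) - exp(-b).
mu = sigma = 1.0
for k in (1, 2, 3):
    a = max(mu - k * sigma, 0.0)   # Y is nonnegative
    b = mu + k * sigma
    exact = math.exp(-a) - math.exp(-b)
    bound = 1 - 1 / k**2           # Chebyshev lower bound on P(|Y - mu| < k*sigma)
    print(f"k={k}: exact = {exact:.4f} >= Chebyshev bound = {bound:.4f}")
```

The exact probabilities (0.8647, 0.9502, 0.9817) all sit above the Chebyshev bounds (0, 0.75, 0.8889), as they must.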
Apply Chebyshev's inequality to \(\overline{X}_n=\left.\sum_{i=1}^nX_i\right/n\), where \(X_1\), \(X_2\), …, \(X_n\) form a random sample from a population. Let \(\mu\) and \(\sigma^2\) be the population mean and variance.
For any \(\varepsilon>0\),
\[P\left(\left|\overline{X}_n-\mu\right|>\varepsilon\right)\le \frac{\sigma^2}{n\varepsilon^2}\]
What happens as \(n\to\infty\)?
Suppose \(X_1\), \(X_2\), …, \(X_n\) are independent random variables with expectation \(\mu\) and variance \(\sigma^2\). Then for any \(\varepsilon > 0\),
\[\lim_{n\to\infty}P\left(\left|\overline{X}_n-\mu\right|>\varepsilon\right)=0,\]
where \(\overline{X}_n=\left.\sum_{i=1}^n X_i\right/n\).
FYI, there is also the strong law of large numbers, which states \[P\left(\lim_{n\to\infty}\overline{X}_n=\mu\right)=1.\]
We will focus on the weak law (WLLN) in this course.
Roughly speaking, the law states that a sample mean converges to the population mean as we increase the sample size.
For example, simulating
\[X_i\sim N(0,1)\]
for \(i=1,2,3,\dots,1000\) and computing \(\overline{X}_n\) for \(n=1,2,3,\dots,1000\)…
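In code, the simulation might look like the following Python sketch (the slides presumably displayed the resulting running means as a plot):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)              # X_i ~ N(0, 1), i = 1, ..., 1000

# Running mean: the n-th entry is the mean of the first n observations.
running_mean = np.cumsum(x) / np.arange(1, 1001)

# The running mean settles near the population mean, 0.
for n in (1, 10, 100, 1000):
    print(f"n = {n:4d}: sample mean = {running_mean[n - 1]: .4f}")
```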
The law does not hold when the population mean doesn’t exist or is not finite.
The Cauchy distribution is an example of a distribution without an expectation.
Simulating \(X_i\) from a Cauchy distribution for \(i=1,2,3,\dots,1000\) and computing \(\overline{X}_n\) for \(n=1,2,3,\dots,1000\)…
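Repeating the same sketch with Cauchy draws shows the running mean failing to settle:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(1000)              # Cauchy draws: no finite expectation

running_mean = np.cumsum(x) / np.arange(1, 1001)

# Unlike the normal case, the running mean keeps jumping around
# no matter how large n grows.
for n in (1, 10, 100, 1000):
    print(f"n = {n:4d}: sample mean = {running_mean[n - 1]: .4f}")
```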
We have already seen the law in action!
Suppose we are interested in
\[\theta=P\left(X\in \mathcal{K}\right),\]
where \(X\) is some random variable and \(\mathcal{K}\) is a subinterval of \(\mathbb{R}\).
Assume that while you don't know the distribution of \(X\), you can obtain \(n\) random samples of \(X\): \(X_1\), \(X_2\), …, \(X_n\).
Estimating \(\theta\) with the proportion of these samples that fall in \(\mathcal{K}\) is equivalent to using \(T_n\) as the estimator for the parameter \(\theta\), where …
\[T_n=\frac{\sum_{i=1}^n \mathcal{I}_{X_i\in\mathcal{K}}}{n}\] and \(\mathcal{I}_{X_i\in\mathcal{K}}=1\) when \(X_i\in\mathcal{K}\) and \(0\) otherwise.
With a large \(n\) (e.g., \(1\ 000\), \(100\ 000\), …), we are using the property that
\[T_n\overset{p}{\to}\theta\]
since \(\mathcal{I}_{X_i\in\mathcal{K}}\sim \text{Ber}\left(\theta\right)\).
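As a hypothetical example (the distribution and interval here are my choices, not the slides'), estimating \(\theta=P\left(X\in[1,2]\right)\) for \(X\sim N(0,1)\) by simulation:

```python
import numpy as np

rng = np.random.default_rng(237)

def t_n(n):
    """Proportion of n draws of X ~ N(0, 1) falling in K = [1, 2]."""
    x = rng.standard_normal(n)
    return np.mean((x >= 1) & (x <= 2))

for n in (1_000, 100_000, 10_000_000):
    print(f"n = {n:>10,}: T_n = {t_n(n):.5f}")
# True value: Phi(2) - Phi(1) ~ 0.13591; T_n approaches it as n grows.
```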
© 2022. Michael J. Moon. University of Toronto.
Sharing, posting, selling, or using this material outside of your personal use in this course is NOT permitted under any circumstances.