Lecture 11: Central Limit Theorem

STA237: Probability, Statistics, and Data Analysis I

Michael Jongho Moon

PhD Student, DoSS, University of Toronto

Monday, June 19, 2023

Recall: Law of large numbers

Suppose \(X_1\), \(X_2\), …, \(X_n\) are independent random variables with expectation \(\mu\) and variance \(\sigma^2\). Then for any \(\varepsilon > 0\),

\[\lim_{n\to\infty}P\left(\left|\overline{X}_n-\mu\right|>\varepsilon\right)=0,\]

where \(\overline{X}_n=\left.\sum_{i=1}^n X_i\right/n\).

That is, \(\overline{X}_n\) converges in probability to \(\mu\).
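The concentration of \(\overline{X}_n\) around \(\mu\) can be illustrated with a quick simulation. A minimal sketch in Python (the lecture's inline code uses R; this stand-in and the choice of Bernoulli(0.3) draws are purely illustrative):

```python
# Illustrative LLN simulation (not from the slides): sample means of
# Bernoulli(0.3) draws concentrate around mu = 0.3 as n grows.
import random

random.seed(237)

def sample_mean(n, p=0.3):
    """Mean of n simulated Bernoulli(p) draws."""
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

for n in [10, 1_000, 100_000]:
    print(n, sample_mean(n))
```

The typical deviation from \(\mu\) shrinks roughly like \(\sigma/\sqrt{n}\), which is why the printed means settle near 0.3.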

Distributions of sample means

  • Convergence in probability to the mean suggests that the sampling distribution of \(\overline{X}_n\) becomes narrower as \(n\) increases.
  • The convergence occurs regardless of the originating distribution.

Distributions of sample means

  • We also observe that their distributions become roughly symmetric bell shapes with larger sample sizes.
  • This behaviour also seems to occur regardless of the underlying distribution shape.

Convergence in distribution

The distribution of \(Y_n\) becomes closer and closer to that of \(W\).

Let \(Y_1\), \(Y_2\), \(Y_3\), … be an infinite sequence of random variables, and let \(W\) be another random variable. Then, we say the sequence \(\left\{Y_n\right\}\) converges in distribution to \(W\) if for all \(w\in\mathbb{R}\) such that \(P\left(W = w\right)=0\), we have

\[\lim_{n\to\infty}P\left(Y_n\le w\right)=P\left(W\le w\right)\]

and we write

\[Y_n\overset{d}{\to}W.\]
Example: Binomial for infinite trials

Suppose \(X_n\sim\text{Binom}\left(n,\theta_n\right)\) describes the number of successes across \(n\) independent sub-intervals of equal length, where \(\theta_n=\frac{\lambda}{n}\) for some \(\lambda>0\) representing the rate of success.

What happens when you make the sub-intervals infinitesimally small?

We have seen that \(\lim_{n\to\infty}p_{X_n}(x)=p_X(x)\) where \(X\sim\text{Pois}(\lambda)\). Recall how we derived the pmf of a Poisson random variable in Lecture 3.

  • \(\lim_{n\to\infty}p_{X_n}(x)=\lim_{n\to\infty}\binom{n}{x}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x}\)
  • \(\phantom{\lim_{n\to\infty}p_{X_n}(x)}=\frac{\lambda^x}{x!}\lim_{n\to\infty}\frac{n!}{\left(n-x\right)!n^x}\left(1-\frac{\lambda}{n}\right)^{n-x}\)
  • \(\phantom{\lim_{n\to\infty}p_{X_n}(x)}=\frac{\lambda^x}{x!}\cdot1\cdot e^{-\lambda}\cdot1\), since \(\frac{n!}{\left(n-x\right)!n^x}\to1\), \(\left(1-\frac{\lambda}{n}\right)^{n}\to e^{-\lambda}\), and \(\left(1-\frac{\lambda}{n}\right)^{-x}\to1\)
  • \(\phantom{\lim_{n\to\infty}p_{X_n}(x)}=\frac{\lambda^xe^{-\lambda}}{x!}\)

\[X_n\overset{d}{\to}X, \quad X\sim\text{Pois}\left(\lambda\right)\]
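This limit can also be checked numerically: the \(\text{Binom}(n,\lambda/n)\) pmf approaches the \(\text{Pois}(\lambda)\) pmf as \(n\) grows. A sketch with the standard library (\(\lambda = 2\) and \(x = 3\) are arbitrary illustrative choices, not from the slides):

```python
# Numeric check of the Binomial -> Poisson limit: with theta_n = lambda/n,
# the Binom(n, lambda/n) pmf at x approaches the Pois(lambda) pmf at x.
from math import comb, exp, factorial

lam, x = 2.0, 3

def binom_pmf(n):
    theta = lam / n
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

pois_pmf = lam**x * exp(-lam) / factorial(x)

for n in [10, 100, 10_000]:
    print(n, abs(binom_pmf(n) - pois_pmf))  # gap shrinks as n grows
```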

Central limit theorem

  • We have observed that sample means \(\overline{X}_n\) converge to distributions with similar shapes regardless of the originating distribution.
  • The central limit theorem explains to which distribution they converge.

The central limit theorem

Let \(X_1\), \(X_2\), \(X_3\), … be independent and identically distributed random variables with \(E\left(X_1\right)=\mu<\infty\) and \(0< \text{Var}\left(X_1\right)=\sigma^2<\infty\). For \(n\ge1\), let

\[Z_n=\frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}},\]

where \(\overline{X}_n=\left.\sum_{i=1}^nX_i\right/n\). Then, for any number \(a\in\mathbb{R}\),

\[\lim_{n\to\infty}P\left(Z_n\le a\right)=\Phi\left(a\right),\]

where \(\Phi\) is the cumulative distribution function of the standard normal distribution.
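A Monte Carlo sketch of the theorem, assuming Exponential(1) draws (so \(\mu=\sigma=1\)); the choice of distribution, \(a=1\), \(n=50\), and the repetition count are all illustrative, not from the slides:

```python
# Monte Carlo check of the CLT: standardized means of Exponential(1)
# draws should satisfy P(Z_n <= a) ~ Phi(a) for large n.
import random
from math import erf, sqrt

random.seed(237)

def standardized_mean(n, mu=1.0, sigma=1.0):
    """Z_n = (Xbar_n - mu) / (sigma / sqrt(n)) for Exponential(1) draws."""
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / sqrt(n))

a = 1.0
reps = 2_000
estimate = sum(standardized_mean(50) <= a for _ in range(reps)) / reps
phi_a = 0.5 * (1 + erf(a / sqrt(2)))  # standard normal cdf at a
print(estimate, phi_a)  # the two values should be close
```

Even though Exponential(1) is strongly skewed, the empirical proportion lands near \(\Phi(1)\approx0.841\) already at \(n=50\).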

The central limit theorem

In practice, \(\overline{X}_n\) approximately follows the distribution of \(\left(Z\frac{\sigma}{\sqrt{n}}+\mu\right)\) or \(N\left(\mu, \frac{\sigma^2}{n}\right)\) for large \(n\).

In other words,

\[\overline{X}_n\approx\frac{\sigma}{\sqrt{n}}Z+\mu,\]

where \(Z\sim N\left(0,1\right)\).

Example: CSTAD

Recall the survey on Canadian student smoking prevalence.

  • As you increase the sample size \(n\), the sampling distribution of \(T_n\) not only becomes narrower by the LLN . . .


  • but also closer to a symmetrical and bell-shaped distribution by the CLT.

Example: Normal approximation of the binomial distribution

Suppose \(Y\sim \text{Binom}\left(50, 0.3\right)\) and we are interested in \(P(Y\le 20)\).

  • We may write \(Y=\sum_{i=1}^{50} W_i\) where \(W_i\sim\text{Ber}(0.3)\) independently.
  • We may use the CLT to approximate \[\phantom{=}P\left(Y\le 20\right)\] \[=P\left(\frac{Y}{50} \le \frac{20}{50}\right)\] \[=P\left(\overline{W}_{50}\le 0.4\right)\]
  • Recall \(E(W_1)=0.3\) and \(\text{Var}(W_1)=0.3\cdot 0.7=0.21\)

Using exact \(F_Y(y)\)

  • \(F_Y(20) = \sum_{y=0}^{20} p_Y(y)\)
  • \(\phantom{F_Y(20)} = \sum_{y=0}^{20} \binom{50}{y}0.3^{y}0.7^{50 - y}\)

pbinom(20, 50, 0.3) in R.

  • \(\phantom{F_Y(20)} \approx 0.952\)

Approximating via \(Z\sim N(0,1)\)

  • \(F_Y(20) \approx P\left(Z\cdot \sqrt{0.21 / 50} + 0.3 \le 0.4\right)\)

pnorm(.4, .3, sqrt(.21 / 50)) in R

  • \(\phantom{F_Y(y)} \approx 0.939\)
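The two R calls above can be reproduced with the Python standard library; `pbinom` and `pnorm` below are stand-ins implemented directly from the definitions, not the R functions themselves:

```python
# Exact binomial cdf vs. its CLT-based normal approximation,
# mirroring pbinom(20, 50, 0.3) and pnorm(.4, .3, sqrt(.21 / 50)).
from math import comb, erf, sqrt

def pbinom(q, size, prob):
    """Binomial cdf P(Y <= q), like R's pbinom."""
    return sum(comb(size, y) * prob**y * (1 - prob)**(size - y)
               for y in range(q + 1))

def pnorm(q, mean=0.0, sd=1.0):
    """Normal cdf P(X <= q), like R's pnorm."""
    return 0.5 * (1 + erf((q - mean) / (sd * sqrt(2))))

exact = pbinom(20, 50, 0.3)                 # about 0.952, as above
approx = pnorm(0.4, 0.3, sqrt(0.21 / 50))   # about 0.939, as above
print(round(exact, 3), round(approx, 3))
```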

\[Z_{50} = 50 \cdot \left(Z\sqrt{\frac{0.21}{50}}+0.3\right)\]

\[\overline{W}_{5}\quad\text{vs}\quad Z_5\]

  • pbinom(2, 5, .3) \(\approx 0.837\)
  • pnorm(2, 1.5, sqrt(.21 * 5)) \(\approx 0.687\)

\[\overline{W}_{50}\quad\text{vs}\quad Z_{50}\]

  • pbinom(20, 50, .3) \(\approx 0.952\)
  • pnorm(20, 15, sqrt(.21 * 50)) \(\approx 0.939\)

\[\overline{W}_{500}\quad\text{vs}\quad Z_{500}\]

  • pbinom(200, 500, .3) \(\approx 0.9999992\)
  • pnorm(200, 150, sqrt(.21 * 500)) \(\approx 0.9999995\)


  • With a smaller number of trials \(n\), the “gaps” between the discrete binomial cdf and the continuous normal cdf are larger, and the approximation is less precise.
  • With a larger number of trials \(n\), the approximation becomes more precise.
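The shrinking gap can be quantified with the same three comparisons shown above (\(n = 5, 50, 500\), evaluated at \(q = 0.4n\)); a sketch with stand-in `pbinom`/`pnorm` helpers implemented from the definitions:

```python
# Approximation error |exact - normal approx| for the three slide examples;
# the error shrinks as the number of trials n grows.
from math import comb, erf, sqrt

def pbinom(q, size, prob):
    """Binomial cdf P(Y <= q), like R's pbinom."""
    return sum(comb(size, y) * prob**y * (1 - prob)**(size - y)
               for y in range(q + 1))

def pnorm(q, mean, sd):
    """Normal cdf P(X <= q), like R's pnorm."""
    return 0.5 * (1 + erf((q - mean) / (sd * sqrt(2))))

errors = []
for n, q in [(5, 2), (50, 20), (500, 200)]:
    exact = pbinom(q, n, 0.3)
    approx = pnorm(q, 0.3 * n, sqrt(0.21 * n))
    errors.append(abs(exact - approx))
    print(n, round(errors[-1], 4))
```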


Given an independent and identically distributed sample from a population with finite mean \(\mu\) and positive finite variance \(\sigma^2\),

  • the standardized sample mean converges in distribution to the standard normal; equivalently, \(\overline{X}_n\) approximately follows \(N\left(\mu,\sigma^2/n\right)\) for large \(n\).

We often use the central limit theorem to approximate distributions of finite samples when the sample size is sufficiently large.


  • Weekly Activity 5 Questions
  • Selected questions from past exams
  • Questions

Blackjack competition

  • If your group’s player is selected, please explain your group’s strategy.
  • Make your guess on Quercus