The central limit theorem is a cornerstone of statistics. It states that the suitably normalized sum of many independent random variables is approximately normally distributed, which helps explain the ubiquity of the normal distribution. An example is the normal approximation of the binomial distribution.
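Here is a minimal Python sketch of that example. It compares the exact binomial CDF with a normal distribution of the same mean $np$ and variance $np(1-p)$. The parameters $n = 100$, $p = 0.3$ are chosen only for illustration. The `+ 0.5` is the usual continuity correction, which is an assumption of this sketch rather than something stated in the text.

```python
import math
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation N(np, np(1-p)) with continuity correction k + 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


if __name__ == "__main__":
    n, p = 100, 0.3  # example parameters, chosen only for illustration
    for k in (20, 25, 30, 35, 40):
        exact = binomial_cdf(k, n, p)
        approx = normal_approx_cdf(k, n, p)
        print(f"k = {k}: exact = {exact:.4f}, normal = {approx:.4f}")
```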

## Definition

*Consider a sequence $x_1, \dots, x_n$ of independent, identically distributed random variables with finite mean $\mu$ and finite variance $\sigma^2 > 0$. Their empirical mean is defined by*

$$

\bar{x} = \frac{1}{n} (x_1 + \dots + x_n) = \mathrm{empirical \, mean}

$$

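*If we normalize it to a random variable with expectation value zero and variance one,*

$$

z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}

$$

*then the probability distribution of $z$ converges to the standard normal distribution, $p(z) \to N(0,1)$ for $n \to \infty$ (“convergence in distribution”). Equivalently, the cumulative distribution function of $z$ satisfies $\mathrm{CDF}(z) \to \Phi(z) = \frac{1}{2}\left[1 + \mathrm{erf}\left(z/\sqrt{2}\right)\right]$ for all $z \in \mathbb{R}$.*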
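A rough Monte Carlo check of the definition might look like the sketch below. It assumes exponentially distributed $x_i$ with rate 1, so that $\mu = \sigma = 1$. This skewed, clearly non-normal choice is made only for illustration. The sketch compares the empirical CDF of $z$ with $\Phi$ on a few grid points. The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_z(n: int, rng: random.Random) -> float:
    """One draw of z = (x̄ - μ) / (σ/√n) for x_i ~ Exponential(rate 1), where μ = σ = 1."""
    mu, sigma = 1.0, 1.0
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))


def empirical_cdf(samples: list[float], t: float) -> float:
    """Fraction of samples that are <= t."""
    return sum(s <= t for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    phi = NormalDist()      # standard normal N(0, 1)
    grid = (-2, -1, 0, 1, 2)
    for n in (1, 5, 50):
        zs = [sample_z(n, rng) for _ in range(10_000)]
        gap = max(abs(empirical_cdf(zs, t) - phi.cdf(t)) for t in grid)
        print(f"n = {n:2d}: max |CDF(z) - Phi(z)| on grid = {gap:.3f}")
```

The gap should shrink as $n$ grows. For skewed $x_i$ it decreases roughly like $1/\sqrt{n}$.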

## Intuition

The marvel of the central limit theorem is that a normal distribution emerges from adding up many random variables that need not be normally distributed themselves.

However, they need to be independent (see the previous article on random variables) and ‘identically distributed’, i.e. drawn from the same probability distribution with the same finite mean $\mu$ and finite variance $\sigma^2$.

Let’s first understand why $z$ has mean zero and variance one (this holds for every $n$, not only in the limit):

First, since the expectation value of a random variable is linear,

$$

\left\langle \frac{x_1 + \dots + x_n}{n} \right\rangle

= \frac{\left\langle x_1 \right\rangle + \dots + \left\langle x_n \right\rangle}{n}

= \frac{n \left\langle x_1 \right\rangle}{n} = \left\langle x_1 \right\rangle = \mu

$$

the expectation value of the empirical mean equals the mean $\mu$ of the individual distributions, and thus the expectation value of $z$ equals zero. For statistical samples, one says that the empirical mean is an unbiased estimator of the population mean.
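Because the argument only uses linearity, it can be checked exactly for a concrete case. The sketch below assumes $n$ fair six-sided dice, which is an example not taken from the text and has $\mu = 7/2$. It enumerates all $6^n$ equally likely outcomes and averages the empirical mean using exact fractions.

```python
from fractions import Fraction
from itertools import product


def expected_empirical_mean(n: int) -> Fraction:
    """Exact ⟨x̄⟩ for n independent fair six-sided dice, by enumerating all 6**n outcomes."""
    faces = range(1, 7)  # hypothetical example: a fair die
    total = Fraction(0)
    for outcome in product(faces, repeat=n):
        total += Fraction(sum(outcome), n)  # x̄ for this outcome
    return total / 6**n  # every outcome has probability 1/6**n


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, expected_empirical_mean(n))  # equals μ = 7/2 for every n
```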

Second, for independent random variables $X$, $Y$ and constants $a$, $b$, the variances add up, much like the squared lengths of orthogonal vectors in the Pythagorean theorem:

$$

\text{Var}(aX + bY) = a^2 \text{Var}(X) + b^2 \text{Var}(Y)

$$

and thus

$$

\text{Var}\left(\frac{x_1 + \dots + x_n}{n}\right) = \frac{\text{Var}(x_1 + \dots + x_n)}{n^2} = \frac{n \sigma^2}{n^2} = \frac{\sigma^2}{n}

$$

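Therefore the empirical mean has standard deviation $\frac{\sigma}{\sqrt{n}}$. Subtracting $\mu$ does not change the variance, so dividing by $\frac{\sigma}{\sqrt{n}}$ ensures that $z$ has variance one.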
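The same kind of exact enumeration can confirm the variance result. It again assumes fair six-sided dice as a hypothetical example, with $\sigma^2 = 35/12$. The sketch checks that $\text{Var}(\bar{x}) = \sigma^2/n$, so the ratio $\text{Var}(\bar{x}) / (\sigma^2/n)$, which is the variance of $z$, equals one for every $n$.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # hypothetical example: a fair six-sided die
MU = Fraction(7, 2)  # mean of one die
SIGMA2 = sum((Fraction(f) - MU) ** 2 for f in FACES) / 6  # variance of one die: 35/12


def variance_of_empirical_mean(n: int) -> Fraction:
    """Exact Var(x̄) for n independent fair dice, averaging over all 6**n equally likely outcomes."""
    means = [Fraction(sum(outcome), n) for outcome in product(FACES, repeat=n)]
    # ⟨x̄⟩ = μ exactly, so Var(x̄) = ⟨(x̄ - μ)²⟩
    return sum((m - MU) ** 2 for m in means) / len(means)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        var_xbar = variance_of_empirical_mean(n)
        # Var(z) = Var(x̄) / (σ²/n), since subtracting μ does not change the variance
        print(f"n = {n}: Var(x̄) = {var_xbar}, σ²/n = {SIGMA2 / n}, Var(z) = {var_xbar / (SIGMA2 / n)}")
```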

The real question, then, is: why is the limit a normal distribution at all?

While formal proofs are rather technical, we briefly sketch two ideas that will be elaborated further in follow-on articles:

**Elementary Explanation**

Intuitively, the central limit theorem states that the distribution of the centered sum $\frac{x_1 + \dots + x_n - n\mu}{\sqrt{n}}$ converges to a normal distribution for $n \to \infty$, regardless of the distribution of the individual $x_i$ beyond their mean and variance.

It does not matter if we mix or replace some of the random variables, as long as they have the same first and second moments: one can show that the limiting distribution depends on the individual $x_i$ only through their mean and variance. In particular, we may replace all $x_i$ by normally distributed random variables with the same mean and variance. By the stability of the Gaussian, a sum of independent Gaussians is again a Gaussian, so the limit must be a normal distribution.

**The Maximum Entropy Principle**

Following the principle of maximum entropy, the distribution governing $z$ should be the one with the maximum possible entropy consistent with the constraints of a fixed mean and a fixed variance (mean zero and variance one for $z$). Among all distributions on $\mathbb{R}$ with a given mean and variance, this is the normal distribution.

This can be shown using the calculus of variations.
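The calculus-of-variations argument is not reproduced here. The claim can still be illustrated (not proven) by comparing the closed-form differential entropies, in nats, of three distributions with the same variance $\sigma^2$. Choosing the uniform and Laplace distributions as competitors is our own assumption. The entropy formulas are the standard ones.

```python
import math


def entropy_normal(sigma: float) -> float:
    """Normal with standard deviation sigma: h = ½ ln(2πe σ²)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)


def entropy_uniform(sigma: float) -> float:
    """Uniform on an interval of width w has variance w²/12 and entropy ln w."""
    width = sigma * math.sqrt(12)
    return math.log(width)


def entropy_laplace(sigma: float) -> float:
    """Laplace with scale b has variance 2b² and entropy 1 + ln(2b)."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)


if __name__ == "__main__":
    sigma = 1.0  # all three distributions share this variance
    for name, h in [("normal", entropy_normal(sigma)),
                    ("Laplace", entropy_laplace(sigma)),
                    ("uniform", entropy_uniform(sigma))]:
        print(f"{name:8s} h = {h:.4f} nats")
```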

## Relation to the Law of Large Numbers

The central limit theorem is closely related to the law of large numbers, which states that the empirical mean $\frac{x_1 + \dots + x_n}{n}$ of independent, identically distributed random variables converges to $\mu$ for $n \to \infty$.

The central limit theorem refines this statement: the fluctuations of the empirical mean around $\mu$ are of order $\frac{\sigma}{\sqrt{n}}$, and for large $n$ its distribution is approximately $N(\mu, \sigma^2/n)$, i.e. it depends only on $\mu$ and $\sigma^2$.
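A small simulation shows the two scalings side by side. This sketch assumes $x_i$ uniform on $[0, 1]$, with $\mu = 1/2$ and $\sigma = 1/\sqrt{12}$, which is an arbitrary choice for illustration. The spread of $\bar{x}$ shrinks like $1/\sqrt{n}$, as the law of large numbers predicts. The spread of $\sqrt{n}(\bar{x} - \mu)$ stays near $\sigma$, which is the scaling of the central limit theorem.

```python
import math
import random
import statistics

MU = 0.5                   # mean of Uniform(0, 1)
SIGMA = 1 / math.sqrt(12)  # standard deviation of Uniform(0, 1), about 0.2887


def spreads(n: int, trials: int, rng: random.Random) -> tuple[float, float]:
    """Return (std of x̄, std of √n·(x̄ - μ)) over many trials of n Uniform(0, 1) draws."""
    xbars = [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(trials)]
    scaled = [math.sqrt(n) * (m - MU) for m in xbars]
    return statistics.pstdev(xbars), statistics.pstdev(scaled)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    for n in (10, 100, 1000):
        sd_mean, sd_scaled = spreads(n, 1000, rng)
        print(f"n = {n:4d}: std(x̄) = {sd_mean:.4f}, std(√n(x̄ - μ)) = {sd_scaled:.4f}, σ = {SIGMA:.4f}")
```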

## Stable Distributions

The normal distribution is just one example of a stable distribution: the sum of two independent Gaussian random variables is again Gaussian.
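Python's standard library models this property directly. Adding two `statistics.NormalDist` objects returns the distribution of the sum of two independent normal variables. The sketch below uses arbitrary example parameters and checks that closed form against a Monte Carlo sample of $X + Y$. It compares the CDF at several points, not just the mean and variance.

```python
from statistics import NormalDist

X = NormalDist(mu=1.0, sigma=2.0)  # example parameters, chosen arbitrarily
Y = NormalDist(mu=-3.0, sigma=1.5)
Z = X + Y  # independent X, Y: mean 1 + (-3) = -2, sigma = √(2² + 1.5²) = 2.5


def max_cdf_gap(samples: list[float], dist: NormalDist, grid: list[float]) -> float:
    """Largest difference between the empirical CDF of `samples` and `dist.cdf` on `grid`."""
    n = len(samples)
    return max(abs(sum(s <= t for s in samples) / n - dist.cdf(t)) for t in grid)


if __name__ == "__main__":
    # Independent samples of X and Y, added pairwise to sample X + Y
    sums = [x + y for x, y in zip(X.samples(20_000, seed=1), Y.samples(20_000, seed=2))]
    grid = [Z.mean + k * Z.stdev for k in (-2, -1, 0, 1, 2)]
    print(Z)
    print(f"max CDF gap between sample and closed form: {max_cdf_gap(sums, Z, grid):.4f}")
```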

If the mean $\mu$ or the variance $\sigma^2$ (or both) are infinite or undefined, the central limit theorem breaks down. A suitably normalized sum may still converge to a limiting distribution that is not normal, and the possible limits form the family of stable distributions.
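The Cauchy distribution is the classic example. Its mean is undefined and its variance infinite. The empirical mean of $n$ standard Cauchy variables is again standard Cauchy, so it does not concentrate around any value. The minimal simulation sketch below uses hypothetical helper names. It measures the interquartile range of the empirical mean, which should stay near 2 (the interquartile range of the standard Cauchy distribution) for every $n$.

```python
import math
import random


def standard_cauchy(rng: random.Random) -> float:
    """Standard Cauchy draw by inverse-CDF sampling: tan(π(u - 1/2)) with u ~ Uniform(0, 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_empirical_mean(n: int, trials: int, rng: random.Random) -> float:
    """Interquartile range of x̄ = (x_1 + ... + x_n) / n, from sorted samples over many trials."""
    means = sorted(sum(standard_cauchy(rng) for _ in range(n)) / n for _ in range(trials))
    return means[(3 * trials) // 4] - means[trials // 4]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    for n in (1, 10, 100):
        # With finite variance the IQR would shrink like 1/√n; here it stays near 2
        print(f"n = {n:3d}: IQR of x̄ ≈ {iqr_of_empirical_mean(n, 4_000, rng):.2f}")
```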

The specifics of when a stable distribution exists and what form it has were elaborated in the 1930s by P. Lévy, A. Khintchine, and others (see for example the book by Paul and Baschnagel).

## Resources

Wolfgang Paul and Jörg Baschnagel. *Stochastic Processes*. 2nd edition, Springer, 2013.

Intuition about the central limit theorem (Q&A): math.stackexchange.com/questions/12983/intuition-about-the-central-limit-theorem

Intuitive visualization: www.cantorsparadise.com/the-central-limit-theorem-why-is-it-so-2ae93edf6e8