Idea: **sample mean converges to population mean**, as sample size goes to infinity.

Version 1: For a sequence of uncorrelated random variables with a common expectation, if the supremum of their variances does not grow too fast, then their average converges in probability to the expectation. Symbolically, if $\forall X_i, X_j \in \{ X_n \}, i \neq j$: $\mathrm{Cov}(X_i, X_j) = 0$, $\mathbb{E} X_i = \mathbb{E} X_j = \mu$, and $\sup_{i \le n} \mathrm{Var} X_i = o(n)$, then $$\frac{1}{n} \sum_{i=1}^{n} X_i \overset{p}{\to} \mu$$

Version 2: For a random sample from a population with finite population mean, the sample mean converges in probability to the population mean. Symbolically, if $\{ X_i \} \text{ i.i.d. } X$ and $\mathbb{E}|X| < \infty$, then $$\frac{1}{n} \sum_{i=1}^{n} X_i \overset{p}{\to} \mathbb{E}X$$

For a random sample from a population with finite population mean, the sample mean converges almost surely to the population mean. Symbolically, if $\{ X_i \} \text{ i.i.d. } X$ and $\mathbb{E}|X| < \infty$, then $$\frac{1}{n} \sum_{i=1}^{n} X_i \overset{a.s.}{\to} \mathbb{E}X$$
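As a rough numerical illustration of the laws above, the sketch below tracks the sample mean of i.i.d. Exponential(1) draws (population mean 1) along one simulated path as $n$ grows. The choice of distribution, the helper name `running_sample_means`, and the seed are assumptions made for this example only.

```python
import random

def running_sample_means(draw, sizes, seed=0):
    """Return {n: mean of the first n draws} along one simulated path."""
    rng = random.Random(seed)
    means = {}
    total = 0.0
    count = 0
    for n in sorted(sizes):
        # Extend the same path until it contains n draws.
        while count < n:
            total += draw(rng)
            count += 1
        means[n] = total / n
    return means

# Exponential(1) population: the population mean is 1 (assumed example).
lln_means = running_sample_means(lambda rng: rng.expovariate(1.0),
                                 [10, 1_000, 100_000])
for n, m in lln_means.items():
    print(f"n={n:>6}  sample mean={m:.4f}  |error|={abs(m - 1.0):.4f}")
```

The error at $n = 100{,}000$ should be of order $1/\sqrt{n}$, which is the scale that the central limit theorem makes precise.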

Idea 1: **the distribution of the sample mean is asymptotically Gaussian**: once centered at the population mean and scaled by $\sqrt{n}$, its limiting variance equals the population variance.

Idea 2: In many cases, **the sum of** (centered) **independent random variables is distributed approximately Gaussian**.
The necessary and sufficient conditions are:

- Each summand should be negligible compared to the dispersion of the sum, unless it itself has a distribution close to Gaussian.
- ... (refer to review paper of CLT)

Because of the CLT, Gaussian random variables are often used to approximate a finite sum of random variables. With current computing capacity, however, the importance of approximations like the CLT is somewhat lessened.

If $\{ X_i \}$ is a random sample from a population with finite mean $\mathbb{E}X$ and finite variance $\mathrm{Var}X$, then $$\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} X_i - \mathbb{E}X \right) \Rightarrow N(0, \mathrm{Var}X )$$
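A small simulation can make this statement concrete. The sketch below, which assumes Uniform(0, 1) data, $n = 50$, and a hypothetical helper `standardized_means`, divides the scaled deviation by $\sigma$ so that the limit is $N(0, 1)$ and then checks how often it falls within $\pm 1.96$.

```python
import math
import random

def standardized_means(n, reps, seed=0):
    """Simulate sqrt(n) * (sample mean - mu) / sigma for Uniform(0, 1) samples."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and sd of Uniform(0, 1)
    out = []
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        out.append(math.sqrt(n) * (xbar - mu) / sigma)
    return out

clt_draws = standardized_means(n=50, reps=20_000)
clt_coverage = sum(abs(z) <= 1.96 for z in clt_draws) / len(clt_draws)
print(f"share within +/-1.96: {clt_coverage:.3f} (standard normal: about 0.950)")
```

The uniform distribution is symmetric and light-tailed, so the approximation is already close at moderate $n$; skewed or heavy-tailed populations would typically need larger samples.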

If $\{ X_i \}$ is a sequence of independent random variables in $L^2 (\Omega, \Sigma, P)$ with $\mu_i = \mathbb{E}X_i$ and $\sigma_i^2 = \mathrm{Var}X_i$, and the *Lindeberg condition* holds for every $\varepsilon > 0$:
$$\lim_{n \to \infty} \frac{1}{s_n^2}\sum_{i = 1}^{n} \mathbb{E}\big[(X_i - \mu_i)^2 \cdot \mathbf{1}_{\{ | X_i - \mu_i | > \varepsilon s_n \}} \big] = 0$$
, where $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$, then
$$\frac{1}{s_n} \sum_{i=1}^{n} ( X_i - \mu_i ) \Rightarrow N(0, 1)$$

If $\{ X_i \}$ is a random sample from a population with finite $L^3$-norm, mean $\mu$, and variance $\sigma^2 > 0$, then $$\sup_{z\in\mathbb{R}} \lvert F_{Z_n}(z) - F_{Z}(z) \rvert \leq \frac{c}{\sqrt{n}} \frac{ \lVert X-\mu \rVert_3^3 }{\sigma^3}$$ , where $Z_n = \sqrt{n} \frac{\bar{X}_n-\mu}{\sigma}$, $Z \sim N(0,1)$, and $\lVert X-\mu \rVert_3^3 = \mathbb{E}|X-\mu|^3$. $c$ is a universal constant whose best value lies in $[\tfrac{1}{\sqrt{2\pi}}, 0.8)$.

The Berry-Esseen CLT quantifies the accuracy of the Gaussian approximation: the largest gap between the two distribution functions shrinks at rate $n^{-1/2}$.
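For Bernoulli summands the left-hand side of the bound can be computed exactly, which gives a quick check. The sketch below is an illustration under stated assumptions: it uses the third absolute central moment $\mathbb{E}|X-\mu|^3$ in the numerator, takes $c = 0.8$ (the upper end of the interval quoted above) as a conservative constant, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kolmogorov_distance_bernoulli(n, p):
    """Exact sup_z |F_{Z_n}(z) - Phi(z)| for the standardized sum of n Bernoulli(p) draws."""
    q = 1.0 - p
    sd = math.sqrt(n * p * q)
    cdf_left = 0.0  # P(S_n <= k - 1), the left limit of the step function at atom k
    worst = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * q**(n - k)
        cdf_right = cdf_left + pmf  # P(S_n <= k)
        phi = normal_cdf((k - n * p) / sd)
        # The supremum of a step function minus a continuous one is reached at a jump.
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst

def berry_esseen_bound(n, p, c=0.8):
    """c / sqrt(n) * E|X - mu|^3 / sigma^3 for Bernoulli(p); c = 0.8 is an assumed constant."""
    q = 1.0 - p
    third_abs_moment = p * q * (p * p + q * q)
    sigma_cubed = (p * q) ** 1.5
    return c * third_abs_moment / (sigma_cubed * math.sqrt(n))

for n in (10, 100, 400):
    d = kolmogorov_distance_bernoulli(n, 0.5)
    b = berry_esseen_bound(n, 0.5)
    print(f"n={n:>3}  exact distance={d:.4f}  bound={b:.4f}")
```

Multiplying the exact distance by $\sqrt{n}$ shows it staying roughly constant, which is consistent with the $n^{-1/2}$ rate being attained for lattice distributions.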

A distribution is **stable** if a linear combination of two independent random variables with that distribution has the same distribution, up to location and scale parameters:

A non-degenerate distribution $X$ is a **stable distribution** if, for independent copies $X_1, X_2$ of $X$, $$\forall a, b > 0, \exists c > 0, d \in \mathbb{R}: a X_1 + b X_2 \sim c X + d$$ The distribution is **strictly stable** if $d=0$.

Stable distributions have characteristic function:

$$\begin{aligned} \varphi(t; \alpha, \beta, c, \mu) &= \exp \{ i t \mu - |c t|^\alpha (1 - i \beta \operatorname{sgn}(t) \Phi(\alpha, t) ) \} \\ \Phi(\alpha, t) &= \begin{cases} \tan(\pi \alpha / 2) & (\alpha \neq 1) \\ -\tfrac{2}{\pi} \log|t| & (\alpha = 1) \end{cases} \end{aligned}$$
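The formula translates almost directly into code. The sketch below is one possible transcription; the function name `stable_cf` and the test point `t` are assumptions for illustration. It then checks two familiar special cases and the stability property in the form $X_1 + X_2 \sim 2^{1/\alpha} X$, which holds in this parametrization when $\mu = 0$ and $\alpha \neq 1$.

```python
import cmath
import math

def stable_cf(t, alpha, beta, c, mu):
    """Characteristic function of a stable law in the (alpha, beta, c, mu) parametrization."""
    if t == 0:
        return complex(1.0, 0.0)  # any characteristic function equals 1 at t = 0
    sgn = 1.0 if t > 0 else -1.0
    if alpha == 1:
        phi = -(2.0 / math.pi) * math.log(abs(t))
    else:
        phi = math.tan(math.pi * alpha / 2.0)
    return cmath.exp(1j * t * mu - abs(c * t) ** alpha * (1 - 1j * beta * sgn * phi))

t = 0.7
# alpha = 2: Gaussian characteristic function exp(-c^2 t^2), i.e. variance 2 c^2.
gauss_gap = abs(stable_cf(t, 2.0, 0.0, 1.0, 0.0) - math.exp(-t * t))
# alpha = 1, beta = 0: Cauchy characteristic function exp(-|c t|).
cauchy_gap = abs(stable_cf(t, 1.0, 0.0, 1.0, 0.0) - math.exp(-abs(t)))
# Stability: the square of the cf (sum of two copies) is the cf with scale 2^(1/alpha).
alpha = 1.5
stability_gap = abs(stable_cf(t, alpha, 0.5, 1.0, 0.0) ** 2
                    - stable_cf(t, alpha, 0.5, 2.0 ** (1.0 / alpha), 0.0))
print(gauss_gap, cauchy_gap, stability_gap)
```

All three gaps should be at the level of floating-point rounding. Note that with this scale convention the $\alpha = 2$ case has variance $2c^2$, not $c^2$.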

Stable distributions form a four-parameter family, with **stability parameter** $\alpha$, **skewness parameter** $\beta$ (not the standardized 3rd moment), scale parameter $c$, and location parameter $\mu$.

The stability parameter takes values in $(0, 2]$, and roughly corresponds to concentration:

- $\alpha = 2$: normal distribution;
- $0 < \alpha < 2$: variance infinite;
- $0 < \alpha \le 1$: expectation undefined;

The skewness parameter takes values in $[-1, 1]$, and roughly corresponds to asymmetry:

- $\beta = 0$: the distribution is symmetric about $\mu$;
- $\beta = 1$ and $\alpha < 1$: the distribution has support $[\mu, +\infty)$;

Special cases:

- $\alpha = 1, \beta = 0$: Cauchy distribution;
- $\alpha = 0.5, \beta = 1$: Lévy distribution;

**Generalized Central Limit Theorem**: {Gnedenko and Kolmogorov, 1954}

The suitably normalized sum of i.i.d. random variables with symmetric distributions whose probability density decreases as a power law $|x|^{-\alpha - 1}$, where $0 < \alpha < 2$ (and which therefore have infinite variance), tends to a stable distribution $f(x; \alpha, 0, c, 0)$ as the number of summands grows; if $\alpha > 2$, the variance is finite and the classical central limit theorem applies.
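The $\alpha = 1$ case gives a simple contrast with the finite-variance world: the mean of $n$ i.i.d. standard Cauchy variables is again standard Cauchy, so averaging does not concentrate it at all. The sketch below compares the interquartile range (IQR) of simulated sample means for Cauchy and Gaussian data; the helper names, replication count, and seed are assumptions for this example.

```python
import math
import random

def cauchy(rng):
    """Standard Cauchy draw by inverse-CDF sampling."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(values):
    """Interquartile range from order statistics (crude, no interpolation)."""
    s = sorted(values)
    return s[(3 * len(s)) // 4] - s[len(s) // 4]

def iqr_of_sample_means(draw, n, reps, seed=0):
    """IQR of `reps` simulated sample means, each based on n draws."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    return iqr(means)

sizes = (1, 10, 100)
cauchy_iqr = {n: iqr_of_sample_means(cauchy, n, reps=4_000) for n in sizes}
gauss_iqr = {n: iqr_of_sample_means(lambda rng: rng.gauss(0.0, 1.0), n, reps=4_000)
             for n in sizes}
for n in sizes:
    print(f"n={n:>3}  Cauchy IQR={cauchy_iqr[n]:.3f}  Gaussian IQR={gauss_iqr[n]:.3f}")
```

The Cauchy IQR should stay near 2 (the quartiles of the standard Cauchy are $\pm 1$) for every $n$, while the Gaussian IQR shrinks like $1/\sqrt{n}$. The IQR is used here because the variance of the Cauchy distribution does not exist.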

Idea: **Continuous mapping**, on support of the stochastic limit, **preserves convergence of random variables and distributions**.

Thm: Let $h(\cdot)$ be a function that is continuous on the support of $X$. If $X_i \overset{p}{\to} X$, then $h(X_i) \overset{p}{\to} h(X)$

The same theorem also holds for almost sure convergence and for convergence in distribution. It does not hold for convergence in $L^2$ in general, since $h(X_i)$ need not even be square-integrable.

Slutsky's Theorem is an important corollary of CMT:

If $X_i \Rightarrow X$, and $Y_i \overset{p}{\to} a$ for a constant $a$, then

- $Y_i X_i \Rightarrow aX$
- $Y_i + X_i \Rightarrow a + X$
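A standard use of the product rule is the studentized mean: by the CLT, $\sqrt{n}(\bar{X}_n - \mu)/\sigma \Rightarrow N(0,1)$, and $\sigma / S_n \overset{p}{\to} 1$, so replacing $\sigma$ with the sample standard deviation $S_n$ does not change the limit. The sketch below checks this by simulation, assuming Uniform(0, 1) data and a hypothetical helper `studentized_mean`.

```python
import math
import random

def studentized_mean(n, rng, mu=0.5):
    """sqrt(n) * (sample mean - mu) / sample sd for n Uniform(0, 1) draws."""
    xs = [rng.random() for _ in range(n)]
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return math.sqrt(n) * (xbar - mu) / s

rng = random.Random(0)
t_stats = [studentized_mean(200, rng) for _ in range(5_000)]
slutsky_coverage = sum(abs(t) <= 1.96 for t in t_stats) / len(t_stats)
print(f"share within +/-1.96: {slutsky_coverage:.3f} (standard normal: about 0.950)")
```

The coverage should be close to the standard normal value even though $\sigma$ was never used, which is what makes the result useful in practice.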

Idea: **A continuously differentiable function preserves the asymptotic distribution**, up to the linear map given by its derivative.

Thm: Suppose $\mathbf{X}_n \overset{p}{\to} \mathbf{b}$ and $a_n ( \mathbf{X}_n - \mathbf{b} ) \Rightarrow \mathbf{X}$. If $g: \mathbb{R}^d \to \mathbb{R}^r$ is continuously differentiable at $\mathbf{b}$, then $$a_n [ g(\mathbf{X}_n) - g(\mathbf{b}) ] \Rightarrow \nabla g(\mathbf{b}) \, \mathbf{X}$$ , where $\nabla g(\mathbf{b})$ is the $r \times d$ Jacobian matrix of $g$ at $\mathbf{b}$.
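In the scalar case with $a_n = \sqrt{n}$ and $\mathbf{X} \sim N(0, \sigma^2)$, the theorem says that $\sqrt{n}\,[g(\bar{X}_n) - g(\mu)]$ is asymptotically normal with variance $g'(\mu)^2 \sigma^2$. The sketch below checks this for $g(x) = x^2$ with Exponential(1) data, where $\mu = \sigma^2 = 1$ and the limiting variance is therefore 4. The choice of $g$, the distribution, and the helper names are assumptions for this example.

```python
import math
import random

def delta_method_draws(n, reps, seed=0):
    """Simulate sqrt(n) * (g(sample mean) - g(mu)) for g(x) = x^2, Exponential(1) data."""
    rng = random.Random(seed)
    mu = 1.0  # population mean of Exponential(1)
    draws = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        draws.append(math.sqrt(n) * (xbar ** 2 - mu ** 2))
    return draws

def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

delta_draws = delta_method_draws(n=200, reps=5_000)
delta_var = sample_variance(delta_draws)
print(f"simulated variance: {delta_var:.3f}  (limit g'(mu)^2 * sigma^2 = 4)")
```

At finite $n$ the simulated variance sits slightly above 4 because of the second-order term $(\bar{X}_n - \mu)^2$ that the linearization drops; the gap shrinks as $n$ grows.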