
What is a sub-Gaussian distribution? What are its theoretical properties and practical applications?

Relevant lecture notes: MIT OCW 18.S997: High Dimensional Statistics

  • I came across this question on Zhihu, a Chinese Q&A site similar to Quora. (Commented 2 days ago)

1 Answer


Sub-Gaussian Random Variables and Chernoff Bounds

The motivation for introducing sub-Gaussian distributions comes from the fast tail decay of the Gaussian distribution:

If $X \sim \mathcal{N}(\mu, \sigma^2)$, then:

$$\mathbb{P}(X - \mu > t) \leq \dfrac{\sigma}{t \sqrt{2 \pi}} \, \mathrm{e}^{-\frac{t^2}{2 \sigma^2}}, \quad t > 0,$$

and its moment-generating function is:

$$\mathbb{E}[\exp(s X)] = \exp\left(s \mu + \dfrac{\sigma^2 s^2}{2}\right).$$
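As a quick numerical sanity check of the two displays above (a sketch, not part of the derivation; the choices of $\mu$, $\sigma$, the $t$ grid, and the $s$ values are arbitrary), one can compare the exact Gaussian tail and MGF against the bound and the formula in Python:

```python
# Sanity check of the Gaussian tail bound and MGF formula above.
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0  # arbitrary parameters for the sketch

# Tail bound: P(X - mu > t) <= sigma / (t * sqrt(2*pi)) * exp(-t^2 / (2 sigma^2))
for t in [1.0, 2.0, 4.0, 8.0]:
    exact = norm.sf(mu + t, loc=mu, scale=sigma)   # exact P(X - mu > t)
    bound = sigma / (t * np.sqrt(2 * np.pi)) * np.exp(-t**2 / (2 * sigma**2))
    print(f"t={t}: exact tail = {exact:.3e}, bound = {bound:.3e}")

# MGF: E[exp(sX)] = exp(s*mu + sigma^2 s^2 / 2), checked by Monte Carlo.
rng = np.random.default_rng(0)
x = rng.normal(mu, sigma, size=1_000_000)
for s in [0.1, 0.5]:
    print(f"s={s}: MC MGF = {np.exp(s * x).mean():.4f}, "
          f"formula = {np.exp(s * mu + sigma**2 * s**2 / 2):.4f}")
```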

Definition

A random variable $X \in \mathbb{R}$ is said to follow a sub-Gaussian distribution, denoted $X \sim \operatorname{subG}(\sigma^2)$, if it satisfies the following properties:

  • $\mathbb{E}[X] = 0$
  • $\mathbb{E}[\exp(s X)] \leq \exp\left(\dfrac{\sigma^2 s^2}{2}\right), \quad \forall s \in \mathbb{R},$

where $\sigma^2$ is called the variance proxy; it is an upper bound on, and need not equal, the actual variance of $X$.

Theorem

Using Markov's inequality, we can derive the following theorem:

If $X \sim \operatorname{subG}(\sigma^2)$, then for any $t > 0$:

$$\mathbb{P}[X > t] \leq \exp\left(-\dfrac{t^2}{2 \sigma^2}\right), \quad \mathbb{P}[X < -t] \leq \exp\left(-\dfrac{t^2}{2 \sigma^2}\right). \tag{1}$$

Proof

We use the standard Chernoff bound technique: for any $s > 0$,

$$\mathbb{P}(X > t) = \mathbb{P}\left(e^{s X} > e^{s t}\right) \leq \dfrac{\mathbb{E}\left[e^{s X}\right]}{e^{s t}} \leq e^{\dfrac{\sigma^2 s^2}{2} - s t}. \tag{2}$$

Minimizing the right-hand side over $s > 0$ gives $s = t / \sigma^2$, which yields $\mathbb{P}(X > t) \leq \exp\left(-\dfrac{t^2}{2 \sigma^2}\right)$. The bound on $\mathbb{P}(X < -t)$ follows by applying the same argument to $-X$, which is also $\operatorname{subG}(\sigma^2)$.
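The minimization step can also be checked numerically (a small sketch; the values of $\sigma^2$ and $t$ are arbitrary):

```python
# The Chernoff bound exp(sigma^2 s^2 / 2 - s t) from (2) is minimized at
# s = t / sigma^2, which recovers the tail bound exp(-t^2 / (2 sigma^2)) of (1).
import numpy as np

sigma2, t = 1.5, 2.0                         # arbitrary choices
s_grid = np.linspace(1e-3, 5.0, 10_000)
bounds = np.exp(sigma2 * s_grid**2 / 2 - s_grid * t)

print("numerical minimizer:", s_grid[np.argmin(bounds)], " analytic t/sigma^2:", t / sigma2)
print("minimal bound      :", bounds.min(), " exp(-t^2/(2 sigma^2)):", np.exp(-t**2 / (2 * sigma2)))
```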

Properties

  1. Moment Inequality: If $\mathbb{P}[|X| > t] \leq 2 \exp\left(-\dfrac{t^2}{2 \sigma^2}\right)$ (which holds for $X \sim \operatorname{subG}(\sigma^2)$ by (1) and a union bound), then for any $k \geq 1$:

    $$\mathbb{E}\left[|X|^k\right] \leq \left(2 \sigma^2\right)^{k/2} k \Gamma(k/2). \tag{3}$$

    In particular, taking $k = 1$ gives $\mathbb{E}[|X|] \leq \sigma \sqrt{2 \pi}$; a numerical check of (3) appears after this list. The proof uses:

    $$\mathbb{E}\left[|X|^k\right] = \int_0^\infty \mathbb{P}\left(|X|^k > t\right) \mathrm{d} t.$$

  2. Converse (from tail bound to MGF bound): If equation (1) holds, then for all $s > 0$:

    $$\mathbb{E}[\exp(s X)] \leq e^{4 \sigma^2 s^2}.$$

  3. Multivariate Property: A random vector $X = (X_1, \ldots, X_n) \in \mathbb{R}^n$ is said to be sub-Gaussian with variance proxy $\sigma^2$ if $u^\top X \sim \operatorname{subG}(\sigma^2)$ for every unit vector $u \in \mathbb{R}^n$. If the coordinates $X_i$ are independent and each $X_i \sim \operatorname{subG}(\sigma^2)$, then $X$ is sub-Gaussian in this sense.
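As a numerical check of the moment inequality (3), the sketch below uses $X \sim \mathcal{N}(0, \sigma^2)$, which satisfies the two-sided tail condition of Property 1; the value of $\sigma$, the exponents $k$, and the sample size are arbitrary choices:

```python
# Monte Carlo check of the moment bound E|X|^k <= (2 sigma^2)^(k/2) * k * Gamma(k/2)
# for X ~ N(0, sigma^2).
import numpy as np
from scipy.special import gamma

sigma = 1.3                                   # arbitrary choice
rng = np.random.default_rng(1)
x = rng.normal(0.0, sigma, size=2_000_000)

for k in [1, 2, 3, 4, 6]:
    empirical = np.mean(np.abs(x) ** k)
    bound = (2 * sigma**2) ** (k / 2) * k * gamma(k / 2)
    print(f"k={k}: E|X|^k ~ {empirical:.3f}, bound = {bound:.3f}")
```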

Connection with Hoeffding's Inequality

Theorem

Let $X$ be a random variable with $\mathbb{E}(X) = 0$ and $X \in [a, b]$ almost surely. Then for all $s \in \mathbb{R}$:

$$\mathbb{E}\left[e^{s X}\right] \leq e^{\dfrac{s^2 (b - a)^2}{8}}. \tag{4}$$

In particular, $X \sim \operatorname{subG}\left(\dfrac{(b - a)^2}{4}\right)$.

Proof

Define $\psi(s) = \log \mathbb{E}\left[e^{s X}\right]$. Then:

$$\psi'(s) = \dfrac{\mathbb{E}\left[X e^{s X}\right]}{\mathbb{E}\left[e^{s X}\right]}, \quad \psi''(s) = \dfrac{\mathbb{E}\left[X^2 e^{s X}\right]}{\mathbb{E}\left[e^{s X}\right]} - \left[\dfrac{\mathbb{E}\left[X e^{s X}\right]}{\mathbb{E}\left[e^{s X}\right]}\right]^2. \tag{5}$$

Thus, $\psi''(s)$ is the variance of $X$ under the probability measure $d\mathbb{Q} = \dfrac{e^{s X}}{\mathbb{E}\left[e^{s X}\right]} d\mathbb{P}$. Since $X \in [a, b]$ almost surely:

$$\operatorname{var}(X) = \operatorname{var}\left(X - \dfrac{a + b}{2}\right) \leq \mathbb{E}\left[\left(X - \dfrac{a + b}{2}\right)^2\right] \leq \dfrac{(b - a)^2}{4}. \tag{6}$$

Finally, using $\psi(0) = \log 1 = 0$, $\psi'(0) = \mathbb{E} X = 0$, we have:

$$\psi(s) = \int_0^s \int_0^\mu \psi''(\rho) d\rho d\mu \leq \dfrac{s^2 (b - a)^2}{8}. \tag{7}$$

This completes the proof.
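A Monte Carlo sanity check of Hoeffding's lemma (4), using an (arbitrary) centered Bernoulli variable so that $b - a = 1$:

```python
# Check E[e^(sX)] <= exp(s^2 (b-a)^2 / 8) for X = B - p, B ~ Bernoulli(p),
# so that X lies in [-p, 1 - p] and (b - a) = 1.
import numpy as np

p = 0.3                                       # arbitrary choice
rng = np.random.default_rng(2)
x = rng.binomial(1, p, size=1_000_000) - p

for s in [-4.0, -1.0, 0.5, 2.0, 4.0]:
    mgf = np.exp(s * x).mean()
    bound = np.exp(s**2 * 1**2 / 8)           # (b - a)^2 = 1
    print(f"s={s:+.1f}: E[e^(sX)] ~ {mgf:.4f}, bound = {bound:.4f}")
```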


Sub-Exponential Random Variables

Some distributions, such as the Laplace distribution, have slower tail decay. For $X \sim \operatorname{Lap}(1)$:

$$\mathbb{P}(|X| > t) = e^{-t}, \quad t \geq 0. \tag{8}$$

This decays more slowly than a Gaussian, indicating a heavier tail.

Theorem

If $X$ is a random variable with $\mathbb{E}[X] = 0$ and $\mathbb{P}(|X| > t) \leq 2 e^{-2 t / \lambda}$, then the moment inequality holds:

$$\mathbb{E}\left[|X|^k\right] \leq \lambda^k k!. \tag{9}$$

Moreover, its moment-generating function satisfies:

$$\mathbb{E}\left[e^{s X}\right] \leq e^{2 s^2 \lambda^2}, \quad \forall |s| \leq \dfrac{1}{2 \lambda}. \tag{10}$$

The proof is similar to the sub-Gaussian case and is omitted for brevity; instead, we move on to the definition of a sub-exponential distribution.

Definition

A random variable $X \sim \operatorname{subE}(\lambda)$ if it satisfies:

  • $\mathbb{E}[X] = 0$
  • $\mathbb{E}\left[e^{s X}\right] \leq e^{s^2 \lambda^2 / 2}, \quad \forall |s| \leq \dfrac{1}{\lambda}$.
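As a sanity check of this definition, the Laplace distribution fits it. For $X \sim \operatorname{Lap}(1)$ the MGF is exactly $1/(1 - s^2)$ for $|s| < 1$, and $\lambda = 2$ is one valid (though not necessarily optimal) choice of parameter; the sketch below compares the two over the admissible range of $s$:

```python
# Check the sub-exponential condition for Lap(1) with lambda = 2:
# 1 / (1 - s^2) <= exp(s^2 lambda^2 / 2) for |s| <= 1 / lambda.
import numpy as np

lam = 2.0                                     # a valid, not necessarily tight, parameter
for s in np.linspace(-1 / lam, 1 / lam, 11):
    mgf = 1.0 / (1.0 - s**2)                  # exact MGF of Lap(1)
    bound = np.exp(s**2 * lam**2 / 2)
    print(f"s={s:+.2f}: MGF = {mgf:.4f}, bound = {bound:.4f}")
```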

Relationship Between Distributions

Theorem

If $X \sim \operatorname{subG}(\sigma^2)$, then the random variable $Z = X^2 - \mathbb{E}[X^2] \sim \operatorname{subE}(16 \sigma^2)$.

Proof

$$ \begin{aligned} \mathbb{E}\left[e^{s Z}\right] &= 1 + \sum_{k=2}^\infty \dfrac{s^k \mathbb{E}\left[\left(X^2 - \mathbb{E}\left[X^2\right]\right)^k\right]}{k!} \\ &\leq 1 + \sum_{k=2}^\infty \dfrac{|s|^k 2^{k-1} \left(\mathbb{E}\left[X^{2k}\right] + \left(\mathbb{E}\left[X^2\right]\right)^k\right)}{k!} \quad (\text{convexity of } x \mapsto |x|^k) \\ &\leq 1 + \sum_{k=2}^\infty \dfrac{|s|^k 4^k \mathbb{E}\left[X^{2k}\right]}{2(k!)} \quad (\text{Jensen: } (\mathbb{E}[X^2])^k \leq \mathbb{E}[X^{2k}]) \\ &\leq 1 + \sum_{k=2}^\infty \dfrac{|s|^k 4^k \cdot 2 \left(2 \sigma^2\right)^k k!}{2(k!)} \quad (\text{moment bound (3) with } k \text{ replaced by } 2k) \\ &= 1 + \left(8 |s| \sigma^2\right)^2 \sum_{k=0}^\infty \left(8 |s| \sigma^2\right)^k \\ &\leq 1 + 128 s^2 \sigma^4 \quad \text{for } |s| \leq \dfrac{1}{16 \sigma^2}, \text{ since the geometric series is then at most } 2 \\ &\leq e^{128 s^2 \sigma^4} = e^{s^2 (16 \sigma^2)^2 / 2}. \end{aligned} $$
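For the Gaussian case $X \sim \mathcal{N}(0, \sigma^2)$ the MGF of $Z = X^2 - \mathbb{E}[X^2]$ is available in closed form, $\mathbb{E}[e^{sZ}] = e^{-s\sigma^2}/\sqrt{1 - 2 s \sigma^2}$ for $s < 1/(2\sigma^2)$, so the $\operatorname{subE}(16\sigma^2)$ bound can be checked directly (a sketch; the value of $\sigma$ is arbitrary):

```python
# Check E[e^(sZ)] <= exp(s^2 (16 sigma^2)^2 / 2) = exp(128 s^2 sigma^4)
# for |s| <= 1 / (16 sigma^2), with Z = X^2 - sigma^2 and X ~ N(0, sigma^2).
import numpy as np

sigma2 = 0.8**2                               # arbitrary choice of sigma^2
lam = 16 * sigma2                             # variance parameter from the theorem

for s in np.linspace(-1 / lam, 1 / lam, 9):
    mgf = np.exp(-s * sigma2) / np.sqrt(1 - 2 * s * sigma2)   # exact MGF of Z
    bound = np.exp(s**2 * lam**2 / 2)
    print(f"s={s:+.3f}: E[e^(sZ)] = {mgf:.4f}, bound = {bound:.4f}")
```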

Bernstein's Inequality

Theorem

Let $X_1, \ldots, X_n$ be independent random variables satisfying $\mathbb{E}(X_i) = 0$ and $X_i \sim \operatorname{subE}(\lambda)$. Define $\bar{X} = \dfrac{1}{n} \sum_{i=1}^n X_i$. Then for any $t > 0$:

$$\mathbb{P}(\bar{X} > t) \vee \mathbb{P}(\bar{X} < -t) \leq \exp\left[-\dfrac{n}{2}\left(\dfrac{t^2}{\lambda^2} \wedge \dfrac{t}{\lambda}\right)\right]. \tag{11}$$

Note: Here, $a \vee b = \max\{a, b\}$, $a \wedge b = \min\{a, b\}$.

Proof

Without loss of generality, assume $\lambda = 1$ (otherwise replace $X_i$ by $X_i / \lambda$ and $t$ by $t / \lambda$). Using the Chernoff bound technique and independence, for any $s > 0$:

$$\mathbb{P}(\bar{X} > t) \leq \prod_{i=1}^n \mathbb{E}\left[e^{s X_i}\right] e^{-s n t}. \tag{12}$$

When $s \leq 1$, $\mathbb{E}\left[e^{s X_i}\right] \leq e^{s^2 / 2}$, so:

$$\mathbb{P}(\bar{X} > t) \leq e^{\dfrac{n s^2}{2} - s n t}.$$

Choosing $s = 1 \wedge t$ (that is, $s = t$ when $t \leq 1$, and $s = 1$ when $t > 1$, in which case $\tfrac{1}{2} - t \leq -\tfrac{t}{2}$) yields:

$$\mathbb{P}(\bar{X} > t) \leq e^{-\dfrac{n}{2}\left(t^2 \wedge t\right)}. \tag{13}$$

Note: the name Bernstein's inequality often refers to a family of tighter bounds with several variants; only the sub-exponential form is stated here.
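A simulation sketch of (11): take $X_i = Y_i - 1$ with $Y_i \sim \operatorname{Exp}(1)$, for which $\lambda = 2$ is one valid (not necessarily tight) sub-exponential parameter; $n$, the $t$ values, and the number of trials are arbitrary choices:

```python
# Empirical tail of the mean of centered Exp(1) variables vs. the Bernstein bound (11).
import numpy as np

rng = np.random.default_rng(3)
n, trials, lam = 200, 50_000, 2.0
xbar = rng.exponential(1.0, size=(trials, n)).mean(axis=1) - 1.0

for t in [0.1, 0.2, 0.3]:
    empirical = np.mean(xbar > t)
    bound = np.exp(-n / 2 * min(t**2 / lam**2, t / lam))
    print(f"t={t}: P(Xbar > t) ~ {empirical:.4f}, Bernstein bound = {bound:.4f}")
```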


Maximal Inequalities

Theorem

Let $X_1, \ldots, X_N$ be $N$ random variables (not necessarily independent) with $X_i \sim \operatorname{subG}(\sigma^2)$. Then:

$$\mathbb{E}\left[\max_{1 \leq i \leq N} X_i\right] \leq \sigma \sqrt{2 \log N}, \quad \text{and} \quad \mathbb{E}\left[\max_{1 \leq i \leq N} |X_i|\right] \leq \sigma \sqrt{2 \log (2N)}. \tag{14}$$

Furthermore, for all $t > 0$:

$$\mathbb{P}\left(\max_{1 \leq i \leq N} X_i > t\right) \leq N e^{-\dfrac{t^2}{2 \sigma^2}}, \quad \text{and} \quad \mathbb{P}\left(\max_{1 \leq i \leq N} |X_i| > t\right) \leq 2N e^{-\dfrac{t^2}{2 \sigma^2}}. \tag{15}$$

Proof

Using the Chernoff bound technique, for any $s > 0$:

$$ \begin{aligned} \mathbb{E}\left[\max_{1 \leq i \leq N} X_i\right] &= \dfrac{1}{s} \mathbb{E}\left[\log e^{s \max_{1 \leq i \leq N} X_i}\right] \\ &\leq \dfrac{1}{s} \log \mathbb{E}\left[e^{s \max_{1 \leq i \leq N} X_i}\right] \quad (\text{Jensen}) \\ &= \dfrac{1}{s} \log \mathbb{E}\left[\max_{1 \leq i \leq N} e^{s X_i}\right] \\ &\leq \dfrac{1}{s} \log \sum_{1 \leq i \leq N} \mathbb{E}\left[e^{s X_i}\right] \\ &\leq \dfrac{1}{s} \log \sum_{1 \leq i \leq N} e^{\dfrac{\sigma^2 s^2}{2}} \\ &= \dfrac{\log N}{s} + \dfrac{\sigma^2 s}{2}. \end{aligned} \tag{16} $$

Taking $s = \sqrt{2 \log N / \sigma^2}$ proves the left inequality of (14).

For the left inequality of (15), use a union bound (subadditivity of probability):

$$ \begin{aligned} \mathbb{P}\left(\max_{1 \leq i \leq N} X_i > t\right) &= \mathbb{P}\left(\bigcup_{1 \leq i \leq N} \{X_i > t\}\right) \\ &\leq \sum_{1 \leq i \leq N} \mathbb{P}(X_i > t) \\ &\leq N e^{-\dfrac{t^2}{2 \sigma^2}}. \end{aligned} \tag{17} $$

For the absolute value case, note that

$$\max_{1 \leq i \leq N} |X_i| = \max_{1 \leq i \leq 2N} X_i, \quad \text{where } X_{N+i} = -X_i,$$

and each of the $2N$ variables $\pm X_i$ is $\operatorname{subG}(\sigma^2)$, so the previous bounds apply with $N$ replaced by $2N$ (and $\log N$ by $\log(2N)$).
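A simulation sketch of the first bound in (14), using i.i.d. $\mathcal{N}(0, \sigma^2)$ variables (independence is not required by the theorem, but it makes the simulation simple); $\sigma$, the values of $N$, and the trial count are arbitrary:

```python
# Compare E[max_i X_i] (Monte Carlo) with the bound sigma * sqrt(2 log N).
import numpy as np

sigma = 1.0                                   # arbitrary choice
rng = np.random.default_rng(4)
for N in [10, 100, 1000]:
    maxima = rng.normal(0.0, sigma, size=(10_000, N)).max(axis=1)
    print(f"N={N}: E[max] ~ {maxima.mean():.3f}, "
          f"bound = {sigma * np.sqrt(2 * np.log(N)):.3f}")
```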

  • Which book have you taken this answer from? Is it your own book? (Commented 2 days ago)
  • Why do you need $\mathbb{E}[X] = 0$? Are there no "sub-Gaussian" distributions with non-zero means? Wikipedia simply requires there to be a positive constant $C$ such that for all $t \geq 0$ you have $\mathbb{P}(|X| \geq t) \leq 2 \exp{(-t^2/C^2)}$, and this covers more distributions with light tails. (Commented yesterday)
  • Before stating the multivariate property, it would help to first define what it means for random vectors to be sub-Gaussian. The definition as stated earlier only applies to real-valued random variables. (Commented 21 hours ago)
