What is a sub-Gaussian distribution? What are its theoretical properties and practical applications?
Relevant lecture notes: MIT OCW 18.S997: High Dimensional Statistics
The motivation for introducing the sub-Gaussian distribution stems from the strong tail decay of the Gaussian distribution:
If $X \sim \mathcal{N}(\mu, \sigma^2)$, then:
$$\mathbb{P}(X - \mu > t) \leq \dfrac{1}{\sqrt{2 \pi}} \dfrac{\sigma \, \mathrm{e}^{-\frac{t^2}{2 \sigma^2}}}{t}, \quad t > 0,$$
and its moment-generating function is:
$$\mathbb{E}[\exp(s X)] = \exp\left(s \mu + \dfrac{\sigma^2 s^2}{2}\right).$$
A random variable $X \in \mathbb{R}$ is said to follow a sub-Gaussian distribution, denoted $X \sim \operatorname{subG}(\sigma^2)$, if it satisfies the following properties:
$$\mathbb{E}[X] = 0 \quad \text{and} \quad \mathbb{E}[\exp(s X)] \leq \exp\left(\dfrac{\sigma^2 s^2}{2}\right) \quad \text{for all } s \in \mathbb{R},$$
where $\sigma^2$ is the variance parameter (also called the variance proxy; it need not equal the actual variance of $X$).
Using Markov's inequality, we can derive the following theorem:
If $X \sim \operatorname{subG}(\sigma^2)$, then for any $t > 0$:
$$\mathbb{P}[X > t] \leq \exp\left(-\dfrac{t^2}{2 \sigma^2}\right), \quad \mathbb{P}[X < -t] \leq \exp\left(-\dfrac{t^2}{2 \sigma^2}\right). \tag{1}$$
We use a standard Chernoff bound technique: for any $s > 0$,
$$\mathbb{P}(X > t) \leq \mathbb{P}\left(e^{s X} > e^{s t}\right) \leq \dfrac{\mathbb{E}\left[e^{s X}\right]}{e^{s t}} \leq e^{\dfrac{\sigma^2 s^2}{2} - s t}. \tag{2}$$
Minimizing the exponent over $s > 0$ (the minimum is attained at $s = t/\sigma^2$) yields the first bound; applying the same argument to $-X$ gives the bound for the lower tail.
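As a quick sanity check of (1), here is a minimal Monte Carlo sketch (assuming NumPy is available; the variance proxy, sample size, and thresholds are arbitrary illustrative choices) comparing the empirical upper tail of a centered Gaussian, which is $\operatorname{subG}(\sigma^2)$, with the bound $\exp(-t^2/(2\sigma^2))$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
samples = rng.normal(loc=0.0, scale=sigma, size=1_000_000)  # N(0, sigma^2) is subG(sigma^2)

for t in [0.5, 1.0, 2.0, 3.0]:
    empirical = np.mean(samples > t)            # Monte Carlo estimate of P(X > t)
    bound = np.exp(-t**2 / (2 * sigma**2))      # sub-Gaussian tail bound from (1)
    print(f"t={t:.1f}  empirical tail={empirical:.4f}  bound={bound:.4f}")
```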
Moment Inequality: If $\mathbb{P}[|X| > t] \leq 2 \exp\left(-\dfrac{t^2}{2 \sigma^2}\right)$, then for any $k \geq 1$:
$$\mathbb{E}\left[|X|^k\right] \leq \left(2 \sigma^2\right)^{k/2} k \Gamma(k/2). \tag{3}$$
In particular, taking $k = 1$ gives $\mathbb{E}[|X|] \leq \sigma \sqrt{2 \pi}$. The proof uses:
$$\mathbb{E}\left[|X|^k\right] = \int_0^\infty \mathbb{P}\left(|X|^k > t\right) \mathrm{d} t.$$
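The bound (3) can also be checked numerically. The sketch below (a rough illustration assuming NumPy and SciPy; the standard Gaussian is used because it satisfies the two-sided tail condition with proxy $\sigma^2$) compares Monte Carlo estimates of $\mathbb{E}[|X|^k]$ with the right-hand side of (3):

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(0.0, sigma, size=1_000_000)      # P(|X| > t) <= 2 exp(-t^2 / (2 sigma^2)) holds

for k in [1, 2, 4, 6]:
    moment = np.mean(np.abs(x) ** k)                      # Monte Carlo estimate of E[|X|^k]
    bound = (2 * sigma**2) ** (k / 2) * k * gamma(k / 2)  # right-hand side of (3)
    print(f"k={k}: E|X|^k ~ {moment:.3f}   bound = {bound:.3f}")
```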
Converse Application: If $\mathbb{E}[X] = 0$ and the tail bounds in equation (1) hold, then for all $s \in \mathbb{R}$:
$$\mathbb{E}[\exp(s X)] \leq e^{4 \sigma^2 s^2}.$$
Multivariate Property: Let $X = (X_1, \ldots, X_n)$, where the $X_i$ are independent and each $X_i \sim \operatorname{subG}(\sigma^2)$. Then the random vector $X$ is also sub-Gaussian, in the sense that $u^\top X \sim \operatorname{subG}(\sigma^2)$ for every unit vector $u \in \mathbb{R}^n$.
Hoeffding's Lemma: Let $X$ be a random variable with $\mathbb{E}(X) = 0$ and $X \in [a, b]$ almost surely. Then for all $s \in \mathbb{R}$:
$$\mathbb{E}\left[e^{s X}\right] \leq e^{\dfrac{s^2 (b - a)^2}{8}}. \tag{4}$$
In particular, $X \sim \operatorname{subG}\left(\dfrac{(b - a)^2}{4}\right)$.
Define $\psi(s) = \log \mathbb{E}\left[e^{s X}\right]$. Then:
$$\psi'(s) = \dfrac{\mathbb{E}\left[X e^{s X}\right]}{\mathbb{E}\left[e^{s X}\right]}, \quad \psi''(s) = \dfrac{\mathbb{E}\left[X^2 e^{s X}\right]}{\mathbb{E}\left[e^{s X}\right]} - \left[\dfrac{\mathbb{E}\left[X e^{s X}\right]}{\mathbb{E}\left[e^{s X}\right]}\right]^2. \tag{5}$$
Thus, $\psi''(s)$ is the variance of $X$ under the probability measure $d\mathbb{Q} = \dfrac{e^{s X}}{\mathbb{E}\left[e^{s X}\right]} d\mathbb{P}$. Since $X \in [a, b]$ almost surely:
$$\operatorname{var}_{\mathbb{Q}}(X) = \operatorname{var}_{\mathbb{Q}}\left(X - \dfrac{a + b}{2}\right) \leq \mathbb{E}_{\mathbb{Q}}\left[\left(X - \dfrac{a + b}{2}\right)^2\right] \leq \dfrac{(b - a)^2}{4}. \tag{6}$$
Finally, using $\psi(0) = \log 1 = 0$, $\psi'(0) = \mathbb{E} X = 0$, we have:
$$\psi(s) = \int_0^s \int_0^\mu \psi''(\rho) d\rho d\mu \leq \dfrac{s^2 (b - a)^2}{8}. \tag{7}$$
This completes the proof.
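To see the bound (4) in action, here is a minimal sketch (the Bernoulli parameter is an arbitrary choice; NumPy is assumed) that evaluates the exact MGF of a centered Bernoulli variable, for which $(b - a)^2 = 1$, against the Hoeffding bound:

```python
import numpy as np

# X = B - p with B ~ Bernoulli(p): E[X] = 0 and X lies in [-p, 1 - p], so (b - a)^2 = 1
p = 0.3
a, b = -p, 1.0 - p

for s in [-4.0, -1.0, 0.5, 2.0, 5.0]:
    mgf = (1 - p) * np.exp(s * a) + p * np.exp(s * b)   # exact E[exp(sX)]
    bound = np.exp(s**2 * (b - a)**2 / 8)               # Hoeffding bound (4)
    print(f"s={s:+.1f}  E[exp(sX)]={mgf:.4f}  bound={bound:.4f}")
```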
Some distributions, such as the Laplace distribution, have slower tail decay. For $X \sim \operatorname{Lap}(1)$:
$$\mathbb{P}(|X| > t) = e^{-t}, \quad t \geq 0. \tag{8}$$
This decays more slowly than any Gaussian tail $e^{-t^2/(2\sigma^2)}$ as $t \to \infty$, so the Laplace distribution is not sub-Gaussian for any variance parameter; its tails are heavier.
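A small numeric comparison (illustrative only; the thresholds are arbitrary and $\sigma = 1$ is assumed for the Gaussian-type tail) makes the gap visible:

```python
import numpy as np

# Compare the Lap(1) tail e^{-t} from (8) with a Gaussian-type tail e^{-t^2/2} (sigma = 1)
for t in [1.0, 2.0, 4.0, 8.0]:
    laplace_tail = np.exp(-t)
    gaussian_tail = np.exp(-t**2 / 2)
    print(f"t={t:.0f}  Laplace tail: {laplace_tail:.2e}  Gaussian tail: {gaussian_tail:.2e}")
```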
If $X$ is a random variable with $\mathbb{E}[X] = 0$ and $\mathbb{P}(|X| > t) \leq 2 e^{-2 t / \lambda}$, then the moment inequality holds:
$$\mathbb{E}\left[|X|^k\right] \leq \lambda^k k!. \tag{9}$$
Moreover, its moment-generating function satisfies:
$$\mathbb{E}\left[e^{s X}\right] \leq e^{2 s^2 \lambda^2}, \quad \forall |s| \leq \dfrac{1}{2 \lambda}. \tag{10}$$
The proof is similar to the sub-Gaussian case and is omitted for brevity; these tail and moment bounds instead motivate the sub-exponential distribution.
A random variable $X$ follows a sub-exponential distribution with parameter $\lambda > 0$, denoted $X \sim \operatorname{subE}(\lambda)$, if it satisfies:
$$\mathbb{E}[X] = 0 \quad \text{and} \quad \mathbb{E}\left[e^{s X}\right] \leq e^{\frac{s^2 \lambda^2}{2}} \quad \text{for all } |s| \leq \dfrac{1}{\lambda}.$$
If $X \sim \operatorname{subG}(\sigma^2)$, then the random variable $Z = X^2 - \mathbb{E}[X^2] \sim \operatorname{subE}(16 \sigma^2)$.
$$ \begin{aligned} \mathbb{E}\left[e^{s Z}\right] &= 1 + \sum_{k=2}^\infty \dfrac{s^k \mathbb{E}\left[\left(X^2 - \mathbb{E}\left[X^2\right]\right)^k\right]}{k!} \quad (\mathbb{E}[Z] = 0) \\ &\leq 1 + \sum_{k=2}^\infty \dfrac{|s|^k 2^{k-1} \left(\mathbb{E}\left[X^{2k}\right] + \left(\mathbb{E}\left[X^2\right]\right)^k\right)}{k!} \quad (\text{convexity of } x \mapsto x^k) \\ &\leq 1 + \sum_{k=2}^\infty \dfrac{|s|^k 4^k \mathbb{E}\left[X^{2k}\right]}{2(k!)} \quad (\text{Jensen: } (\mathbb{E}[X^2])^k \leq \mathbb{E}[X^{2k}]) \\ &\leq 1 + \sum_{k=2}^\infty \dfrac{|s|^k 4^k \cdot 2 \left(2 \sigma^2\right)^k k!}{2(k!)} \quad (\text{moment bound (3) with } k \to 2k) \\ &= 1 + \left(8 |s| \sigma^2\right)^2 \sum_{k=0}^\infty \left(8 |s| \sigma^2\right)^k \\ &\leq 1 + 128 s^2 \sigma^4 \quad \text{for } |s| \leq \dfrac{1}{16 \sigma^2}, \text{ since then the geometric series is at most } 2 \\ &\leq e^{128 s^2 \sigma^4}. \end{aligned} $$
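For a Gaussian $X \sim \mathcal{N}(0, \sigma^2)$ the MGF of $Z = X^2 - \sigma^2$ is available in closed form, $\mathbb{E}[e^{s X^2}] = (1 - 2 s \sigma^2)^{-1/2}$ for $s < 1/(2\sigma^2)$, so the $\operatorname{subE}(16\sigma^2)$ bound can be checked directly. A minimal sketch (assuming NumPy; $\sigma = 1$ is an arbitrary choice):

```python
import numpy as np

# X ~ N(0, sigma^2) is subG(sigma^2); Z = X^2 - E[X^2] = X^2 - sigma^2
sigma = 1.0
s_max = 1 / (16 * sigma**2)                      # range on which subE(16 sigma^2) is claimed

for s in np.linspace(-s_max, s_max, 5):
    mgf_Z = np.exp(-s * sigma**2) / np.sqrt(1 - 2 * s * sigma**2)  # exact E[exp(sZ)]
    bound = np.exp(128 * s**2 * sigma**4)                          # e^{s^2 (16 sigma^2)^2 / 2}
    print(f"s={s:+.4f}  E[exp(sZ)]={mgf_Z:.5f}  bound={bound:.5f}")
```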
Bernstein-type Inequality: Let $X_1, \ldots, X_n$ be independent random variables satisfying $\mathbb{E}(X_i) = 0$ and $X_i \sim \operatorname{subE}(\lambda)$. Define $\bar{X} = \dfrac{1}{n} \sum_{i=1}^n X_i$. For any $t > 0$:
$$\mathbb{P}(\bar{X} > t) \vee \mathbb{P}(\bar{X} < -t) \leq \exp\left[-\dfrac{n}{2}\left(\dfrac{t^2}{\lambda^2} \wedge \dfrac{t}{\lambda}\right)\right]. \tag{11}$$
Note: Here, $a \vee b = \max\{a, b\}$, $a \wedge b = \min\{a, b\}$.
Assume without loss of generality that $\lambda = 1$ (otherwise replace $X_i$ with $X_i / \lambda$ and $t$ with $t / \lambda$). Using the Chernoff bound technique and independence, for all $s > 0$:
$$\mathbb{P}(\bar{X} > t) \leq \prod_{i=1}^n \mathbb{E}\left[e^{s X_i}\right] e^{-s n t}. \tag{12}$$
When $s \leq 1$, $\mathbb{E}\left[e^{s X_i}\right] \leq e^{s^2 / 2}$, so:
$$\mathbb{P}(\bar{X} > t) \leq e^{\dfrac{n s^2}{2} - s n t}.$$
Choosing $s = 1 \wedge t$ yields:
$$\mathbb{P}(\bar{X} > t) \leq e^{-\dfrac{n}{2}\left(t^2 \wedge t\right)}. \tag{13}$$
Note: Bernstein's inequality usually refers to a tighter bound with several variants; these may be organized in a future note.
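As a Monte Carlo illustration of (11) (a rough sketch, not part of the notes: NumPy is assumed, the sample sizes are arbitrary, and Laplace(1) variables are used because one can check they satisfy the sub-exponential MGF condition with $\lambda = 2$, a value taken here only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trials, lam = 50, 200_000, 2.0     # lam = 2: illustrative subE parameter for Lap(1)

# Each row is one sample of n i.i.d. mean-zero Lap(1) variables; take the row means
means = rng.laplace(loc=0.0, scale=1.0, size=(n_trials, n)).mean(axis=1)

for t in [0.25, 0.5, 1.0]:
    empirical = np.mean(means > t)                            # estimate of P(Xbar > t)
    bound = np.exp(-0.5 * n * min(t**2 / lam**2, t / lam))    # Bernstein-type bound (11)
    print(f"t={t:.2f}  empirical={empirical:.5f}  bound={bound:.5f}")
```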
Let $X_1, \ldots, X_N$ be $N$ random variables (not necessarily independent) with $X_i \sim \operatorname{subG}(\sigma^2)$. Then:
$$\mathbb{E}\left[\max_{1 \leq i \leq N} X_i\right] \leq \sigma \sqrt{2 \log N}, \quad \text{and} \quad \mathbb{E}\left[\max_{1 \leq i \leq N} |X_i|\right] \leq \sigma \sqrt{2 \log (2N)}. \tag{14}$$
Furthermore, for $\forall t > 0$:
$$\mathbb{P}\left(\max_{1 \leq i \leq N} X_i > t\right) \leq N e^{-\dfrac{t^2}{2 \sigma^2}}, \quad \text{and} \quad \mathbb{P}\left(\max_{1 \leq i \leq N} |X_i| > t\right) \leq 2N e^{-\dfrac{t^2}{2 \sigma^2}}. \tag{15}$$
Using the Chernoff Bound technique:
$$ \begin{aligned} \mathbb{E}\left[\max_{1 \leq i \leq N} X_i\right] &= \dfrac{1}{s} \mathbb{E}\left[\log e^{s \max_{1 \leq i \leq N} X_i}\right] \\ &\leq \dfrac{1}{s} \log \mathbb{E}\left[e^{s \max_{1 \leq i \leq N} X_i}\right] \quad (\text{Jensen}) \\ &= \dfrac{1}{s} \log \mathbb{E}\left[\max_{1 \leq i \leq N} e^{s X_i}\right] \\ &\leq \dfrac{1}{s} \log \sum_{1 \leq i \leq N} \mathbb{E}\left[e^{s X_i}\right] \\ &\leq \dfrac{1}{s} \log \sum_{1 \leq i \leq N} e^{\dfrac{\sigma^2 s^2}{2}} \\ &= \dfrac{\log N}{s} + \dfrac{\sigma^2 s}{2}. \end{aligned} \tag{16} $$
Taking $s = \sqrt{2 \log N / \sigma^2}$ proves the left inequality of (14).
For the left inequality of (15), apply a union bound (subadditivity of probability):
$$ \begin{aligned} \mathbb{P}\left(\max_{1 \leq i \leq N} X_i > t\right) &= \mathbb{P}\left(\bigcup_{1 \leq i \leq N} \{X_i > t\}\right) \\ &\leq \sum_{1 \leq i \leq N} \mathbb{P}(X_i > t) \\ &\leq N e^{-\dfrac{t^2}{2 \sigma^2}}. \end{aligned} \tag{17} $$
For the absolute value case, note that:
$$\max_{1 \leq i \leq N} |X_i| = \max_{1 \leq i \leq 2N} X_i,$$
where $X_{N+i} = -X_i$ for $1 \leq i \leq N$. Since $-X_i \sim \operatorname{subG}(\sigma^2)$ as well, applying the previous bounds to these $2N$ variables yields the remaining inequalities in (14) and (15).
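Finally, a short simulation (illustrative assumptions: NumPy, independent standard Gaussians, arbitrary trial counts) showing how tight $\sigma\sqrt{2\log N}$ is for the expected maximum in (14):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n_trials = 1.0, 20_000

for N in [10, 100, 1000]:
    draws = rng.normal(0.0, sigma, size=(n_trials, N))
    emp_max = draws.max(axis=1).mean()          # Monte Carlo estimate of E[max_i X_i]
    bound = sigma * np.sqrt(2 * np.log(N))      # maximal inequality bound from (14)
    print(f"N={N:4d}  E[max] ~ {emp_max:.3f}   bound = {bound:.3f}")
```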