Diffusion Models and Generative Foundations

Forward Process Definition

To understand the forward diffusion process in diffusion models, you need to formalize how noise is gradually added to data in a controlled, stepwise manner. The process is defined as a Markov chain, where at each time step t, a small amount of Gaussian noise is added to the previous state. This stepwise corruption is governed by a conditional probability distribution, commonly denoted as q(xₜ | xₜ₋₁).

Mathematically, the forward process is defined as:

q(x_t \mid x_{t-1}) = \mathcal{N}\!\left( x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\; \beta_t I \right)

where:

  • xₜ is the noisy sample at time step t;
  • xₜ₋₁ is the sample from the previous step;
  • βₜ is the variance schedule (a small positive scalar controlling the noise added at step t);
  • I is the identity matrix.

This definition means that, given xₜ₋₁, the next state xₜ is sampled from a normal distribution centered at √(1 − βₜ) xₜ₋₁ with covariance βₜI. The Markov property ensures that each step depends only on the immediately preceding state, not on the entire history.
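A single forward step can be sketched directly from this definition. The following is a minimal NumPy sketch; the function name `forward_step` and the example schedule value 0.01 are illustrative assumptions, not part of the text above:

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I).

    x_prev : array, the sample at step t-1
    beta_t : float, the variance-schedule value for this step
    """
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
x_prev = np.zeros(5)                    # a toy previous state
x_t = forward_step(x_prev, 0.01, rng)   # slightly noised next state
```

Because βₜ is small, each individual step perturbs the sample only slightly; the heavy corruption comes from composing many such steps.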

The properties of q(xₜ | xₜ₋₁) are:

  • It is a Gaussian distribution for every t;
  • The process is Markovian: the next state depends only on the current state;
  • The variance schedule {βₜ} determines the rate of noise addition;
  • As t increases, the sample becomes progressively noisier, eventually approaching pure Gaussian noise as t approaches the maximum diffusion step.
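The last property can be checked numerically by running the full Markov chain. The sketch below assumes a linear schedule from 1e-4 to 0.02 over 1000 steps (a common choice, but an assumption here, not something the text prescribes):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear variance schedule

x = np.ones(50_000)                  # start every chain at the "data" value 1.0
for beta_t in betas:                 # run the full chain x_0 -> x_T step by step
    x = np.sqrt(1.0 - beta_t) * x + np.sqrt(beta_t) * rng.standard_normal(x.shape)

# After T steps the samples are statistically close to N(0, 1):
print(float(x.mean()), float(x.std()))
```

The printed mean is near 0 and the standard deviation near 1, confirming that the chain forgets the starting value and converges toward pure Gaussian noise.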

You can also derive the marginal distribution of the forward process, which expresses the distribution of xₜ given the original, clean data sample x₀ after t steps of noise addition. This is useful because it allows you to sample noisy data at any step directly from x₀ without simulating the entire Markov chain step-by-step.

The marginal distribution is given by:

q(x_t \mid x_0) = \mathcal{N}\!\left( x_t;\, \sqrt{\bar{\alpha}_t}\, x_0,\; (1 - \bar{\alpha}_t)\, I \right)

where:

  • αₜ = 1 − βₜ;
  • ᾱₜ = ∏_{s=1}^{t} α_s is the cumulative product of the noise schedule up to time t.

This form shows that, after t steps, the noisy sample xₜ is still Gaussian, with its mean scaled by √ᾱₜ and its variance increased to (1 − ᾱₜ). This cumulative effect of the noise schedule makes it possible to sample xₜ in a single step from x₀:

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)

This property is central to efficient training and sampling in diffusion models.
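The closed form above can be sketched as a one-step sampler. As in the earlier sketch, the linear schedule and the function name `q_sample` are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear variance schedule
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bar = np.cumprod(alphas)       # bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in a single step via the closed form."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.ones(100_000)
x_t = q_sample(x0, 500, rng)
# Empirical mean is close to sqrt(alpha_bar[500]);
# empirical std is close to sqrt(1 - alpha_bar[500]).
```

This is exactly why training is efficient: for any minibatch, a random t can be chosen and xₜ produced in one draw, rather than simulating t sequential steps of the chain.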

SectionΒ 2. ChapterΒ 1
