Diffusion Models and Generative Foundations

Forward Process Definition

To understand the forward diffusion process in diffusion models, you need to formalize how noise is gradually added to data in a controlled, stepwise manner. The process is defined as a Markov chain, where at each time step t, a small amount of Gaussian noise is added to the previous state. This stepwise corruption is governed by a conditional probability distribution, commonly denoted as q(xₜ | xₜ₋₁).

Mathematically, the forward process is defined as:

q(x_t \mid x_{t-1}) = \mathcal{N}\!\left( x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\; \beta_t I \right)

where:

  • xₜ is the noisy sample at time step t;
  • xₜ₋₁ is the sample from the previous step;
  • βₜ is the variance schedule (a small positive scalar controlling the noise added at step t);
  • I is the identity matrix.

This definition means that, given xₜ₋₁, the next state xₜ is sampled from a normal distribution centered at √(1 − βₜ) xₜ₋₁ with covariance βₜI. The Markov property ensures that each step depends only on the immediately preceding state, not on the entire history.
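A single forward step can be sketched directly from this definition. The following is a minimal NumPy sketch; the function name `forward_step` and the example schedule value 0.01 are illustrative assumptions, not part of the text above:

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I).

    x_prev : array, the sample at step t-1
    beta_t : float, the variance-schedule value for this step
    """
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
x_prev = np.zeros(5)                    # a toy previous state
x_t = forward_step(x_prev, 0.01, rng)   # slightly noised next state
```

Because βₜ is small, each individual step perturbs the sample only slightly; the heavy corruption comes from composing many such steps.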

The properties of q(xₜ | xₜ₋₁) are:

  • It is a Gaussian distribution for every t;
  • The process is Markovian: the next state depends only on the current state;
  • The variance schedule {βₜ} determines the rate of noise addition;
  • As t increases, the sample becomes progressively noisier, eventually approaching pure Gaussian noise as t approaches the maximum diffusion step.
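The last property can be checked numerically by running the full Markov chain. The sketch below assumes a linear schedule from 1e-4 to 0.02 over 1000 steps (a common choice, but an assumption here, not something the text prescribes):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear variance schedule

x = np.ones(50_000)                  # start every chain at the "data" value 1.0
for beta_t in betas:                 # run the full chain x_0 -> x_T step by step
    x = np.sqrt(1.0 - beta_t) * x + np.sqrt(beta_t) * rng.standard_normal(x.shape)

# After T steps the samples are statistically close to N(0, 1):
print(float(x.mean()), float(x.std()))
```

The printed mean is near 0 and the standard deviation near 1, confirming that the chain forgets the starting value and converges toward pure Gaussian noise.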

You can also derive the marginal distribution of the forward process, which expresses the distribution of xₜ given the original, clean data sample x₀ after t steps of noise addition. This is useful because it allows you to sample noisy data at any step directly from x₀ without simulating the entire Markov chain step-by-step.

The marginal distribution is given by:

q(x_t \mid x_0) = \mathcal{N}\!\left( x_t;\, \sqrt{\bar{\alpha}_t}\, x_0,\; (1 - \bar{\alpha}_t)\, I \right)

where:

  • αₜ = 1 − βₜ;
  • ᾱₜ = ∏_{s=1}^{t} α_s is the cumulative product of the noise schedule up to time t.

This form shows that, after t steps, the noisy sample xₜ is still Gaussian, with its mean scaled by √ᾱₜ and its variance increased to (1 − ᾱₜ). This cumulative effect of the noise schedule makes it possible to sample xₜ in a single step from x₀:

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)

This property is central to efficient training and sampling in diffusion models.
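The closed form above can be sketched as a one-step sampler. As in the earlier sketch, the linear schedule and the function name `q_sample` are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear variance schedule
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bar = np.cumprod(alphas)       # bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in a single step via the closed form."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.ones(100_000)
x_t = q_sample(x0, 500, rng)
# Empirical mean is close to sqrt(alpha_bar[500]);
# empirical std is close to sqrt(1 - alpha_bar[500]).
```

This is exactly why training is efficient: for any minibatch, a random t can be chosen and xₜ produced in one draw, rather than simulating t sequential steps of the chain.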

SectionΒ 2. ChapterΒ 1
