Forward Process Definition | Mathematical Foundations of Diffusion Models
To understand the forward diffusion process in diffusion models, you need to formalize how noise is gradually added to data in a controlled, stepwise manner. The process is defined as a Markov chain, where at each time step t, a small amount of Gaussian noise is added to the previous state. This stepwise corruption is governed by a conditional probability distribution, commonly denoted as q(x_t | x_{t-1}).

Mathematically, the forward process is defined as:

q(x_t \mid x_{t-1}) = \mathcal{N}\!\left( x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\; \beta_t I \right)

where:

  • x_t is the noisy sample at time step t;
  • x_{t-1} is the sample from the previous step;
  • β_t is the variance schedule (a small positive scalar controlling the noise added at step t);
  • I is the identity matrix.

This definition means that, given x_{t-1}, the next state x_t is sampled from a normal distribution centered at \sqrt{1 - \beta_t}\, x_{t-1} with covariance \beta_t I. The Markov property ensures that each step depends only on the immediately preceding state, not on the entire history.

The properties of q(x_t | x_{t-1}) are:

  • It is a Gaussian distribution for every t;
  • The process is Markovian: the future state depends only on the present state;
  • The variance schedule {β_t} determines the rate of noise addition;
  • As t increases, the sample becomes progressively noisier, eventually approaching pure Gaussian noise as t approaches the maximum diffusion step.
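A single transition of this Markov chain can be sketched directly from the definition. The following NumPy snippet (function and variable names are illustrative, not from any particular library) samples x_t from q(x_t | x_{t-1}) for a toy vector:

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)                 # toy "clean" data sample
x1 = forward_step(x0, beta_t=0.02, rng=rng)  # one step of corruption
```

Because β_t is small, each individual step only slightly shrinks the signal and adds a small amount of noise; heavy corruption emerges only from many such steps composed together.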

You can also derive the marginal distribution of the forward process, which expresses the distribution of x_t given the original, clean data sample x_0 after t steps of noise addition. This is useful because it allows you to sample noisy data at any step directly from x_0 without simulating the entire Markov chain step by step.

The marginal distribution is given by:

q(x_t \mid x_0) = \mathcal{N}\!\left( x_t;\, \sqrt{\bar{\alpha}_t}\, x_0,\; (1 - \bar{\alpha}_t)\, I \right)

where:

  • α_t = 1 - β_t;
  • \bar{\alpha}_t = \prod_{s=1}^{t} α_s is the cumulative product of the noise schedule up to time t.
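The cumulative product \bar{\alpha}_t is cheap to precompute for an entire schedule. A minimal sketch, assuming a linear β schedule (one common choice; the endpoints 1e-4 and 0.02 and the step count T = 1000 are illustrative):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # linear variance schedule (illustrative values)
alphas = 1.0 - betas                # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)     # alpha-bar_t for t = 1..T, computed once up front

# alpha_bars decreases monotonically toward 0: by the final step,
# almost no signal from x_0 survives in x_t.
```

Precomputing `alpha_bars` once is what makes training practical: any time step t can then be corrupted in O(1) rather than by simulating t chain steps.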

This form shows that, after t steps, the noisy sample x_t is still Gaussian, with its mean scaled by \sqrt{\bar{\alpha}_t} and its variance increased to (1 - \bar{\alpha}_t). This cumulative effect of the noise schedule makes it possible to sample x_t in a single step from x_0:

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)

This property is central to efficient training and sampling in diffusion models.
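The closed-form sampling identity above can be checked numerically. In this sketch (again with an illustrative linear schedule, and t indexed from 0), corrupting a constant signal to the final step should leave samples that look like pure standard Gaussian noise:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # illustrative linear schedule
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t directly from q(x_t | x_0) in one step (t is 0-indexed)."""
    ab = alpha_bars[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

rng = np.random.default_rng(0)
x0 = np.ones(10_000)                     # toy data: a constant signal
x_late = q_sample(x0, T - 1, alpha_bars, rng)
# At the final step almost all signal is gone: mean near 0, variance near 1.
```

During training, this is exactly how noisy inputs are produced: pick a random t, draw ε, and form x_t in one shot, with ε serving as the regression target for the denoising network.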

