Markov Chains and Forward Diffusion

To understand how diffusion models operate, you first need to grasp the concept of a Markov chain and its special properties in the context of noise-based generative modeling. A Markov chain is a mathematical system that undergoes transitions from one state to another within a finite or countable number of possible states. The defining feature of a Markov chain is the Markov property: the probability of transitioning to the next state depends only on the current state, not on the sequence of states that preceded it. This "memoryless" property is crucial in modeling processes where only the present matters for predicting the immediate future.

In the context of diffusion models, you can think of the forward diffusion process as a Markov chain where, at each step, a small amount of noise is added to the data. Over a sequence of many steps, the data gradually becomes more corrupted until it resembles random noise. At each step, the system's state is the current noisy sample, and the transition is defined by adding noise according to a fixed schedule. Because each step depends only on the current noisy sample and not on how that sample was produced, the process satisfies the Markov property.

There are several key properties of Markov chains that are relevant here:

  • The future state depends only on the present state;
  • The process is defined by a transition probability or rule;
  • The chain can be represented as a sequence of random variables, each conditioned only on its immediate predecessor;
  • Markov chains are widely used to model stochastic processes where history beyond the current state is irrelevant.

This framework allows the forward diffusion process to be both mathematically tractable and easy to simulate, which is essential for training generative models based on diffusion.

The mathematical formulation of the forward diffusion process leverages the Markov property to describe how the original data is gradually corrupted by noise. Let $x_0$ represent the original data sample. The forward process generates a sequence of increasingly noisy samples $x_1, x_2, \ldots, x_T$, where $T$ is the total number of diffusion steps. At each step $t$, the next sample $x_t$ is produced by adding noise to the previous sample $x_{t-1}$ according to a predefined noise schedule.

Formally, the forward process is defined as a Markov chain with the following transition:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\bigl(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\bigr)$$

Here, $\beta_t$ is a small positive number (the variance of the noise added at step $t$), and $I$ is the identity matrix. This means that, at each step, you generate $x_t$ by scaling the previous sample by $\sqrt{1 - \beta_t}$ and adding Gaussian noise with variance $\beta_t$. The entire process can be described as:
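A single transition of this chain can be sketched in NumPy as follows. The value $\beta_t = 0.02$ and the toy three-dimensional sample are arbitrary illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
beta_t = 0.02                         # noise variance at this step (illustrative)
x_prev = np.array([1.0, -1.0, 0.5])   # x_{t-1}, a toy 3-dimensional sample

# One Markov transition: scale the previous sample, then add Gaussian noise
eps = rng.standard_normal(x_prev.shape)
x_t = np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps
```

Note that `x_t` is computed from `x_prev` and fresh noise alone; nothing about earlier steps is needed, which is exactly the Markov property.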

$$q(x_0): \text{data distribution}, \qquad q(x_1, \ldots, x_T \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$

Because of the Markov property, the joint probability of the sequence factors into a product of conditional probabilities, each depending only on the immediately preceding sample. This recursive structure is what makes the forward diffusion process both simple to implement and analytically convenient.
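One standard consequence of this Gaussian Markov structure (as used in the DDPM formulation) is that the marginal of $x_t$ given $x_0$ has a closed form. Writing $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$:

$$q(x_t \mid x_0) = \mathcal{N}\bigl(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1 - \bar\alpha_t)\, I\bigr)$$

This lets you sample $x_t$ for any step $t$ in a single draw rather than simulating all $t$ transitions, which is one concrete payoff of the tractability mentioned above.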

You can simulate a forward diffusion Markov chain by following these step-by-step instructions:

  1. Start with your original data sample, $x_0$.
  2. Decide on the total number of diffusion steps, $T$.
  3. Choose a noise schedule: a sequence of small positive numbers $[\beta_1, \beta_2, \ldots, \beta_T]$ that determines how much noise to add at each step.
  4. Set your current sample $x$ to the original data sample $x_0$.
  5. For each step $t$ from 1 to $T$:
    • Draw a random noise vector $\epsilon$ from a standard normal distribution (mean 0, variance 1).
    • Update the sample by scaling the current $x$ by $\sqrt{1 - \beta_t}$ and adding $\sqrt{\beta_t}\,\epsilon$.
    • The result becomes the new current sample $x$ for the next step.
  6. After completing all $T$ steps, your final sample $x$ is the fully diffused (noisy) version of the original data.

This process ensures that, at each step, the sample is updated using only its current value and newly generated noise, following the Markov property.
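The steps above can be sketched as a short NumPy function. The linear schedule, the step count $T = 10$, and the toy data vector are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

def forward_diffusion(x0, betas, seed=None):
    """Simulate the forward diffusion Markov chain.

    At each step t: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps,
    with eps drawn from a standard normal distribution.
    Returns the list of samples [x_0, x_1, ..., x_T].
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = [x]
    for beta in betas:
        eps = rng.standard_normal(x.shape)          # fresh noise each step
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
        samples.append(x)
    return samples

# Example: diffuse a small "data" vector over T = 10 steps
T = 10
betas = np.linspace(1e-4, 0.2, T)   # simple linear noise schedule (illustrative)
x0 = np.array([1.0, -1.0, 0.5])
trajectory = forward_diffusion(x0, betas, seed=0)
print(len(trajectory))  # x_0 through x_T
```

Because each update reads only the current sample and newly drawn noise, the loop body is itself a direct implementation of the Markov property.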
