Reparameterization & Noise Prediction vs x₀ Prediction
Reparameterization is a central concept in diffusion models, because it defines what the neural network is trained to predict at each step of the reverse process. This choice directly shapes the loss function, the training dynamics, and how interpretable the model is.

What Is Reparameterization?

In a diffusion model, you start from a clean data sample x_0 and add noise step by step through a forward process, eventually obtaining a highly noisy sample x_T. The reverse process tries to undo this corruption, going from x_T back to x_0 via the intermediate steps x_T, x_{T-1}, \dots, x_1, x_0.

During training, you must decide:

"What does the network f_θ predict at each step?"

That choice is called reparameterization of the model's output.
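Before comparing the two choices, it helps to see the forward step concretely. The sketch below uses the standard closed-form DDPM forward sample, with a single scalar standing in for an image; the name `alpha_bar_t` (the cumulative noise level at step t) is illustrative, not from any specific library:

```python
import math
import random

def forward_sample(x0, alpha_bar_t, eps=None):
    """Sample x_t from q(x_t | x_0) in closed form:

        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
        eps ~ N(0, 1).

    A scalar x0 stands in for an image here.
    """
    if eps is None:
        eps = random.gauss(0.0, 1.0)
    xt = math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps
    return xt, eps

# With alpha_bar_t near 1 (early steps), x_t stays close to x0;
# with alpha_bar_t near 0 (late steps), x_t is mostly noise.
xt_early, _ = forward_sample(2.0, 0.99, eps=0.5)
xt_late, _ = forward_sample(2.0, 0.01, eps=0.5)
```

Both parameterizations below train against samples drawn exactly this way; they differ only in which term of this equation the network is asked to output.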

Two Main Approaches

There are two common targets for the network’s prediction:

  1. Noise prediction;
  2. Clean data prediction.

Both lead to slightly different training objectives and behaviors.

1. Noise Prediction

In this parameterization, the network predicts the noise that was added at a given step.

  • The network outputs ε_θ(x_t, t), an estimate of the Gaussian noise ε used to obtain x_t from x_0;
  • The training objective is to make ε_θ(x_t, t) as close as possible to the true ε;
  • The loss is typically a mean squared error (MSE):
\mathcal{L}_\varepsilon(\theta) = \mathbb{E}_{t, x_0, \varepsilon} \left[ \left\| \varepsilon_\theta(x_t, t) - \varepsilon \right\|_2^2 \right], \quad \varepsilon \sim \mathcal{N}(0, I);
  • This is the parameterization used in the original DDPM formulation.
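A minimal Monte Carlo sketch of this objective, with scalars in place of images; `model` is any callable returning a noise estimate, and passing the schedule as an `alpha_bar` list is an assumption of this sketch, not a fixed API:

```python
import math
import random

def eps_loss(model, x0_batch, alpha_bar):
    """Monte Carlo estimate of L_eps: MSE between predicted and true noise."""
    total = 0.0
    for x0 in x0_batch:
        t = random.randrange(len(alpha_bar))        # uniform random timestep
        eps = random.gauss(0.0, 1.0)                # true noise = the target
        xt = math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1 - alpha_bar[t]) * eps
        total += (model(xt, t) - eps) ** 2          # squared error vs. eps
    return total / len(x0_batch)

# A model that always predicts 0 scores roughly E[eps^2] = 1,
# regardless of how complex the data is.
zero_model = lambda xt, t: 0.0
```

Note that the target ε is drawn fresh for every sample; the network never regresses toward the data distribution directly, only toward unit Gaussian noise.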

2. x₀-Prediction

In this parameterization, the network tries to directly recover the original clean sample from a noisy one.

  • The network outputs x_{0,θ}(x_t, t), an estimate of x_0;
  • The training objective is to minimize the difference between the predicted and true clean data;
  • The typical loss is again an MSE:
\mathcal{L}_{x_0}(\theta) = \mathbb{E}_{t, x_0, \varepsilon} \left[ \left\| x_{0,\theta}(x_t, t) - x_0 \right\|_2^2 \right];
  • This can make intermediate predictions more interpretable, because at every step you can "peek" at what the model thinks the clean data looks like.
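The same Monte Carlo setup works for this objective; only the regression target changes from the noise ε to the clean sample x_0 itself (again a scalar sketch with an assumed `alpha_bar` schedule argument):

```python
import math
import random

def x0_loss(model, x0_batch, alpha_bar):
    """Monte Carlo estimate of L_x0: MSE between predicted and true clean data."""
    total = 0.0
    for x0 in x0_batch:
        t = random.randrange(len(alpha_bar))        # uniform random timestep
        eps = random.gauss(0.0, 1.0)
        xt = math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1 - alpha_bar[t]) * eps
        total += (model(xt, t) - x0) ** 2           # target is x0, not eps
    return total / len(x0_batch)
```

Here the model's output at any step is already a guess of the clean data, which is what makes intermediate predictions easy to inspect.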

Differences and Implications

Training Dynamics

  • Noise prediction:

    • The noise ε is sampled from a simple, fixed distribution (usually N(0, I)), independent of the complexity of the data.
    • This often leads to more stable training, since the target distribution is simple and well-behaved.
  • x₀-prediction

    • The model must directly approximate the (possibly complex) data distribution at each step.
    • This can be harder to optimize but provides direct supervision on reconstruction quality.
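The stability point can be made concrete by comparing target statistics: the ε target always has unit variance, while the x₀ target inherits whatever spread the data has. A small illustration; the wide bimodal "data" at ±10 is invented purely for this demo:

```python
import random
import statistics

random.seed(1)
N = 5000

# Regression targets under eps-prediction: always N(0, 1), data-independent.
eps_targets = [random.gauss(0.0, 1.0) for _ in range(N)]

# Regression targets under x0-prediction: the data distribution itself,
# here a toy bimodal distribution at +/-10 with unit jitter.
x0_targets = [random.choice([-10.0, 10.0]) + random.gauss(0.0, 1.0)
              for _ in range(N)]

eps_var = statistics.pvariance(eps_targets)   # close to 1
x0_var = statistics.pvariance(x0_targets)     # close to 100 + 1 = 101
```

The ε-network always regresses onto targets of the same fixed scale, whereas the x₀-network's target scale and shape track the data, which is one intuition for the difference in training behavior.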

Model Performance

  • Noise prediction has become standard in many diffusion models because it works well empirically and is robust.
  • x₀-prediction can offer benefits in some settings, especially when combined with hybrid or reweighted objectives, and can sometimes improve sample quality or controllability.

Interpretability

  • x₀-prediction is more interpretable: at each step, you can view the model’s guess of the clean data.
  • Noise prediction is less intuitive to inspect directly, but often more convenient from an optimization perspective.
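The two views are also algebraically interchangeable: given x_t and the noise schedule, either prediction determines the other by inverting x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε. This is how one can still "peek" at an implied x̂₀ even from an ε-parameterized model. A sketch of the conversion:

```python
import math

def x0_from_eps(xt, eps_hat, alpha_bar_t):
    """Implied clean-data estimate from a noise estimate at level alpha_bar_t."""
    return (xt - math.sqrt(1 - alpha_bar_t) * eps_hat) / math.sqrt(alpha_bar_t)

def eps_from_x0(xt, x0_hat, alpha_bar_t):
    """Implied noise estimate from a clean-data estimate."""
    return (xt - math.sqrt(alpha_bar_t) * x0_hat) / math.sqrt(1 - alpha_bar_t)
```

Because of this equivalence, the choice of parameterization changes the loss weighting across timesteps and the optimization landscape, rather than what the model can express in principle.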

Summary Table

| Aspect | Noise prediction (ε) | x₀-prediction |
| --- | --- | --- |
| Prediction target | The added Gaussian noise ε | The clean sample x₀ |
| Target distribution | Simple and fixed: N(0, I) | The (possibly complex) data distribution |
| Training dynamics | Often more stable | Can be harder to optimize |
| Interpretability | Less direct to inspect | Intermediate guesses of x₀ can be viewed |
| Typical usage | Standard (original DDPM) | Hybrid or reweighted objectives |
