Variational Lower Bound (ELBO) and Training Objective
You have seen how diffusion models gradually add noise to data through a forward process and attempt to reverse this process via a parameterized model. To train these models effectively, you need a principled objective that aligns the learned reverse process with the true underlying data distribution. This objective arises naturally from variational inference, resulting in the Evidence Lower Bound (ELBO).
To derive the ELBO for diffusion models, start by considering the generative process as a Markov chain that transforms noise into data. The model defines a parameterized reverse process, denoted as:
$$p_\theta(x_0, x_1, \ldots, x_T) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t),$$

where $x_T$ is pure noise. The true data likelihood, $p_\theta(x_0)$, is intractable because it involves integrating over all possible latent trajectories. Instead, you introduce a variational distribution, the forward process $q(x_1, \ldots, x_T \mid x_0)$, which is tractable and known.
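The reverse factorization amounts to ancestral sampling: draw $x_T$ from the noise prior, then repeatedly sample each transition $p_\theta(x_{t-1} \mid x_t)$. A minimal sketch in Python, assuming Gaussian transitions with a fixed variance and a placeholder `reverse_mean` function standing in for a trained denoising network (all values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ancestral sampling through the reverse factorization
#   p_theta(x_{0:T}) = p(x_T) * prod_t p_theta(x_{t-1} | x_t).
T = 50
sigma = 0.1  # fixed transition standard deviation (illustrative)

def reverse_mean(x_t, t):
    # Placeholder for a trained denoising network's mean prediction.
    return 0.95 * x_t

x = rng.normal(size=4)           # x_T ~ p(x_T): pure Gaussian noise
for t in range(T, 0, -1):        # sample p_theta(x_{t-1} | x_t) step by step
    x = reverse_mean(x, t) + sigma * rng.normal(size=4)

print(x)  # the resulting "generated" sample x_0
```

Marginalizing this chain over $x_1, \ldots, x_T$ to get $p_\theta(x_0)$ is exactly the integral that makes the likelihood intractable.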
The ELBO is constructed as follows:
- Start from the log-likelihood of the data, $\log p_\theta(x_0)$;
- Rewrite it as an expectation over the forward process trajectory:
  $$\log p_\theta(x_0) = \log \int q(x_{1:T} \mid x_0) \, \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)} \, dx_{1:T};$$
- Apply Jensen's inequality to obtain a lower bound:
  $$\log p_\theta(x_0) \ge \mathbb{E}_{q(x_{1:T} \mid x_0)} \left[ \log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)} \right].$$
This expectation is the ELBO. For diffusion models, the ELBO becomes a sum of KL divergences and expected log-likelihood terms at each diffusion step, reflecting the discrepancy between the true forward process and the learned reverse process.
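The bound can be checked numerically in a toy setting. The sketch below assumes a one-step latent-variable model with hand-picked Gaussian parameters (all numbers are illustrative, not a real diffusion model); because everything is Gaussian, the exact log-likelihood is tractable and can be compared against a Monte Carlo estimate of the ELBO:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step latent-variable model (hypothetical parameters):
#   p(x1)            = N(0, 1)        -- "noise" prior
#   p_theta(x0 | x1) = N(a*x1, s^2)   -- "reverse" step
# The marginal p_theta(x0) = N(0, a^2 + s^2) is tractable here.
a, s = 0.8, 0.5

def log_normal(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

x0 = 1.3  # an observed data point

# Exact log-likelihood (possible only in this toy setting).
exact = log_normal(x0, 0.0, a**2 + s**2)

# Variational distribution q(x1 | x0) = N(b*x0, r^2) -- deliberately
# not the true posterior, so the bound is strictly loose.
b, r = 0.5, 0.6
x1 = rng.normal(b * x0, r, size=200_000)  # samples from q

# ELBO = E_q[ log p(x0 | x1) + log p(x1) - log q(x1 | x0) ]
elbo = np.mean(
    log_normal(x0, a * x1, s**2)
    + log_normal(x1, 0.0, 1.0)
    - log_normal(x1, b * x0, r**2)
)

print(f"exact log-likelihood: {exact:.4f}")
print(f"Monte Carlo ELBO:     {elbo:.4f}")  # always below the exact value
```

The gap between the two numbers is exactly $\mathrm{KL}(q \,\|\, \text{true posterior})$; choosing a better $q$ tightens the bound.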
Next, break down each term in the ELBO and interpret its meaning. The ELBO for diffusion models typically takes the form:
- A sum over time steps of KL divergences between the forward process posterior and the reverse transition probabilities:
  $$\mathbb{E}_{q} \left[ \sum_{t=2}^{T} \mathrm{KL}\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big) \right];$$
- A terminal term involving the prior and the last latent variable:
  $$\mathrm{KL}\big(q(x_T \mid x_0) \,\|\, p(x_T)\big);$$
- An optional reconstruction term (if the model is designed for it):
  $$\mathbb{E}_{q(x_1 \mid x_0)} \left[ \log p_\theta(x_0 \mid x_1) \right].$$
Each KL term measures how well the model's reverse transition at each step matches the forward process posterior conditioned on the data. The terminal KL ensures that the distribution at the final time step matches the noise prior. The reconstruction term (when present) encourages the model to reconstruct the original data from noisy samples.
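When the forward process is Gaussian, each per-step KL has a closed form. A sketch assuming a DDPM-style linear noise schedule; the schedule values and the perturbed `mu_theta` are illustrative stand-ins for a trained model's output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative DDPM-style linear noise schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def posterior_q(x0, xt, t):
    """Mean and variance of the forward posterior q(x_{t-1} | x_t, x_0)."""
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
    coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
    coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
    var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
    return coef_x0 * x0 + coef_xt * xt, var

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

x0 = rng.normal(size=4)   # a toy "data" vector
t = 500
eps = rng.normal(size=4)
xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

mu_q, var_q = posterior_q(x0, xt, t)
mu_theta = mu_q + 0.1     # stand-in for a trained network's mean prediction
kl = gaussian_kl(mu_q, var_q, mu_theta, var_q)
print(kl)                 # positive; zero only when the means coincide
```

Because both distributions are Gaussian, no sampling is needed to evaluate these terms; training minimizes them directly.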
Maximizing the data likelihood is the ultimate goal when training generative models, as it ensures that the model assigns high probability to real data. However, the exact likelihood is intractable for diffusion models due to the latent trajectory integration. By maximizing the ELBO, you maximize a lower bound on the log-likelihood. As the model improves and the variational approximation becomes tighter, the gap between the ELBO and the true likelihood narrows. Thus, minimizing the negative ELBO is equivalent to maximizing the likelihood up to the looseness of the bound, providing a principled training objective for diffusion models.
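With Gaussian transitions of fixed variance, each per-step KL in the negative ELBO reduces to a weighted squared error between the true and predicted noise, which is what makes the objective cheap to optimize in practice. A numerical check of this identity, assuming an illustrative DDPM-style linear schedule and a synthetic "predicted" noise vector:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative DDPM-style linear noise schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

t = 500
x0 = rng.normal(size=4)
eps_true = rng.normal(size=4)
xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps_true

# Synthetic "network output": the true noise, slightly perturbed.
eps_pred = eps_true + 0.05 * rng.normal(size=4)

def mu_from_eps(xt, eps, t):
    # Posterior mean written in terms of the noise (DDPM parameterization).
    return (xt - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])

var_t = betas[t] * (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t])

mu_q = mu_from_eps(xt, eps_true, t)      # true posterior mean
mu_theta = mu_from_eps(xt, eps_pred, t)  # model's mean

# KL between equal-variance Gaussians: squared mean gap over 2 * variance ...
kl = np.sum((mu_q - mu_theta) ** 2) / (2.0 * var_t)

# ... which is exactly a weighted noise-prediction MSE.
weight = betas[t] ** 2 / (2.0 * var_t * alphas[t] * (1.0 - alpha_bar[t]))
weighted_mse = weight * np.sum((eps_true - eps_pred) ** 2)

print(kl, weighted_mse)  # the two agree to floating-point precision
```

Dropping the per-step weight gives the familiar "simple" noise-prediction loss, a reweighted version of the negative ELBO.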