KL Divergence And The VAE Loss

When training a variational autoencoder (VAE), your goal is not just to reconstruct the input data accurately, but also to ensure that the learned latent representations are structured and meaningful. The VAE achieves this through a specialized loss function that combines two key components: the reconstruction loss and the Kullback-Leibler (KL) divergence.

The mathematical form of the VAE loss for a single data point x can be written as:

\mathcal{L}_{VAE} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))

Here, q(z|x) is the encoder's approximation of the true posterior over latent variables, and p(z) is the prior, typically a standard normal distribution. The first term, the expected log-likelihood, is the reconstruction loss: it measures how well the decoder can reconstruct the input from the sampled latent code. The second term is the KL divergence between the approximate posterior and the prior.
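
In practice, training frameworks minimize a loss rather than maximize an objective, so the quantity actually minimized during training is the negative of this expression (the negative ELBO):

\text{loss}(x) = -\mathcal{L}_{VAE} = -\mathbb{E}_{q(z|x)}[\log p(x|z)] + D_{KL}(q(z|x) \| p(z))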

Definition

KL divergence (Kullback-Leibler divergence) is a measure of how one probability distribution diverges from a second, reference probability distribution. In the context of VAEs, it quantifies how much the learned latent distribution q(z|x) differs from the prior p(z). A lower KL divergence means the two distributions are more similar.

Including the KL divergence in the VAE loss is crucial because it encourages the encoder's output distribution q(z|x) to stay close to the chosen prior p(z). If you use a standard normal prior, the KL term penalizes complex or irregular latent distributions, nudging them to be more like a normal distribution centered at zero with unit variance. This regularization ensures that the latent space is smooth and continuous, making it possible to sample new points and generate realistic data from the decoder.
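
When the encoder outputs a diagonal Gaussian q(z|x) with mean \mu and variances \sigma^2, and the prior is the standard normal N(0, I) (the usual parameterization, assumed here rather than stated above), the KL term has a simple closed form that can be computed directly from the encoder outputs:

D_{KL}(q(z|x) \| p(z)) = -\frac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)

where d is the dimensionality of the latent space. No sampling is needed to evaluate this term; only the reconstruction term requires sampling z from q(z|x).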

Reconstruction vs. Regularization
  • Increasing the weight of the reconstruction loss leads to more accurate reconstructions, but the latent space may become irregular or overfit to the training data;
  • Increasing the weight of the KL divergence pushes the latent codes to follow the prior distribution more strictly, but may cause the reconstructions to become blurry or less accurate;
  • The VAE loss balances these two objectives, trading off reconstruction quality for a well-behaved, generative latent space; a common way to tune this trade-off is to weight the KL term, as in the sketch below.
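
The sketch below shows one way this weighted trade-off might be computed in code. It is a minimal illustration, assuming a PyTorch setup with a Bernoulli decoder (binary cross-entropy reconstruction) and a diagonal Gaussian encoder; the names vae_loss, recon_x, mu, log_var, and the weight beta are illustrative, not taken from the lesson.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, log_var, beta=1.0):
    """Negative ELBO for one mini-batch, with an optional weight on the KL term.

    recon_x : decoder output, values in (0, 1), shape (batch, features)
    x       : original input, values in [0, 1], same shape as recon_x
    mu      : encoder means of q(z|x), shape (batch, latent_dim)
    log_var : encoder log-variances of q(z|x), shape (batch, latent_dim)
    beta    : weight on the KL term (beta = 1 recovers the standard VAE loss)
    """
    # Reconstruction term: negative log-likelihood of x under a Bernoulli
    # decoder, summed over features and averaged over the batch.
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum") / x.size(0)

    # KL divergence between the diagonal Gaussian q(z|x) and the standard
    # normal prior N(0, I), using the closed form shown earlier.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / x.size(0)

    return recon + beta * kl


# Quick check with dummy tensors: 8 "images" of 784 pixels, 20 latent dimensions.
x = torch.rand(8, 784)
recon_x = torch.sigmoid(torch.randn(8, 784))
mu, log_var = torch.randn(8, 20), torch.randn(8, 20)
print(vae_loss(recon_x, x, mu, log_var, beta=1.0))
```

Setting beta above 1 pushes the latent codes harder toward the prior (the second bullet above), while beta below 1 favors reconstruction quality (the first bullet).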

1. What is the purpose of the KL divergence term in the VAE loss?

2. How does the KL term affect the structure of the latent space?

3. Fill in the blank: The VAE loss combines reconstruction loss and ____.
