KL Divergence And The VAE Loss

When training a variational autoencoder (VAE), your goal is not just to reconstruct the input data accurately, but also to ensure that the learned latent representations are structured and meaningful. The VAE achieves this through a specialized loss function that combines two key components: the reconstruction loss and the Kullback-Leibler (KL) divergence.

The mathematical form of the VAE loss for a single data point x can be written as:

\mathcal{L}_{VAE} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))

Here, q(z|x) is the encoder's approximation of the true posterior over latent variables, and p(z) is the prior, typically a standard normal distribution. The first term, the expected log-likelihood, is the reconstruction loss: it measures how well the decoder can reconstruct the input from the sampled latent code. The second term is the KL divergence between the approximate posterior and the prior.
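
In practice, training frameworks minimize a loss rather than maximize an objective, so the quantity actually minimized during training is the negative of this expression (the negative ELBO):

\text{loss}(x) = -\mathcal{L}_{VAE} = -\mathbb{E}_{q(z|x)}[\log p(x|z)] + D_{KL}(q(z|x) \| p(z))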

Definition

KL divergence (Kullback-Leibler divergence) is a measure of how one probability distribution diverges from a second, reference probability distribution. In the context of VAEs, it quantifies how much the learned latent distribution q(z|x) differs from the prior p(z). A lower KL divergence means the two distributions are more similar.

Including the KL divergence in the VAE loss is crucial because it encourages the encoder's output distribution q(z|x) to stay close to the chosen prior p(z). If you use a standard normal prior, the KL term penalizes complex or irregular latent distributions, nudging them to be more like a normal distribution centered at zero with unit variance. This regularization ensures that the latent space is smooth and continuous, making it possible to sample new points and generate realistic data from the decoder.
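
When the encoder outputs a diagonal Gaussian q(z|x) with mean \mu and variances \sigma^2, and the prior is the standard normal N(0, I) (the usual parameterization, assumed here rather than stated above), the KL term has a simple closed form that can be computed directly from the encoder outputs:

D_{KL}(q(z|x) \| p(z)) = -\frac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)

where d is the dimensionality of the latent space. No sampling is needed to evaluate this term; only the reconstruction term requires sampling z from q(z|x).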

Reconstruction vs. Regularization
  • Increasing the weight of the reconstruction loss leads to more accurate reconstructions, but the latent space may become irregular or overfit to the training data;
  • Increasing the weight of the KL divergence pushes the latent codes to follow the prior distribution more strictly, but may cause the reconstructions to become blurry or less accurate;
  • The VAE loss balances these two objectives, trading off reconstruction quality for a well-behaved, generative latent space; a common way to tune this trade-off is to weight the KL term, as in the sketch below.
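
The sketch below shows one way this weighted trade-off might be computed in code. It is a minimal illustration, assuming a PyTorch setup with a Bernoulli decoder (binary cross-entropy reconstruction) and a diagonal Gaussian encoder; the names vae_loss, recon_x, mu, log_var, and the weight beta are illustrative, not taken from the lesson.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, log_var, beta=1.0):
    """Negative ELBO for one mini-batch, with an optional weight on the KL term.

    recon_x : decoder output, values in (0, 1), shape (batch, features)
    x       : original input, values in [0, 1], same shape as recon_x
    mu      : encoder means of q(z|x), shape (batch, latent_dim)
    log_var : encoder log-variances of q(z|x), shape (batch, latent_dim)
    beta    : weight on the KL term (beta = 1 recovers the standard VAE loss)
    """
    # Reconstruction term: negative log-likelihood of x under a Bernoulli
    # decoder, summed over features and averaged over the batch.
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum") / x.size(0)

    # KL divergence between the diagonal Gaussian q(z|x) and the standard
    # normal prior N(0, I), using the closed form shown earlier.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / x.size(0)

    return recon + beta * kl


# Quick check with dummy tensors: 8 "images" of 784 pixels, 20 latent dimensions.
x = torch.rand(8, 784)
recon_x = torch.sigmoid(torch.randn(8, 784))
mu, log_var = torch.randn(8, 20), torch.randn(8, 20)
print(vae_loss(recon_x, x, mu, log_var, beta=1.0))
```

Setting beta above 1 pushes the latent codes harder toward the prior (the second bullet above), while beta below 1 favors reconstruction quality (the first bullet).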

1. What is the purpose of the KL divergence term in the VAE loss?

2. How does the KL term affect the structure of the latent space?

3. Fill in the blank: The VAE loss combines reconstruction loss and ____.
