Variational Autoencoders (VAEs)

Autoencoders and Variational Autoencoders

Autoencoders are neural networks designed to learn efficient representations of data through encoding and decoding processes. A standard autoencoder consists of two components:

  1. Encoder: compresses input data into a lower-dimensional representation.
  2. Decoder: reconstructs the original data from the compressed representation.

Traditional autoencoders learn deterministic mappings: each input is compressed to a single fixed point in the latent space. As a result, they struggle to generate diverse outputs, because their latent space lacks structure and smoothness.
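
To make the encoder–decoder idea concrete, below is a minimal sketch of a standard autoencoder in PyTorch. The layer sizes and the choice of fully connected layers are illustrative assumptions, not part of the lesson.

```python
# Minimal sketch of a standard (deterministic) autoencoder in PyTorch.
# Layer sizes are arbitrary and chosen only for illustration.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compresses the input into a single fixed latent vector
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstructs the input from the latent vector
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)      # deterministic latent code
        return self.decoder(z)   # reconstruction of x
```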

Differences Between Standard Autoencoders and VAEs

Variational Autoencoders (VAEs) improve upon standard autoencoders by introducing a probabilistic latent space, allowing for structured and meaningful generation of new data.

Encoder-Decoder Structure and Latent Space Representation

VAEs consist of two main components:

  1. Encoder: Maps the input data to a probability distribution over a lower-dimensional latent space $z$.
  2. Decoder: Samples from the latent space and reconstructs the input data.

Mathematical Formulation:

The encoder produces a mean and variance for the latent space:

$$\mu = f_\mu(x; \theta), \qquad \sigma^2 = f_\sigma(x; \theta)$$

where:

  • $\mu$ represents the mean of the latent space distribution;
  • $\sigma^2$ represents the variance;
  • $f_\mu$ and $f_\sigma$ are functions parameterized by $\theta$, typically implemented as neural networks.

Instead of directly passing these parameters to the decoder, we sample from a Gaussian distribution using the reparameterization trick:

$$z = \mu + \sigma \odot \epsilon$$

where:

  • $\odot$ represents element-wise multiplication;
  • $\epsilon$ is a random variable drawn from a standard normal distribution.

This trick allows gradients to propagate through the sampling process, making backpropagation possible. Without this trick, the stochastic sampling operation would make gradient-based learning infeasible.

The decoder reconstructs the input from $z$ by learning a function $g(z; \phi)$, which outputs the parameters of the data distribution. The decoder network is trained to minimize the difference between the reconstructed and original data, ensuring high-quality reconstructions.
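
Putting these pieces together, here is an illustrative PyTorch sketch of a VAE with the reparameterization trick. The hidden sizes and helper names (`fc_mu`, `fc_logvar`, `reparameterize`) are assumptions made for the example, and predicting $\log \sigma^2$ rather than $\sigma^2$ is a common choice for numerical stability.

```python
# Illustrative VAE sketch: the encoder outputs mu and log(sigma^2), z is sampled
# via the reparameterization trick, and the decoder g(z; phi) reconstructs x.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # f_mu(x; theta)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # predicts log(sigma^2)
        self.decoder = nn.Sequential(                       # g(z; phi)
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)   # sigma
        eps = torch.randn_like(std)     # epsilon ~ N(0, I)
        return mu + eps * std           # z = mu + sigma (element-wise) epsilon

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
```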

Probabilistic Modeling in VAEs

VAEs are built upon Bayesian inference, which allows them to model the relationship between observed data $x$ and latent variables $z$ using probability distributions. The fundamental principle is based on Bayes' theorem:

$$p(z|x) = \frac{p(x|z)\, p(z)}{p(x)}$$

Since computing $p(x)$ requires integrating over all possible latent variables, which is intractable, VAEs approximate the posterior $p(z|x)$ with a simpler function $q(z|x)$, enabling efficient inference.
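
Concretely, the marginal likelihood that makes exact inference intractable is an integral over every possible latent code:

$$p(x) = \int p(x|z)\, p(z)\, dz$$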

Evidence Lower Bound (ELBO)

Instead of maximizing the intractable marginal likelihood $p(x)$, VAEs maximize its lower bound, called the Evidence Lower Bound (ELBO):

$$\text{ELBO} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x)\ \|\ p(z))$$

where:

  • The first term, $\mathbb{E}_{q(z|x)}[\log p(x|z)]$, is the reconstruction loss, ensuring that the output resembles the input;
  • The second term, $D_{KL}(q(z|x)\ \|\ p(z))$, is the KL divergence, which regularizes the latent space by ensuring $q(z|x)$ stays close to the prior $p(z)$.

By balancing these two terms, VAEs achieve a trade-off between accurate reconstructions and smooth latent space representations.
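
As a sketch of how this objective is used in practice, the negative ELBO can serve directly as the training loss. Assuming the VAE sketched above with outputs in $[0, 1]$, the reconstruction term becomes a binary cross-entropy, and the KL term has the standard closed form for a diagonal Gaussian posterior against a standard normal prior.

```python
# Negative ELBO as a training loss (sketch). Assumes x and x_recon lie in [0, 1]
# and that the encoder outputs mu and log(sigma^2) as in the VAE sketch above.
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    # Reconstruction term: approximates -E_{q(z|x)}[log p(x|z)] with one sample of z
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL term: D_KL(q(z|x) || p(z)) = 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1)
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1)
    return recon + kl  # minimizing this maximizes the ELBO
```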

Applications of VAEs

1. Anomaly Detection

VAEs can learn the normal structure of data. When encountering anomalous inputs, the model struggles to reconstruct them, leading to higher reconstruction errors, which can be used for detecting outliers.
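
A hypothetical sketch of this idea, assuming a trained `model` like the VAE above and an anomaly threshold tuned on normal validation data:

```python
# Score inputs by reconstruction error; unusually high scores suggest anomalies.
import torch

@torch.no_grad()
def anomaly_scores(model, x):
    x_recon, _, _ = model(x)
    # Per-sample mean squared reconstruction error as the anomaly score
    return ((x - x_recon) ** 2).mean(dim=1)

# Usage (the threshold value is an assumption, chosen on normal validation data):
# is_anomaly = anomaly_scores(model, batch) > 0.05
```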

2. Image Synthesis

VAEs can generate new images by sampling from the learned latent space. They are widely used in applications like:

  • Face generation (e.g., generating new human faces);
  • Style transfer (e.g., blending artistic styles).
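
A short sketch of how such sampling might look with the VAE above; the sample count and latent dimension are arbitrary choices for the example.

```python
# Generate new data by sampling from the prior p(z) = N(0, I) and decoding.
import torch

model.eval()                          # assumes `model` is a trained VAE instance
with torch.no_grad():
    z = torch.randn(16, 32)           # 16 latent samples, latent_dim = 32
    new_images = model.decoder(z)     # decode latent codes into data space
```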

3. Text Generation

VAEs can be adapted for natural language processing (NLP) tasks, where they are used to generate diverse and coherent text sequences.

4. Drug Discovery

VAEs have been applied in bioinformatics and drug discovery, where they generate molecular structures with desired properties.

Conclusion

Variational Autoencoders are a powerful class of generative models that introduce probabilistic modeling to autoencoders. Their ability to generate diverse and realistic data has made them a fundamental component of modern generative AI.

Compared to traditional autoencoders, VAEs provide a structured latent space, improving generative capabilities. As research advances, VAEs continue to play a crucial role in AI applications spanning computer vision, NLP, and beyond.

Review Questions

1. What is the main difference between a standard autoencoder and a variational autoencoder (VAE)?

2. What is the role of the KL divergence term in the VAE loss function?

3. Why is the reparameterization trick necessary in VAEs?

4. Which of the following best describes the ELBO (Evidence Lower Bound) in VAEs?

5. Which of the following is NOT a common application of VAEs?

