Diffusion Models and Probabilistic Generative Approaches
Understanding Diffusion-Based Generation
Diffusion models are a powerful type of AI model that generate data, especially images, by learning how to reverse a process of adding random noise. Imagine watching a clean picture gradually become fuzzy, like static on a TV. A diffusion model learns to do the opposite: it takes noisy images and reconstructs the original picture by removing noise step by step.
The process involves two main phases:
- Forward process (diffusion): gradually adds random noise to an image over many steps, corrupting it into pure noise;
- Reverse process (denoising): a neural network learns to remove the noise step by step, reconstructing the original image from the noisy version.
Diffusion models are known for their ability to produce high-quality, realistic images. Their training is typically more stable than that of adversarial models like GANs, which makes them very appealing in modern generative AI.
Denoising Diffusion Probabilistic Models (DDPMs)
Denoising diffusion probabilistic models (DDPMs) are a popular kind of diffusion model that apply probabilistic principles and deep learning to remove noise from images in a step-by-step manner.
Forward Process
In the forward process, we start with a real image x_0 and gradually add Gaussian noise over T timesteps:

q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\; \sqrt{1 - \beta_t}\, x_{t-1},\; \beta_t I\big)

Where:
- x_t: the noisy version of the input at timestep t;
- β_t: a small variance schedule controlling how much noise is added at step t;
- N: the Gaussian distribution.

We can also express the noisy sample at any step t directly in terms of the original image x_0:

q(x_t \mid x_0) = \mathcal{N}\big(x_t;\; \sqrt{\bar{\alpha}_t}\, x_0,\; (1 - \bar{\alpha}_t) I\big)

Where:
- \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s).
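To make this concrete, here is a minimal NumPy sketch of the closed-form forward step. The linear β schedule and its range are assumptions (a common choice, following the original DDPM setup), and a flat gray array stands in for an image:

```python
import numpy as np

# Assumed linear variance schedule beta_1 ... beta_T (a common DDPM choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bar_t = product of (1 - beta_s) for s <= t

rng = np.random.default_rng(0)

def forward_diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps

# Toy usage: a flat gray "image"; by the last step it is almost pure noise.
x0 = np.full((8, 8), 0.5)
x_mid, _ = forward_diffuse(x0, t=500)
x_end, _ = forward_diffuse(x0, t=T - 1)
```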
Reverse Process
The goal of the model is to learn the reverse of this process. A neural network parameterized by θ predicts the mean and variance of the denoised distribution:

p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\; \mu_\theta(x_t, t),\; \Sigma_\theta(x_t, t)\big)

Where:
- x_t: the noisy image at timestep t;
- x_{t-1}: the predicted, less noisy image at step t-1;
- μ_θ: the predicted mean from the neural network;
- Σ_θ: the predicted variance from the neural network.
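In practice, the network usually does not output μ_θ directly; a common parameterization instead derives the mean from the network's noise prediction. The sketch below of a single sampling step assumes that parameterization, a fixed variance Σ_θ = β_t I, and the `betas`/`alpha_bars` arrays from the forward sketch above; `eps_pred` is a placeholder for the network's output:

```python
import numpy as np

def reverse_step(x_t, t, eps_pred, betas, alpha_bars, rng):
    """One ancestral sampling step of p_theta(x_{t-1} | x_t), assuming the common
    epsilon-parameterization and a fixed variance Sigma_theta = beta_t * I.
    `eps_pred` stands in for the network's noise prediction eps_theta(x_t, t)."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    ab_t = alpha_bars[t]
    # mu_theta(x_t, t), rewritten in terms of the predicted noise.
    mean = (x_t - beta_t / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # no fresh noise is added at the final step
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(beta_t) * z
```

Starting from pure noise x_T and applying this step repeatedly down to t = 0 produces a sample from the learned data distribution.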
Loss Function
Training involves minimizing the difference between the actual noise and the model's predicted noise using the following objective:
L_{\text{simple}} = \mathbb{E}_{x_0,\, \epsilon,\, t}\Big[\big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t\big) \big\|^2\Big]

Where:
- x_0: the original input image;
- ε: the random Gaussian noise;
- t: the timestep during diffusion;
- ε_θ: the neural network's prediction of the noise;
- ᾱ_t: the product of the noise schedule parameters up to step t.
This helps the model become better at denoising, improving its ability to generate realistic data.
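A minimal PyTorch sketch of this objective is shown below; `model(x_t, t)` is a hypothetical noise-prediction network (in practice typically a U-Net), and the schedule mirrors the forward-process sketch:

```python
import torch

# Assumed noise schedule, mirroring the earlier forward-process sketch.
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    """L_simple: train the network to predict the noise eps mixed into x0
    at a randomly chosen timestep t."""
    b = x0.shape[0]
    t = torch.randint(0, len(alpha_bars), (b,), device=x0.device)
    eps = torch.randn_like(x0)
    # Gather alpha_bar_t per sample and broadcast over the image dimensions.
    ab = alpha_bars.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = torch.sqrt(ab) * x0 + torch.sqrt(1.0 - ab) * eps
    return torch.mean((eps - model(x_t, t)) ** 2)
```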
Score-Based Generative Modeling
Score-based models are another class of diffusion models. Instead of learning the reverse noise process directly, they learn the score function:
\nabla_x \log p(x)

Where:
- ∇_x log p(x): the gradient of the log-probability density with respect to the input x, which points in the direction of increasing likelihood under the data distribution;
- p(x): the probability distribution of the data.
This function tells the model in which direction the image should move to become more like real data. These models then use a sampling method like Langevin dynamics to gradually move noisy data toward high-probability data regions.
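Below is a minimal sketch of (unadjusted) Langevin dynamics; `score_fn` stands in for a trained score network, and the toy usage relies on the fact that for a standard Gaussian the exact score is −x:

```python
import numpy as np

def langevin_sample(score_fn, x_init, step_size=0.01, n_steps=1000, seed=0):
    """Unadjusted Langevin dynamics:
    x <- x + (step_size / 2) * score(x) + sqrt(step_size) * z,  z ~ N(0, I).
    Repeated steps drift samples toward high-probability regions of p(x)."""
    rng = np.random.default_rng(seed)
    x = np.array(x_init, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * z
    return x

# Toy usage: the exact score of a standard Gaussian is -x, so points started
# far from the origin are pulled back toward it.
samples = langevin_sample(lambda x: -x, x_init=np.full(5, 10.0))
```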
Score-based models often work in continuous time using stochastic differential equations (SDEs). This continuous approach provides flexibility and can produce high-quality generations across various data types.
Applications in High-Resolution Image Generation
Diffusion models have revolutionized generative tasks, especially in high-resolution visual generation. Notable applications include:
- Stable Diffusion: a latent diffusion model that generates images from text prompts. It combines a U-Net-based denoising model with a variational autoencoder (VAE) to operate in latent space;
- DALL·E 2: combines CLIP embeddings with diffusion-based decoding to generate highly realistic images that closely follow the semantics of the text prompt;
- Midjourney: a diffusion-based image generation platform known for producing high-quality, artistically styled visuals from abstract or creative prompts.
These models are used in art generation, photorealistic synthesis, inpainting, super-resolution, and more.
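In practice, these models are usually accessed through a library rather than implemented from scratch. As a rough sketch, generating an image with Stable Diffusion through the Hugging Face `diffusers` library might look like the following (the checkpoint name and GPU assumption are illustrative and may differ in your environment):

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint ID; substitute any Stable Diffusion checkpoint you have access to.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```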
Summary
Diffusion models define a new era of generative modeling by treating data generation as a reverse-time stochastic process. Through DDPMs and score-based models, they achieve robust training, high sample quality, and compelling results across diverse modalities. Their grounding in probabilistic and thermodynamic principles makes them both mathematically elegant and practically powerful.
1. What is the main idea behind diffusion-based generative models?
2. What does the DDPM forward process use to add noise at each step?
3. Which of the following best describes the role of the score function ∇_x log p(x) in score-based generative modeling?