Diffusion Models and Probabilistic Generative Approaches

Understanding Diffusion-Based Generation

Diffusion models are a powerful type of generative AI model that create data, especially images, by learning how to reverse a process of gradually adding random noise. Imagine watching a clean picture slowly dissolve into static, like noise on a TV. A diffusion model learns to do the opposite: it takes a noisy image and reconstructs the original picture by removing the noise step by step.

The process involves two main phases:

  • Forward process (diffusion): gradually adds random noise to an image over many steps, corrupting it into pure noise;

  • Reverse process (denoising): a neural network learns to remove the noise step by step, reconstructing the original image from the noisy version.

Diffusion models are known for their ability to produce high-quality, realistic images. Their training is typically more stable compared to models like GANs, which makes them very appealing in modern generative AI.

Denoising Diffusion Probabilistic Models (DDPMs)

Denoising diffusion probabilistic models (DDPMs) are a popular kind of diffusion model that apply probabilistic principles and deep learning to remove noise from images in a step-by-step manner.

Forward Process

In the forward process, we start with a real image $x_0$ and gradually add Gaussian noise over $T$ timesteps:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$

Where:

  • $x_t$: noisy version of the input at timestep $t$;

  • $\beta_t$: small variance-schedule value controlling how much noise is added at step $t$;

  • $\mathcal{N}$: Gaussian (normal) distribution.
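To make the transition concrete, here is a minimal sketch of one forward step in PyTorch. The linear $\beta$ schedule and the image shape are illustrative assumptions, not part of the definition above:

```python
import torch

def forward_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    # Sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)
    noise = torch.randn_like(x_prev)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * noise

T = 1000
betas = torch.linspace(1e-4, 0.02, T)     # assumed linear schedule
x = torch.rand(1, 3, 32, 32)              # stand-in for a real image x_0
for t in range(T):
    x = forward_step(x, betas[t].item())  # x ends up close to pure noise
```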

We can also express the total noise added up to step $t$ in closed form:

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right)$$

Where:

  • $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$.
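This closed form means $x_t$ can be sampled in a single shot instead of looping through every step. A sketch, reusing the same assumed schedule as above:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # alpha_bar_t = prod_{s<=t}(1 - beta_s)

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    # Sample x_t ~ q(x_t | x_0) directly, without iterating over steps
    noise = torch.randn_like(x0)
    ab = alpha_bars[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
```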

Reverse Process

The goal of the model is to learn the reverse of this process. A neural network parameterized by $\theta$ predicts the mean and variance of the denoised distribution:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$

Where:

  • $x_t$: noisy image at time step $t$;

  • $x_{t-1}$: predicted less noisy image at step $t-1$;

  • $\mu_\theta$: predicted mean from the neural network;

  • $\Sigma_\theta$: predicted variance from the neural network.
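In practice, DDPMs often fix $\Sigma_\theta$ to $\beta_t I$ and compute $\mu_\theta$ from the network's noise prediction $\epsilon_\theta$ (the parameterization used in the loss below). A sketch of one reverse step under those assumptions, where `eps_model` is a placeholder for the trained network:

```python
import torch

def reverse_step(eps_model, x_t, t, betas, alpha_bars):
    # One denoising step x_t -> x_{t-1}, with the variance fixed to beta_t * I
    beta_t = betas[t]
    ab_t = alpha_bars[t]
    eps = eps_model(x_t, t)  # network's prediction of the noise in x_t
    mean = (x_t - beta_t / (1.0 - ab_t).sqrt() * eps) / (1.0 - beta_t).sqrt()
    if t == 0:
        return mean          # no noise is added at the final step
    return mean + beta_t.sqrt() * torch.randn_like(x_t)
```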

Loss Function

Training involves minimizing the difference between the actual noise and the model's predicted noise using the following objective:

$$L_{\text{simple}} = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\right)\right\rVert^2\right]$$

Where:

  • $x_0$: original input image;

  • $\epsilon$: random Gaussian noise;

  • $t$: time step during diffusion;

  • $\epsilon_\theta$: neural network prediction of the noise;

  • $\bar{\alpha}_t$: product of noise schedule parameters up to step $t$.

Minimizing this objective teaches the model to predict the added noise accurately, which improves its ability to denoise and, in turn, to generate realistic data.
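A minimal sketch of this objective as a training-loss function, again treating `eps_model` as a placeholder for the network and sampling a random timestep per example:

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0: torch.Tensor, alpha_bars: torch.Tensor) -> torch.Tensor:
    # L_simple: MSE between the true noise and the model's prediction of it
    t = torch.randint(0, len(alpha_bars), (x0.shape[0],))  # random t per sample
    eps = torch.randn_like(x0)                             # true Gaussian noise
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps         # noised input, closed form
    return F.mse_loss(eps_model(x_t, t), eps)
```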

Score-Based Generative Modeling

Score-based models are another class of diffusion models. Instead of learning the reverse noise process directly, they learn the score function:

$$\nabla_x \log p(x)$$

Where:

  • $\nabla_x \log p(x)$: the gradient of the log-probability density with respect to the input $x$. It points in the direction of increasing likelihood under the data distribution;

  • $p(x)$: the probability distribution of the data.

This function tells the model in which direction the image should move to become more like real data. These models then use a sampling method like Langevin dynamics to gradually move noisy data toward high-probability data regions.
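A minimal sketch of (unadjusted) Langevin dynamics, assuming `score_fn` is a trained approximation of $\nabla_x \log p(x)$; the step size and iteration count are arbitrary illustrative values:

```python
import torch

def langevin_sample(score_fn, x: torch.Tensor,
                    step_size: float = 1e-3, n_steps: int = 100) -> torch.Tensor:
    # Repeatedly nudge x along the score, plus injected noise, so samples
    # drift toward high-density regions of p(x)
    for _ in range(n_steps):
        noise = torch.randn_like(x)
        x = x + 0.5 * step_size * score_fn(x) + step_size ** 0.5 * noise
    return x
```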

Score-based models often work in continuous time using stochastic differential equations (SDEs). This continuous approach provides flexibility and can produce high-quality generations across various data types.
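To illustrate the continuous-time view, here is a toy Euler-Maruyama discretization of a reverse-time SDE with a constant diffusion coefficient; this is a simplifying assumption, as real systems use time-dependent schedules and a time-conditioned score network:

```python
import torch

def reverse_sde_sample(score_fn, x: torch.Tensor,
                       sigma: float = 1.0, n_steps: int = 500) -> torch.Tensor:
    # Integrate the reverse SDE dx = -sigma^2 * score(x, t) dt + sigma dw
    # backward from t = 1 to t = 0 with Euler-Maruyama steps
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        x = (x + sigma ** 2 * score_fn(x, t) * dt
               + sigma * dt ** 0.5 * torch.randn_like(x))
    return x
```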

Applications in High-Resolution Image Generation

Diffusion models have revolutionized generative tasks, especially in high-resolution visual generation. Notable applications include:

  • Stable Diffusion: a latent diffusion model that generates images from text prompts. It combines a U-Net-based denoising model with a variational autoencoder (VAE) to operate in latent space;

  • DALL·E 2: combines CLIP embeddings with diffusion-based decoding to generate highly realistic images that faithfully reflect the text prompt;

  • Midjourney: a diffusion-based image generation platform known for producing high-quality, artistically styled visuals from abstract or creative prompts.

These models are used in art generation, photorealistic synthesis, inpainting, super-resolution, and more.

Summary

Diffusion models define a new era of generative modeling by treating data generation as a reverse-time stochastic process. Through DDPMs and score-based models, they achieve robust training, high sample quality, and compelling results across diverse modalities. Their grounding in probabilistic and thermodynamic principles makes them both mathematically elegant and practically powerful.

1. What is the main idea behind diffusion-based generative models?

2. What does the DDPM forward process use to add noise at each step?

3. Which of the following best describes the role of the score function $\nabla_x \log p(x)$ in score-based generative modeling?

