Diffusion Models and Probabilistic Generative Approaches

Understanding Diffusion-Based Generation

Diffusion models are a powerful class of generative models that synthesize data by learning to reverse a gradual noising process. They are inspired by nonequilibrium thermodynamics and model data generation as the reversal of a diffusion process where noise is slowly added to data over time.

The generative process begins from pure noise and learns to iteratively denoise it back into structured data. Conceptually, the process consists of two phases:

  • Forward process (Diffusion): gradually add Gaussian noise to an image over time steps;
  • Reverse process (Denoising): train a neural network to remove this noise step-by-step, reconstructing the original data.

This method allows for highly expressive and stable generation, particularly for high-resolution images.
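
To make the reverse phase concrete, here is a minimal Python sketch of the sampling loop. The function `denoise_step(x, t)` is a hypothetical placeholder for one step of a trained denoising network, not a real library call:

```python
import torch

def generate(denoise_step, shape=(1, 3, 32, 32), n_steps=1000):
    """Run the reverse process: start from pure noise and denoise step by step."""
    x = torch.randn(shape)              # x_T ~ N(0, I): pure Gaussian noise
    for t in reversed(range(n_steps)):  # t = T-1, ..., 0
        x = denoise_step(x, t)          # one learned denoising step
    return x                            # approximate sample from the data distribution
```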

Denoising Diffusion Probabilistic Models (DDPMs)

Denoising Diffusion Probabilistic Models (DDPMs) formalize this idea using a Markov chain to model the forward and reverse processes.

Forward Process

Given a data point $x_0 \sim q(x)$, noise is added step by step:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$$

Where:

  • $x_t$: noisy version of the input at timestep $t$;
  • $\beta_t$: small variance schedule controlling how much noise is added at each step;
  • $\mathcal{N}$: the Gaussian (normal) distribution.
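
As a sketch, a single forward step follows directly from this formula. The linear schedule `betas` below is illustrative, not prescribed:

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)  # illustrative linear variance schedule

def forward_step(x_prev, t):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    beta_t = betas[t]
    noise = torch.randn_like(x_prev)  # epsilon ~ N(0, I)
    return (1.0 - beta_t).sqrt() * x_prev + beta_t.sqrt() * noise

x = torch.randn(8, 3, 32, 32)  # stand-in batch for real data x_0
x = forward_step(x, t=0)       # one noising step
```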

Because each step is Gaussian, the whole forward process collapses into a single closed-form expression:

$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right)$$

Where:

  • $\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)$
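
Because the marginal is Gaussian, $x_t$ can be drawn for any timestep in a single shot, which is what makes training efficient. A minimal sketch, using the same illustrative schedule as above:

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)        # illustrative linear variance schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # alpha_bar_t = prod_{s=1}^{t} (1 - beta_s)

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in one shot, without iterating over earlier steps."""
    a_bar = alpha_bars[t]
    noise = torch.randn_like(x0)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
```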

Loss Function

A simplified and widely used objective minimizes the difference between predicted noise and true noise:

$$L_{\text{simple}} = \mathbb{E}_{x_0, \epsilon, t}\left[ \left\| \epsilon - \epsilon_\theta\!\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t \right) \right\|^2 \right]$$

Here $\epsilon_\theta$ is a neural network trained to predict the noise $\epsilon$ that was added to $x_0$.
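
In code, this objective is a mean-squared error between the sampled noise and the network's prediction. The sketch below assumes a noise-prediction model `eps_model(x_t, t)` (hypothetical; in practice usually a U-Net) and the `alpha_bars` tensor defined earlier:

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0, alpha_bars):
    """L_simple: MSE between the true noise and the network's noise prediction."""
    b = x0.shape[0]
    t = torch.randint(0, len(alpha_bars), (b,))             # random timestep per example
    a_bar = alpha_bars[t].view(b, *([1] * (x0.dim() - 1)))  # reshape for broadcasting
    eps = torch.randn_like(x0)                              # true noise epsilon
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps    # closed-form q(x_t | x_0)
    return F.mse_loss(eps_model(x_t, t), eps)
```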

This framework has enabled state-of-the-art image generation results.

Score-Based Generative Modeling

Score-based models approach generative modeling by learning the gradient (or "score") of the log data density:

$$\nabla_x \log p(x)$$

Instead of modeling likelihood directly, they learn to denoise data using score matching. This is particularly useful when combined with Langevin dynamics, a sampling method that iteratively refines data toward high-likelihood regions.
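
A minimal sketch of unadjusted Langevin dynamics, assuming a learned function `score(x)` that approximates $\nabla_x \log p(x)$ (the names are illustrative):

```python
import torch

def langevin_sample(score, x, step_size=1e-3, n_steps=100):
    """Refine x toward high-density regions using the score plus injected noise."""
    for _ in range(n_steps):
        noise = torch.randn_like(x)
        # x <- x + (step/2) * grad_x log p(x) + sqrt(step) * noise
        x = x + 0.5 * step_size * score(x) + (step_size ** 0.5) * noise
    return x
```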

Score-based diffusion models share similarities with DDPMs but often involve continuous-time stochastic differential equations (SDEs) instead of discrete steps.

Applications in High-Resolution Image Generation

Diffusion models have revolutionized generative tasks, especially in high-resolution visual generation. Notable applications include:

  • Stable Diffusion: a latent diffusion model that generates images from text prompts. It combines a U-Net-based denoising model with a variational autoencoder (VAE) to operate in latent space.
  • DALL·E 2: combines CLIP embeddings and diffusion-based decoding to generate highly realistic and semantic images from text.
  • MidJourney: a diffusion-based image generation platform known for producing high-quality, artistically styled visuals from abstract or creative prompts.

These models are used in art generation, photorealistic synthesis, inpainting, super-resolution, and more.

Summary

Diffusion models define a new era of generative modeling by treating data generation as a reverse-time stochastic process. Through DDPMs and score-based models, they achieve robust training, high sample quality, and compelling results across diverse modalities. Their grounding in probabilistic and thermodynamic principles makes them both mathematically elegant and practically powerful.

1. What is the main idea behind diffusion-based generative models?

2. What does the DDPM forward process use to add noise at each step?

3. Which of the following best describes the role of the score function $\nabla_x \log p(x)$ in score-based generative modeling?
