Diffusion Models and Probabilistic Generative Approaches
Understanding Diffusion-Based Generation
Diffusion models are a powerful class of generative models that synthesize data by learning to reverse a gradual noising process. They are inspired by nonequilibrium thermodynamics and model data generation as the reversal of a diffusion process where noise is slowly added to data over time.
Generation begins from pure noise, which a trained model iteratively denoises back into structured data. Conceptually, the process consists of two phases:
- Forward process (Diffusion): gradually add Gaussian noise to an image over time steps;
- Reverse process (Denoising): train a neural network to remove this noise step-by-step, reconstructing the original data.
This method allows for highly expressive and stable generation, particularly for high-resolution images.
Denoising Diffusion Probabilistic Models (DDPMs)
Denoising Diffusion Probabilistic Models (DDPMs) formalize this idea using a Markov chain to model the forward and reverse processes.
Forward Process
Given a data point $x_0 \sim q(x_0)$, noise is added step-by-step:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big)$$

Where:
- $x_t$: noisy version of the input at timestep $t$;
- $\beta_t$: small variance schedule controlling how much noise is added at each step;
- $\mathcal{N}$: Gaussian (normal) distribution.

The entire process can be simplified into a closed form that samples $x_t$ directly from $x_0$:

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, \mathbf{I}\big)$$

Where:

$$\alpha_t = 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$$
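As a concrete illustration, the closed-form expression above can be written in a few lines of code. The sketch below is a minimal example assuming PyTorch, a linear $\beta_t$ schedule, and $T = 1000$ steps (the choices from the original DDPM paper, not fixed requirements):

```python
import torch

# Illustrative settings: T = 1000 steps with a linear beta schedule,
# as in the original DDPM paper; other schedules (e.g. cosine) also work.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # beta_t
alphas = 1.0 - betas                         # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)    # alpha_bar_t = prod_{s<=t} alpha_s

def q_sample(x0: torch.Tensor, t: torch.Tensor):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = torch.randn_like(x0)                               # true Gaussian noise
    a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over batch
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return x_t, eps

# Example: noise a batch of 8 toy "images" (3x32x32) at random timesteps.
x0 = torch.randn(8, 3, 32, 32)
t = torch.randint(0, T, (8,))
x_t, eps = q_sample(x0, t)
```

Sampling $x_t$ directly from $x_0$ like this is what makes training efficient: the forward chain never has to be simulated step by step.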
Loss Function
A simplified and widely used objective minimizes the difference between the true noise $\epsilon$ and the noise $\epsilon_\theta(x_t, t)$ predicted by the network:

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\|\, \epsilon - \epsilon_\theta(x_t, t) \,\big\|^2\Big], \quad \text{where } x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ \ \epsilon \sim \mathcal{N}(0, \mathbf{I})$$
This framework has enabled state-of-the-art image generation results.
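To make the objective concrete, here is a hedged training-step sketch. The EpsModel MLP below is a stand-in invented for illustration (real DDPMs use a time-conditioned U-Net), and the hyperparameters are arbitrary:

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # alpha_bar_t

# Toy stand-in for the noise-prediction network epsilon_theta(x_t, t).
# Real DDPMs use a time-conditioned U-Net; this small MLP is illustrative only.
class EpsModel(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x_t, t):
        t_feat = (t.float() / T).unsqueeze(-1)            # crude timestep conditioning
        return self.net(torch.cat([x_t, t_feat], dim=-1))

eps_model = EpsModel(dim=2)
optimizer = torch.optim.Adam(eps_model.parameters(), lr=1e-3)

def train_step(x0):
    """One step of the simplified objective: MSE between true and predicted noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a_bar = alpha_bars[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # closed-form forward sample
    loss = ((eps - eps_model(x_t, t)) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(64, 2))   # toy 2-D data standing in for images
```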
Score-Based Generative Modeling
Score-based models approach generative modeling by learning the gradient (or "score") of the log data density:

$$s_\theta(x) \approx \nabla_x \log p(x)$$
Instead of modeling likelihood directly, they learn to denoise data using score matching. This is particularly useful when combined with Langevin dynamics, a sampling method that iteratively refines data toward high-likelihood regions.
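As a toy illustration of Langevin dynamics, the sketch below samples from a 2-D Gaussian whose score is known in closed form; in a real score-based model, the analytic score would be replaced by a learned network $s_\theta(x)$. The target distribution, step size, and iteration count are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target density: 2-D Gaussian N(mu, sigma^2 I) with an analytic score.
# A trained score network s_theta(x) would replace score() in practice.
mu = np.array([2.0, -1.0])
sigma = 0.5

def score(x: np.ndarray) -> np.ndarray:
    """grad_x log p(x) for N(mu, sigma^2 I) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma**2

def langevin_sampling(n_steps: int = 1000, step_size: float = 5e-3) -> np.ndarray:
    """Unadjusted Langevin dynamics:
    x_{k+1} = x_k + (step_size / 2) * score(x_k) + sqrt(step_size) * z_k."""
    x = rng.standard_normal(2)                # start from pure noise
    for _ in range(n_steps):
        z = rng.standard_normal(2)
        x = x + 0.5 * step_size * score(x) + np.sqrt(step_size) * z
    return x

samples = np.stack([langevin_sampling() for _ in range(200)])
print(samples.mean(axis=0))                   # should drift toward mu = [2, -1]
```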
Score-based diffusion models share similarities with DDPMs but often involve continuous-time stochastic differential equations (SDEs) instead of discrete steps.
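One common continuous-time formulation (a sketch of the standard score-SDE setup, with drift $f(x, t)$ and diffusion coefficient $g(t)$ as generic placeholders) writes the forward noising process and its generative reversal as:

$$\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w \qquad \text{(forward / noising)}$$

$$\mathrm{d}x = \big[f(x, t) - g(t)^2\, \nabla_x \log p_t(x)\big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w} \qquad \text{(reverse / generating)}$$

The reverse-time equation depends on the data only through the score $\nabla_x \log p_t(x)$, which is exactly the quantity the score network is trained to approximate.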
Applications in High-Resolution Image Generation
Diffusion models have revolutionized generative tasks, especially in high-resolution visual generation. Notable applications include:
- Stable Diffusion: a latent diffusion model that generates images from text prompts. It combines a U-Net-based denoising model with a variational autoencoder (VAE) to operate in latent space.
- DALL·E 2: combines CLIP embeddings and diffusion-based decoding to generate highly realistic and semantic images from text.
- MidJourney: a diffusion-based image generation platform known for producing high-quality, artistically styled visuals from abstract or creative prompts.
These models are used in art generation, photorealistic synthesis, inpainting, super-resolution, and more.
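As a practical note, latent diffusion models such as Stable Diffusion can be tried through the Hugging Face diffusers library. The snippet below is a minimal sketch assuming that library, a CUDA GPU, and the runwayml/stable-diffusion-v1-5 checkpoint; names and defaults may differ across versions:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion pipeline (VAE + U-Net + text encoder).
# Assumes a CUDA GPU; on CPU, drop torch_dtype and the .to("cuda") call.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Text-to-image: the prompt is encoded, the U-Net iteratively denoises a latent,
# and the VAE decoder maps the final latent back to pixels.
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```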
Summary
Diffusion models define a new era of generative modeling by treating data generation as a reverse-time stochastic process. Through DDPMs and score-based models, they achieve robust training, high sample quality, and compelling results across diverse modalities. Their grounding in probabilistic and thermodynamic principles makes them both mathematically elegant and practically powerful.
1. What is the main idea behind diffusion-based generative models?
2. What does the DDPM forward process use to add noise at each step?
3. Which of the following best describes the role of the score function in score-based generative modeling?