Generative AI
Training and Optimization
Training generative models involves optimizing often unstable and complex loss landscapes. This section introduces loss functions tailored to each model type, optimization strategies to stabilize training, and methods for fine-tuning pretrained models for custom use cases.
Core Loss Functions
Different generative model families use distinct loss formulations depending on how they model data distributions.
GAN Losses
Minimax loss (original GAN)
Adversarial setup between generator and discriminator (example with the PyTorch library):
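A minimal sketch of the minimax objective in PyTorch, using the standard binary cross-entropy formulation on raw logits (the function names here are illustrative, not from any library):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real_logits, d_fake_logits):
    # discriminator maximizes log D(x) + log(1 - D(G(z))),
    # written here as BCE minimization against hard labels
    real = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return real + fake

def generator_loss(d_fake_logits):
    # non-saturating variant: generator maximizes log D(G(z))
    return F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
```

In practice most implementations use this non-saturating generator objective rather than the original minimax form, since it provides stronger gradients early in training.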
Least squares GAN (LSGAN)
Uses L2 loss instead of log loss to improve stability and gradient flow:
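A sketch of the LSGAN objective with the common target choices (real = 1, fake = 0); the helper names are illustrative:

```python
import torch

def lsgan_discriminator_loss(d_real, d_fake):
    # least-squares targets: real scores pushed toward 1, fake toward 0
    return 0.5 * ((d_real - 1).pow(2).mean() + d_fake.pow(2).mean())

def lsgan_generator_loss(d_fake):
    # generator pushes fake scores toward the "real" target of 1
    return 0.5 * (d_fake - 1).pow(2).mean()
```

Because the L2 penalty grows with distance from the target, samples that are correctly classified but far from the decision boundary still receive gradient, which is the source of the improved stability.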
Wasserstein GAN (WGAN)
Minimizes Earth Mover (EM) distance; replaces discriminator with a "critic" and uses weight clipping or gradient penalty for Lipschitz continuity:
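A sketch of the WGAN-GP variant, where the Lipschitz constraint is enforced with a gradient penalty on random interpolates rather than weight clipping (function names and the default penalty weight of 10 follow common practice, but are assumptions here):

```python
import torch

def wgan_critic_loss(c_real, c_fake):
    # critic maximizes E[C(x)] - E[C(G(z))]; we minimize the negation
    return c_fake.mean() - c_real.mean()

def wgan_generator_loss(c_fake):
    return -c_fake.mean()

def gradient_penalty(critic, real, fake, gp_weight=10.0):
    # random interpolation between real and fake samples
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    norms = grads.view(grads.size(0), -1).norm(2, dim=1)
    # penalize deviation of the gradient norm from 1 (soft Lipschitz constraint)
    return gp_weight * ((norms - 1) ** 2).mean()
```

The penalty term is added to the critic loss during its update; note that the critic outputs unbounded scores, not probabilities, so no sigmoid is applied.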
VAE Loss
Evidence Lower Bound (ELBO)
Combines reconstruction and regularization. The KL divergence term encourages the latent posterior to remain close to the prior (usually standard normal):
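A sketch of the negative ELBO as commonly implemented for a Bernoulli decoder and diagonal Gaussian posterior; the beta weight on the KL term (beta = 1 recovers the standard VAE) is included as it is a frequent extension:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, beta=1.0):
    # reconstruction term: Bernoulli likelihood -> binary cross-entropy
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

When mu = 0 and logvar = 0 the posterior equals the prior and the KL term vanishes, which is a useful sanity check during debugging.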
Diffusion Model Losses
Noise Prediction Loss
Models learn to denoise added Gaussian noise across a diffusion schedule. Variants use velocity prediction (e.g., v-prediction in Stable Diffusion v2) or hybrid objectives:
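A simplified sketch of the epsilon-prediction objective (the "simple" DDPM loss); the schedule tensor and function signature are illustrative assumptions:

```python
import torch

def diffusion_loss(model, x0, alphas_cumprod, t):
    # forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    # epsilon-prediction objective: MSE between true and predicted noise
    return torch.nn.functional.mse_loss(model(x_t, t), noise)
```

The v-prediction variant replaces the regression target `noise` with the velocity v = sqrt(a_bar) * eps - sqrt(1 - a_bar) * x0, but the overall structure of the loss is the same.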
Optimization Techniques
Training generative models is often unstable and sensitive to hyperparameters. Several techniques are employed to ensure convergence and quality.
Optimizers and Schedulers
Adam / AdamW: adaptive gradient optimizers that are the de facto standard, including for GANs;
RMSprop: sometimes used in WGAN variants;
Learning rate scheduling:
Warm-up phases for transformers and diffusion models;
Cosine decay or ReduceLROnPlateau for stable convergence.
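The warm-up-then-cosine-decay pattern above can be sketched with PyTorch's `LambdaLR`; the helper name and step counts are illustrative:

```python
import math
import torch

def warmup_cosine(optimizer, warmup_steps, total_steps):
    # linear warm-up to the base learning rate, then cosine decay to zero
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

Call `scheduler.step()` once per optimizer step (after `optimizer.step()`); the warm-up phase is what keeps early transformer and diffusion updates from destabilizing randomly initialized layers.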
Stabilization Methods
Gradient clipping: avoid exploding gradients in RNNs or deep UNets;
Spectral normalization: applied to discriminator layers in GANs to enforce Lipschitz constraints;
Label smoothing: softens hard labels (e.g., real = 0.9 instead of 1.0) to reduce overconfidence;
Two-time-scale update rule (TTUR): use different learning rates for generator and discriminator to improve convergence;
Mixed-precision training: leverages FP16 (via NVIDIA Apex or PyTorch AMP) for faster training on modern GPUs.
Monitor both generator and discriminator losses separately. Use metrics like FID or IS periodically to evaluate actual output quality rather than relying solely on loss values.
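Several of the stabilization techniques above can be combined in one training step; the sketch below (with illustrative stand-in modules and hyperparameters) shows spectral normalization, TTUR, one-sided label smoothing, and gradient clipping together:

```python
import torch

G = torch.nn.Linear(16, 32)                                # stand-in generator
D = torch.nn.utils.spectral_norm(torch.nn.Linear(32, 1))   # spectral-normalized critic layer

# TTUR: discriminator uses a larger learning rate than the generator
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.9))

# one-sided label smoothing: real targets of 0.9 instead of 1.0
real_targets = torch.full((8, 1), 0.9)

# generator update with gradient clipping before the optimizer step
fake_scores = D(G(torch.randn(8, 16)))
g_loss = torch.nn.functional.binary_cross_entropy_with_logits(
    fake_scores, torch.ones_like(fake_scores))
g_loss.backward()
torch.nn.utils.clip_grad_norm_(G.parameters(), max_norm=1.0)
g_opt.step()
```

For mixed precision, the forward/backward pass would additionally be wrapped in `torch.autocast` with a `torch.cuda.amp.GradScaler` scaling the loss before `backward()`.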
Fine-Tuning Pretrained Generative Models
Pretrained generative models (e.g., Stable Diffusion, LLaMA, StyleGAN2) can be fine-tuned for domain-specific tasks using lighter training strategies.
Transfer Learning Techniques
Full fine-tuning: re-train all model weights. High compute cost but maximal flexibility;
Layer freezing / gradual unfreezing: start by freezing most layers, then progressively unfreeze selected layers as fine-tuning proceeds. This mitigates catastrophic forgetting: frozen early layers preserve general pretrained features (such as edges or word patterns), while unfrozen later layers adapt to task-specific ones;
LoRA / adapter layers: inject low-rank trainable layers without updating base model parameters;
DreamBooth / textual inversion (diffusion models): fine-tune on a handful of subject-specific images using the Hugging Face diffusers library;
Prompt tuning / p-tuning: learn soft prompt embeddings prepended to the input while the base model stays entirely frozen.
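The core LoRA idea can be shown with a minimal hand-rolled wrapper (a sketch for illustration only; the class name, rank, and alpha defaults are assumptions, and real projects would use a library such as Hugging Face PEFT instead):

```python
import torch

class LoRALinear(torch.nn.Module):
    """Minimal LoRA sketch: freeze the base weight, learn a low-rank update."""
    def __init__(self, base: torch.nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # base weights stay frozen
        # low-rank factors: B @ A has rank at most r
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # base output plus the scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```

Because B is zero-initialized, the adapter starts as a no-op and the model's pretrained behavior is preserved at step zero; only the small A and B matrices receive gradients.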
Common Use Cases
Style adaptation: fine-tuning on anime, comic, or artistic datasets;
Industry-specific tuning: adapting LLMs to legal, medical, or enterprise domains;
Personalization: custom identity or voice conditioning using small reference sets.
Use Hugging Face PEFT for LoRA/adapter-based methods, and the Diffusers library for lightweight fine-tuning pipelines with built-in support for DreamBooth and classifier-free guidance.
Summary
Use model-specific loss functions that match training objectives and model structure;
Optimize with adaptive methods, stabilization techniques, and efficient scheduling;
Fine-tune pretrained models using modern low-rank or prompt-based transfer strategies to reduce cost and increase domain adaptability.
1. Which of the following is a primary purpose of using regularization techniques during training?
2. Which of the following optimizers is commonly used for training deep learning models and adapts the learning rate during training?
3. What is the primary challenge when training generative models, especially in the context of GANs (Generative Adversarial Networks)?