Generative AI
Training and Optimization
Training generative models involves optimizing often unstable and complex loss landscapes. This section introduces loss functions tailored to each model type, optimization strategies to stabilize training, and methods for fine-tuning pretrained models for custom use cases.
Core Loss Functions
Different generative model families use distinct loss formulations depending on how they model data distributions.
GAN Losses
Minimax loss (original GAN)
Adversarial setup between generator and discriminator (example with the PyTorch library):
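A minimal sketch of the minimax objective in PyTorch, using the standard binary cross-entropy formulation on raw logits (the function names here are illustrative, not from any library):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real_logits, d_fake_logits):
    # discriminator maximizes log D(x) + log(1 - D(G(z))),
    # written here as BCE minimization against hard labels
    real = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return real + fake

def generator_loss(d_fake_logits):
    # non-saturating variant: generator maximizes log D(G(z))
    return F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
```

In practice most implementations use this non-saturating generator objective rather than the original minimax form, since it provides stronger gradients early in training.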
Least squares GAN (LSGAN)
Uses L2 loss instead of log loss to improve stability and gradient flow:
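A sketch of the LSGAN objective with the common target choices (real = 1, fake = 0); the helper names are illustrative:

```python
import torch

def lsgan_discriminator_loss(d_real, d_fake):
    # least-squares targets: real scores pushed toward 1, fake toward 0
    return 0.5 * ((d_real - 1).pow(2).mean() + d_fake.pow(2).mean())

def lsgan_generator_loss(d_fake):
    # generator pushes fake scores toward the "real" target of 1
    return 0.5 * (d_fake - 1).pow(2).mean()
```

Because the L2 penalty grows with distance from the target, samples that are correctly classified but far from the decision boundary still receive gradient, which is the source of the improved stability.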
Wasserstein GAN (WGAN)
Minimizes Earth Mover (EM) distance; replaces discriminator with a "critic" and uses weight clipping or gradient penalty for Lipschitz continuity:
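A sketch of the WGAN-GP variant, where the Lipschitz constraint is enforced with a gradient penalty on random interpolates rather than weight clipping (function names and the default penalty weight of 10 follow common practice, but are assumptions here):

```python
import torch

def wgan_critic_loss(c_real, c_fake):
    # critic maximizes E[C(x)] - E[C(G(z))]; we minimize the negation
    return c_fake.mean() - c_real.mean()

def wgan_generator_loss(c_fake):
    return -c_fake.mean()

def gradient_penalty(critic, real, fake, gp_weight=10.0):
    # random interpolation between real and fake samples
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    norms = grads.view(grads.size(0), -1).norm(2, dim=1)
    # penalize deviation of the gradient norm from 1 (soft Lipschitz constraint)
    return gp_weight * ((norms - 1) ** 2).mean()
```

The penalty term is added to the critic loss during its update; note that the critic outputs unbounded scores, not probabilities, so no sigmoid is applied.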
VAE Loss
Evidence Lower Bound (ELBO)
Combines reconstruction and regularization. The KL divergence term encourages the latent posterior to remain close to the prior (usually standard normal):
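A sketch of the negative ELBO as commonly implemented for a Bernoulli decoder and diagonal Gaussian posterior; the beta weight on the KL term (beta = 1 recovers the standard VAE) is included as it is a frequent extension:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, beta=1.0):
    # reconstruction term: Bernoulli likelihood -> binary cross-entropy
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

When mu = 0 and logvar = 0 the posterior equals the prior and the KL term vanishes, which is a useful sanity check during debugging.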
Diffusion Model Losses
Noise Prediction Loss
Models learn to denoise added Gaussian noise across a diffusion schedule. Variants use velocity prediction (e.g., v-prediction in Stable Diffusion v2) or hybrid objectives:
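A simplified sketch of the epsilon-prediction objective (the "simple" DDPM loss); the schedule tensor and function signature are illustrative assumptions:

```python
import torch

def diffusion_loss(model, x0, alphas_cumprod, t):
    # forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    # epsilon-prediction objective: MSE between true and predicted noise
    return torch.nn.functional.mse_loss(model(x_t, t), noise)
```

The v-prediction variant replaces the regression target `noise` with the velocity v = sqrt(a_bar) * eps - sqrt(1 - a_bar) * x0, but the overall structure of the loss is the same.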
Optimization Techniques
Training generative models is often unstable and sensitive to hyperparameters. Several techniques are employed to ensure convergence and quality.
Optimizers and Schedulers
Adam / AdamW: adaptive gradient optimizers that are the de facto standard, including for GANs;
RMSprop: sometimes used in WGAN variants;
Learning rate scheduling:
Warm-up phases for transformers and diffusion models;
Cosine decay or ReduceLROnPlateau for stable convergence.
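The warm-up-then-cosine-decay pattern above can be sketched with PyTorch's `LambdaLR`; the helper name and step counts are illustrative:

```python
import math
import torch

def warmup_cosine(optimizer, warmup_steps, total_steps):
    # linear warm-up to the base learning rate, then cosine decay to zero
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

Call `scheduler.step()` once per optimizer step (after `optimizer.step()`); the warm-up phase is what keeps early transformer and diffusion updates from destabilizing randomly initialized layers.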
Stabilization Methods
Gradient clipping: avoid exploding gradients in RNNs or deep UNets;
Spectral normalization: applied to discriminator layers in GANs to enforce Lipschitz constraints;
Label smoothing: softens hard labels (e.g., real = 0.9 instead of 1.0) to reduce overconfidence;
Two-time-scale update rule (TTUR): use different learning rates for generator and discriminator to improve convergence;
Mixed-precision training: leverages FP16 (via NVIDIA Apex or PyTorch AMP) for faster training on modern GPUs.
Monitor both generator and discriminator losses separately. Use metrics like FID or IS periodically to evaluate actual output quality rather than relying solely on loss values.
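Several of the stabilization techniques above can be combined in one training step; the sketch below (with illustrative stand-in modules and hyperparameters) shows spectral normalization, TTUR, one-sided label smoothing, and gradient clipping together:

```python
import torch

G = torch.nn.Linear(16, 32)                                # stand-in generator
D = torch.nn.utils.spectral_norm(torch.nn.Linear(32, 1))   # spectral-normalized critic layer

# TTUR: discriminator uses a larger learning rate than the generator
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.9))

# one-sided label smoothing: real targets of 0.9 instead of 1.0
real_targets = torch.full((8, 1), 0.9)

# generator update with gradient clipping before the optimizer step
fake_scores = D(G(torch.randn(8, 16)))
g_loss = torch.nn.functional.binary_cross_entropy_with_logits(
    fake_scores, torch.ones_like(fake_scores))
g_loss.backward()
torch.nn.utils.clip_grad_norm_(G.parameters(), max_norm=1.0)
g_opt.step()
```

For mixed precision, the forward/backward pass would additionally be wrapped in `torch.autocast` with a `torch.cuda.amp.GradScaler` scaling the loss before `backward()`.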
Fine-Tuning Pretrained Generative Models
Pretrained generative models (e.g., Stable Diffusion, LLaMA, StyleGAN2) can be fine-tuned for domain-specific tasks using lighter training strategies.
Transfer Learning Techniques
Full fine-tuning: re-train all model weights. High compute cost but maximal flexibility;
Layer freezing / gradual unfreezing: start by freezing most layers, then progressively unfreeze selected layers as fine-tuning proceeds. This mitigates catastrophic forgetting: frozen early layers preserve general pretrained features (such as edges or word patterns), while unfrozen later layers adapt to task-specific ones;
LoRA / adapter layers: inject low-rank trainable layers without updating base model parameters;
DreamBooth / textual inversion (diffusion models): fine-tune on a handful of subject-specific images using the Hugging Face diffusers library;
Prompt tuning / p-tuning: learn soft prompt embeddings prepended to the input while the base model stays entirely frozen.
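The core LoRA idea can be shown with a minimal hand-rolled wrapper (a sketch for illustration only; the class name, rank, and alpha defaults are assumptions, and real projects would use a library such as Hugging Face PEFT instead):

```python
import torch

class LoRALinear(torch.nn.Module):
    """Minimal LoRA sketch: freeze the base weight, learn a low-rank update."""
    def __init__(self, base: torch.nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # base weights stay frozen
        # low-rank factors: B @ A has rank at most r
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # base output plus the scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```

Because B is zero-initialized, the adapter starts as a no-op and the model's pretrained behavior is preserved at step zero; only the small A and B matrices receive gradients.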
Common Use Cases
Style adaptation: fine-tuning on anime, comic, or artistic datasets;
Industry-specific tuning: adapting LLMs to legal, medical, or enterprise domains;
Personalization: custom identity or voice conditioning using small reference sets.
Use Hugging Face PEFT for LoRA/adapter-based methods, and the Diffusers library for lightweight fine-tuning pipelines with built-in support for DreamBooth and classifier-free guidance.
Summary
Use model-specific loss functions that match training objectives and model structure;
Optimize with adaptive methods, stabilization techniques, and efficient scheduling;
Fine-tune pretrained models using modern low-rank or prompt-based transfer strategies to reduce cost and increase domain adaptability.
1. Which of the following is a primary purpose of using regularization techniques during training?
2. Which of the following optimizers is commonly used for training deep learning models and adapts the learning rate during training?
3. What is the primary challenge when training generative models, especially in the context of GANs (Generative Adversarial Networks)?