Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) are a class of generative models introduced by Ian Goodfellow in 2014. They consist of two neural networks — the Generator and the Discriminator — trained simultaneously in a game-theoretic framework. The generator tries to produce data that resembles the real data, while the discriminator tries to distinguish real data from the generated data.

GANs learn to generate data samples from noise by solving a minimax game. Over the course of training, the generator becomes better at producing realistic data, and the discriminator becomes better at distinguishing real from fake data.

Architecture of a GAN

A basic GAN model consists of two core components:

1. Generator (G)

  • Takes a random noise vector $z \sim p_z(z)$ as input;
  • Transforms it through a neural network into a data sample $G(z)$ intended to resemble data from the true distribution.

2. Discriminator (D)

  • Takes either a real data sample $x \sim p_x(x)$ or a generated sample $G(z)$;
  • Outputs a scalar between 0 and 1, estimating the probability that the input is real.

These two components are trained simultaneously. The generator aims to produce realistic samples to fool the discriminator, while the discriminator aims to correctly identify real versus generated samples.
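As a toy illustration of this two-network setup, the generator and discriminator can be sketched as tiny NumPy MLPs. This is not a trainable implementation; all layer sizes, weight scales, and activation choices here are arbitrary assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W1, W2):
    # Hypothetical 2-layer MLP: noise -> hidden (ReLU) -> sample (tanh)
    h = np.maximum(0.0, z @ W1)
    return np.tanh(h @ W2)

def discriminator(x, V1, V2):
    # 2-layer MLP: sample -> hidden (ReLU) -> probability (sigmoid)
    h = np.maximum(0.0, x @ V1)
    logit = h @ V2
    return 1.0 / (1.0 + np.exp(-logit))

# Arbitrary toy dimensions
noise_dim, hidden, data_dim = 8, 16, 4
W1 = rng.normal(0, 0.1, (noise_dim, hidden))
W2 = rng.normal(0, 0.1, (hidden, data_dim))
V1 = rng.normal(0, 0.1, (data_dim, hidden))
V2 = rng.normal(0, 0.1, (hidden, 1))

z = rng.normal(size=(5, noise_dim))   # batch of 5 noise vectors
fake = generator(z, W1, W2)           # G(z): 5 samples in data space
p_real = discriminator(fake, V1, V2)  # D(G(z)): probabilities in (0, 1)
print(fake.shape, p_real.shape)       # (5, 4) (5, 1)
```

During training, the discriminator would also be fed real samples $x$, and both networks' weights would be updated from the losses defined below.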

Minimax Game of GANs

At the heart of GANs lies the minimax game, a concept from game theory. In this setup:

  • The generator $G$ and discriminator $D$ are competing players;
  • $D$ aims to maximize its ability to distinguish real from generated data;
  • $G$ aims to minimize the ability of $D$ to detect its fake data.

This dynamic defines a zero-sum game, where one player's gain is the other's loss. The optimization is defined as:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_x}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

The generator tries to fool the discriminator by generating samples $G(z)$ that are as close to real data as possible.
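The value function $V(D, G)$ can be estimated from discriminator outputs on a batch of real and generated samples. A minimal sketch, using made-up probability values rather than a real discriminator:

```python
import numpy as np

def value_fn(d_real, d_fake):
    # Monte Carlo estimate of V(D, G):
    # E[log D(x)] + E[log(1 - D(G(z)))]
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminator that is confident and correct yields a higher value...
sharp = value_fn(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
# ...than a maximally confused one (D = 0.5 everywhere), whose value is 2*log(0.5)
confused = value_fn(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
print(round(confused, 4))  # -1.3863
```

This matches the game's intuition: $D$ pushes the value up by classifying well, while $G$ pushes it down by producing samples that $D$ scores close to 0.5.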

Loss Functions

While the original GAN objective defines a minimax game, in practice, alternative loss functions are used to stabilize training.

  • Non-saturating Generator Loss:
L_G = -\mathbb{E}_{z \sim p_z}[\log D(G(z))]

This helps the generator receive strong gradients even when the discriminator performs well.

  • Discriminator Loss:
L_D = -\mathbb{E}_{x \sim p_x}[\log D(x)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

These losses encourage the generator to produce samples that increase the discriminator’s uncertainty and improve convergence during training.
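The two losses above translate directly into code. This sketch evaluates them on hand-picked discriminator outputs to show why the non-saturating form helps early in training:

```python
import numpy as np

def generator_loss_ns(d_fake):
    # Non-saturating generator loss: -E[log D(G(z))]
    return -np.mean(np.log(d_fake))

def discriminator_loss(d_real, d_fake):
    # L_D = -E[log D(x)] - E[log(1 - D(G(z)))]
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

# Early in training D easily spots fakes, so D(G(z)) is near 0:
# the non-saturating loss is large, giving G a strong gradient signal.
early = generator_loss_ns(np.array([0.01, 0.02]))
late = generator_loss_ns(np.array([0.45, 0.5]))
print(early > late)  # True
```

By contrast, the original $\log(1 - D(G(z)))$ objective saturates (its gradient vanishes) exactly when $D(G(z))$ is near 0, which is why the non-saturating form is preferred in practice.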

Key Variants of GAN Architectures

Several types of GANs have emerged to tackle specific limitations or to improve performance:

Conditional GAN (cGAN)

Conditional GANs extend the standard GAN framework by introducing additional information (usually labels) into both the generator and the discriminator. Instead of generating data from random noise alone, the generator receives both noise $z$ and a condition $y$ (e.g., a class label). The discriminator also receives $y$ to judge if the sample is realistic under that condition.

  • Use cases: class-conditional image generation, image-to-image translation, text-to-image generation.
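A common way to inject the condition $y$ is to concatenate a one-hot label encoding onto the noise vector before it enters the generator. A minimal sketch of that input construction (the dimensions are arbitrary):

```python
import numpy as np

def one_hot(y, n_classes):
    # Encode integer class labels as one-hot vectors
    out = np.zeros((len(y), n_classes))
    out[np.arange(len(y)), y] = 1.0
    return out

def condition_input(z, y, n_classes):
    # The conditioned generator sees [z ; one_hot(y)] instead of z alone
    return np.concatenate([z, one_hot(y, n_classes)], axis=1)

z = np.random.default_rng(0).normal(size=(3, 8))  # batch of noise vectors
y = np.array([0, 2, 1])                           # class labels
g_in = condition_input(z, y, n_classes=3)
print(g_in.shape)  # (3, 11): 8 noise dims + 3 label dims
```

The discriminator input is conditioned the same way, so both networks learn label-dependent behavior.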

Deep Convolutional GAN (DCGAN)

DCGANs replace the fully connected layers in the original GANs with convolutional and transposed convolutional layers, making them more effective for generating images. They also introduce architectural guidelines like removing fully connected layers, using batch normalization, and employing ReLU/LeakyReLU activations.

  • Use cases: photo-realistic image generation, learning visual representations, unsupervised feature learning.
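The transposed convolutions at the heart of a DCGAN generator upsample feature maps by a predictable amount. This small sketch applies the standard output-size formula to a typical DCGAN-style upsampling chain (kernel 4, stride 2, padding 1 are common choices, not a requirement):

```python
def conv_transpose_out(size, kernel, stride, padding):
    # Output spatial size of a transposed convolution (no output padding):
    # out = (in - 1) * stride - 2 * padding + kernel
    return (size - 1) * stride - 2 * padding + kernel

# A DCGAN-style generator that doubles resolution each layer:
# 4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64
s = 4
for _ in range(4):
    s = conv_transpose_out(s, kernel=4, stride=2, padding=1)
print(s)  # 64
```

With kernel 4, stride 2, and padding 1, each layer exactly doubles the spatial size, which is why this configuration appears so often in DCGAN generators.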

CycleGAN

CycleGANs address the problem of unpaired image-to-image translation. Unlike other models that require paired datasets (e.g., the same photo in two different styles), CycleGANs can learn mappings between two domains without paired examples. They introduce two generators and two discriminators, each responsible for mapping in one direction (e.g., photos to paintings and vice versa), and enforce a cycle-consistency loss to ensure that translating from one domain and back returns the original image. This loss is key to preserving content and structure.

Cycle-Consistency Loss ensures:

G_{BA}(G_{AB}(x)) \approx x \quad \text{and} \quad G_{AB}(G_{BA}(y)) \approx y

where:

  • $G_{AB}$ maps images from domain A to domain B;
  • $G_{BA}$ maps images from domain B to domain A;
  • $x \in A$, $y \in B$.

Use cases: photo-to-artwork conversion, horse-to-zebra translation, voice conversion between speakers.
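The cycle-consistency term is typically an L1 penalty on the round-trip reconstructions. A minimal sketch on toy arrays, where `x_cycled` stands in for $G_{BA}(G_{AB}(x))$ and `y_cycled` for $G_{AB}(G_{BA}(y))$:

```python
import numpy as np

def cycle_consistency_loss(x, x_cycled, y, y_cycled):
    # L1 penalty on both round trips:
    # G_BA(G_AB(x)) should recover x, and G_AB(G_BA(y)) should recover y
    return np.mean(np.abs(x - x_cycled)) + np.mean(np.abs(y - y_cycled))

x = np.ones((2, 4))
perfect = cycle_consistency_loss(x, x, x, x)             # identical round trip
drifted = cycle_consistency_loss(x, x + 0.3, x, x - 0.1)  # content drifted
print(perfect, round(drifted, 2))  # 0.0 0.4
```

A zero loss means the round trip is lossless; any drift in content or structure is penalized, which is what keeps the two unpaired mappings anchored to each other.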

StyleGAN

StyleGAN, developed by NVIDIA, introduces style-based control into the generator. Instead of feeding a noise vector directly to the generator, it passes through a mapping network to produce "style vectors" that influence each layer of the generator. This enables fine control over visual features such as hair color, facial expressions, or lighting.

Notable innovations:

  • Style mixing: allows combining multiple latent codes;
  • Adaptive Instance Normalization (AdaIN): controls feature maps in the generator;
  • Progressive growing: training starts at low resolution and increases over time.

Use cases: ultra high-resolution image generation (e.g., faces), visual attribute control, art generation.
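AdaIN, the mechanism StyleGAN uses to inject style, normalizes each feature-map channel and then rescales it with style-derived parameters. A minimal NumPy sketch for a single image's feature maps (the shapes and style values here are arbitrary assumptions):

```python
import numpy as np

def adain(feat, scale, bias, eps=1e-5):
    # feat: (C, H, W) feature maps; scale, bias: (C,) style parameters.
    # Normalize each channel to zero mean / unit std, then apply the style.
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mu) / (sigma + eps)
    return scale[:, None, None] * normalized + bias[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(2, 4, 4))  # 2 channels of 4x4 feature maps
styled = adain(feat, scale=np.array([2.0, 0.5]), bias=np.array([1.0, -1.0]))
# After AdaIN, each channel's mean is approximately the style bias
# and its std is approximately the style scale.
print(np.round(styled.mean(axis=(1, 2)), 3))
```

Because the style vector fully determines each channel's statistics, swapping styles at different layers (style mixing) changes coarse or fine attributes independently.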

Comparison: GANs vs VAEs

Both GANs and VAEs are generative models, but they differ fundamentally. GANs learn an implicit data distribution through an adversarial loss and tend to produce sharper, more realistic samples, though training can be unstable (e.g., mode collapse). VAEs optimize a likelihood-based objective (the evidence lower bound), which makes training more stable and yields an explicit, structured latent space, but their samples are often blurrier.

GANs are a powerful class of generative models capable of producing highly realistic data through an adversarial training process. Their core lies in a minimax game between two networks, using adversarial losses to iteratively improve both components. A solid grasp of their architecture and loss functions, familiarity with variants such as cGAN, DCGAN, CycleGAN, and StyleGAN, and an understanding of how they contrast with models like VAEs equip practitioners with the foundation for applications in fields such as image generation, video synthesis, data augmentation, and more.

1. Which of the following best describes the components of a basic GAN architecture?

2. What is the goal of the minimax game in GANs?

3. Which of the following statements is true about the difference between GANs and VAEs?
