Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) are a class of generative models introduced by Ian Goodfellow in 2014. They consist of two neural networks — the Generator and the Discriminator — trained simultaneously in a game-theoretic framework. The generator tries to produce data that resembles the real data, while the discriminator tries to distinguish real data from the generated data.
GANs learn to generate data samples from noise by solving a minimax game. Over the course of training, the generator becomes better at producing realistic data, and the discriminator becomes better at distinguishing real from fake data.
Architecture of a GAN
A basic GAN model consists of two core components:
1. Generator (G)
- Takes a random noise vector $z$ as input;
- Transforms it through a neural network into a data sample intended to resemble data from the true distribution.
2. Discriminator (D)
- Takes either a real data sample or a generated sample $G(z)$;
- Outputs a scalar between 0 and 1, estimating the probability that the input is real.
These two components are trained simultaneously. The generator aims to produce realistic samples to fool the discriminator, while the discriminator aims to correctly identify real versus generated samples.
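To make the two components concrete, here is a minimal PyTorch sketch. The layer sizes, the 100-dimensional noise vector, and the flattened 28x28 data are illustrative assumptions, not a reference implementation:

```python
import torch
import torch.nn as nn

LATENT_DIM = 100   # size of the noise vector z (illustrative choice)
DATA_DIM = 784     # e.g., a flattened 28x28 grayscale image (assumption)

class Generator(nn.Module):
    """Maps a noise vector z to a synthetic data sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, DATA_DIM),
            nn.Tanh(),  # outputs in [-1, 1], matching data normalized to that range
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps a (real or generated) sample to the probability that it is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # scalar in (0, 1): estimated probability of "real"
        )

    def forward(self, x):
        return self.net(x)
```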
Minimax Game of GANs
At the heart of GANs lies the minimax game, a concept from game theory. In this setup:
- The generator and discriminator are competing players;
- The discriminator $D$ aims to maximize its ability to distinguish real from generated data;
- The generator $G$ aims to minimize the discriminator's ability to detect its fake data.
This dynamic defines a zero-sum game, where one player's gain is the other's loss. The optimization is defined as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The generator tries to fool the discriminator by generating samples $G(z)$ that are as close to real data as possible.
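As an illustration, the value function $V(D, G)$ above could be computed like this, assuming the `Generator` and `Discriminator` from the earlier sketch (the small `eps` is added only for numerical stability inside the logarithms):

```python
import torch

def value_function(D, G, real_batch, z):
    """V(D, G) from the minimax objective: D maximizes it, G minimizes it."""
    eps = 1e-8  # avoids log(0) when D saturates
    real_term = torch.log(D(real_batch) + eps).mean()   # E[log D(x)]
    fake_term = torch.log(1 - D(G(z)) + eps).mean()     # E[log(1 - D(G(z)))]
    return real_term + fake_term
```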
Loss Functions
While the original GAN objective defines a minimax game, in practice, alternative loss functions are used to stabilize training.
- Non-saturating Generator Loss:

$$L_G = -\mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$$

This helps the generator receive strong gradients even when the discriminator performs well.
- Discriminator Loss:

$$L_D = -\mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] - \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
These losses encourage the generator to produce samples that increase the discriminator’s uncertainty and improve convergence during training.
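In practice, both losses are usually written as binary cross-entropy terms. A sketch reusing the `D` and `G` modules from above (the `detach()` call is what keeps the discriminator update from backpropagating into the generator):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, real_batch, z):
    """-E[log D(x)] - E[log(1 - D(G(z)))], written as two BCE terms."""
    real_scores = D(real_batch)
    fake_scores = D(G(z).detach())  # detach: do not update G during D's step
    real_loss = F.binary_cross_entropy(real_scores, torch.ones_like(real_scores))
    fake_loss = F.binary_cross_entropy(fake_scores, torch.zeros_like(fake_scores))
    return real_loss + fake_loss

def generator_loss(D, G, z):
    """Non-saturating loss -E[log D(G(z))]: strong gradients even when D is winning."""
    fake_scores = D(G(z))
    return F.binary_cross_entropy(fake_scores, torch.ones_like(fake_scores))
```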
Key Variants of GAN Architectures
Several types of GANs have emerged to tackle specific limitations or to improve performance:
Conditional GAN (cGAN)
Conditional GANs extend the standard GAN framework by introducing additional information (usually labels) into both the generator and the discriminator. Instead of generating data from random noise alone, the generator receives both noise and a condition (e.g., a class label). The discriminator also receives the condition, so it judges whether the sample is realistic under that condition.
- Use cases: class-conditional image generation, image-to-image translation, text-to-image generation.
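A minimal sketch of one common way to inject the condition into the generator: the label is embedded and concatenated with the noise vector. The 10 class labels and the embedding size are hypothetical choices; cGANs can also use other fusion schemes:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10   # hypothetical number of class labels
LATENT_DIM = 100

class ConditionalGenerator(nn.Module):
    """Generator conditioned on a class label via a learned embedding."""
    def __init__(self, data_dim=784):
        super().__init__()
        self.label_embedding = nn.Embedding(NUM_CLASSES, NUM_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + NUM_CLASSES, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # labels: LongTensor of shape (N,); embed and concatenate with noise
        cond = self.label_embedding(labels)
        return self.net(torch.cat([z, cond], dim=1))
```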
Deep Convolutional GAN (DCGAN)
DCGANs replace the fully connected layers in the original GANs with convolutional and transposed convolutional layers, making them more effective for generating images. They also introduce architectural guidelines like removing fully connected layers, using batch normalization, and employing ReLU/LeakyReLU activations.
- Use cases: photo-realistic image generation, learning visual representations, unsupervised feature learning.
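The following sketch shows a DCGAN-style generator that upsamples a latent vector to a 64x64 RGB image with transposed convolutions and batch normalization. The channel widths and the 64x64 target resolution are illustrative assumptions:

```python
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Upsamples a latent vector to a 64x64 RGB image, DCGAN-style."""
    def __init__(self, latent_dim=100, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            # latent vector treated as a 1x1 "image" with latent_dim channels
            nn.ConvTranspose2d(latent_dim, feat * 8, 4, 1, 0, bias=False),  # -> 4x4
            nn.BatchNorm2d(feat * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),    # -> 8x8
            nn.BatchNorm2d(feat * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),    # -> 16x16
            nn.BatchNorm2d(feat * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),        # -> 32x32
            nn.BatchNorm2d(feat),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat, 3, 4, 2, 1, bias=False),               # -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        # reshape (N, latent_dim) noise into (N, latent_dim, 1, 1) feature maps
        return self.net(z.view(z.size(0), -1, 1, 1))
```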
CycleGAN
CycleGANs address the problem of unpaired image-to-image translation. Unlike other models that require paired datasets (e.g., the same photo in two different styles), CycleGANs can learn mappings between two domains without paired examples. They introduce two generators and two discriminators, each responsible for mapping in one direction (e.g., photos to paintings and vice versa), and enforce a cycle-consistency loss to ensure that translating from one domain and back returns the original image. This loss is key to preserving content and structure.
Cycle-Consistency Loss ensures:

$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$$

where:
- $G$ maps images from domain A to domain B;
- $F$ maps images from domain B to domain A;
- $\lVert \cdot \rVert_1$ is the L1 norm used to measure reconstruction error.
- Use cases: photo-to-artwork conversion, horse-to-zebra translation, voice conversion between speakers.
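The cycle-consistency loss above translates almost directly into code. A sketch in PyTorch, where `G` and `F_` stand for the two generators named in the formula, and the weight `lam=10.0` is an assumption following the value commonly used in the CycleGAN paper:

```python
import torch

def cycle_consistency_loss(G, F_, real_a, real_b, lam=10.0):
    """L1 reconstruction error after a round trip through both generators.

    G maps domain A -> B; F_ maps domain B -> A.
    lam weights the cycle term relative to the adversarial losses (assumed 10.0).
    """
    forward_cycle = torch.mean(torch.abs(F_(G(real_a)) - real_a))   # A -> B -> A
    backward_cycle = torch.mean(torch.abs(G(F_(real_b)) - real_b))  # B -> A -> B
    return lam * (forward_cycle + backward_cycle)
```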
StyleGAN
StyleGAN, developed by NVIDIA, introduces style-based control into the generator. Instead of feeding a noise vector directly to the generator, it passes through a mapping network to produce "style vectors" that influence each layer of the generator. This enables fine control over visual features such as hair color, facial expressions, or lighting.
Notable innovations:
- Style mixing: combines multiple latent codes to mix visual styles across layers;
- Adaptive Instance Normalization (AdaIN): controls the statistics of feature maps in the generator;
- Progressive growing: training starts at low resolution and increases over time.
- Use cases: ultra high-resolution image generation (e.g., faces), visual attribute control, art generation.
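AdaIN itself is simple enough to sketch in a few lines: it normalizes each feature map's per-channel statistics, then replaces them with a style-derived scale and bias. The learned affine layer that predicts these parameters from the style vector is omitted here as an assumption of the sketch:

```python
import torch

def adain(content, style_scale, style_bias, eps=1e-5):
    """Adaptive Instance Normalization.

    content: feature maps of shape (N, C, H, W);
    style_scale, style_bias: per-channel style parameters of shape (N, C, 1, 1),
    typically produced from the style vector by a learned affine layer (not shown).
    """
    mean = content.mean(dim=(2, 3), keepdim=True)       # per-sample, per-channel mean
    std = content.std(dim=(2, 3), keepdim=True) + eps   # per-sample, per-channel std
    return style_scale * (content - mean) / std + style_bias
```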
Comparison: GANs vs VAEs
- Objective: GANs optimize an adversarial minimax game; VAEs maximize a variational lower bound (the ELBO) on the data likelihood.
- Sample quality: GANs typically produce sharper, more realistic samples; VAE outputs are often blurrier.
- Latent space: VAEs learn an explicit, structured latent space with an encoder; standard GANs have no built-in encoder.
- Training: VAE training is generally stable; GAN training can suffer from mode collapse and instability.
GANs are a powerful class of generative models capable of producing highly realistic data through adversarial training. At their core lies a minimax game between two networks, with adversarial losses iteratively improving both components. A solid grasp of the basic architecture and loss functions, of key variants such as cGAN, DCGAN, CycleGAN, and StyleGAN, and of the contrast with models like VAEs gives practitioners the foundation needed for applications such as image generation, video synthesis, and data augmentation.
Review Questions
1. What are the two core components of a basic GAN architecture, and what role does each play?
2. What is the goal of the minimax game in GANs?
3. What are the key differences between GANs and VAEs?