Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) are a class of generative models introduced by Ian Goodfellow in 2014. They consist of two neural networks — the Generator and the Discriminator — trained simultaneously in a game-theoretic framework. The generator tries to produce data that resembles the real data, while the discriminator tries to distinguish real data from the generated data.
GANs learn to generate data samples from noise by solving a minimax game. Over the course of training, the generator becomes better at producing realistic data, and the discriminator becomes better at distinguishing real from fake data.
Architecture of a GAN
A basic GAN model consists of two core components:
1. Generator (G)
- Takes a random noise vector $z$ as input;
- Transforms it through a neural network into a data sample intended to resemble data from the true distribution.
2. Discriminator (D)
- Takes either a real data sample or a generated sample $G(z)$;
- Outputs a scalar between 0 and 1, estimating the probability that the input is real.
These two components are trained simultaneously. The generator aims to produce realistic samples to fool the discriminator, while the discriminator aims to correctly identify real versus generated samples.
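To make the two components concrete, here is a minimal PyTorch sketch. The layer sizes, the 100-dimensional noise vector, and the flattened 28x28 data are illustrative assumptions, not a reference implementation:

```python
import torch
import torch.nn as nn

LATENT_DIM = 100   # size of the noise vector z (illustrative choice)
DATA_DIM = 784     # e.g., a flattened 28x28 grayscale image (assumption)

class Generator(nn.Module):
    """Maps a noise vector z to a synthetic data sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, DATA_DIM),
            nn.Tanh(),  # outputs in [-1, 1], matching data normalized to that range
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps a (real or generated) sample to the probability that it is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # scalar in (0, 1): estimated probability of "real"
        )

    def forward(self, x):
        return self.net(x)
```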
Minimax Game of GANs
At the heart of GANs lies the minimax game, a concept from game theory. In this setup:
- The generator and discriminator are competing players;
- The discriminator $D$ aims to maximize its ability to distinguish real from generated data;
- The generator $G$ aims to minimize the discriminator's ability to detect its fake data.
This dynamic defines a zero-sum game, where one player's gain is the other's loss. The optimization is defined as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The generator tries to fool the discriminator by generating samples $G(z)$ that are as close to real data as possible.
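As an illustration, the value function $V(D, G)$ above could be computed like this, assuming the `Generator` and `Discriminator` from the earlier sketch (the small `eps` is added only for numerical stability inside the logarithms):

```python
import torch

def value_function(D, G, real_batch, z):
    """V(D, G) from the minimax objective: D maximizes it, G minimizes it."""
    eps = 1e-8  # avoids log(0) when D saturates
    real_term = torch.log(D(real_batch) + eps).mean()   # E[log D(x)]
    fake_term = torch.log(1 - D(G(z)) + eps).mean()     # E[log(1 - D(G(z)))]
    return real_term + fake_term
```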
Loss Functions
While the original GAN objective defines a minimax game, in practice, alternative loss functions are used to stabilize training.
- Non-saturating Generator Loss:

$$L_G = -\mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$$

This helps the generator receive strong gradients even when the discriminator performs well.
- Discriminator Loss:

$$L_D = -\mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] - \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
These losses encourage the generator to produce samples that increase the discriminator’s uncertainty and improve convergence during training.
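In practice, both losses are usually written as binary cross-entropy terms. A sketch reusing the `D` and `G` modules from above (the `detach()` call is what keeps the discriminator update from backpropagating into the generator):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, real_batch, z):
    """-E[log D(x)] - E[log(1 - D(G(z)))], written as two BCE terms."""
    real_scores = D(real_batch)
    fake_scores = D(G(z).detach())  # detach: do not update G during D's step
    real_loss = F.binary_cross_entropy(real_scores, torch.ones_like(real_scores))
    fake_loss = F.binary_cross_entropy(fake_scores, torch.zeros_like(fake_scores))
    return real_loss + fake_loss

def generator_loss(D, G, z):
    """Non-saturating loss -E[log D(G(z))]: strong gradients even when D is winning."""
    fake_scores = D(G(z))
    return F.binary_cross_entropy(fake_scores, torch.ones_like(fake_scores))
```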
Key Variants of GAN Architectures
Several types of GANs have emerged to tackle specific limitations or to improve performance:
Conditional GAN (cGAN)
Conditional GANs extend the standard GAN framework by introducing additional information (usually labels) into both the generator and the discriminator. Instead of generating data from random noise alone, the generator receives both noise and a condition (e.g., a class label). The discriminator also receives the condition, so it judges whether the sample is realistic under that condition.
- Use cases: class-conditional image generation, image-to-image translation, text-to-image generation.
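A minimal sketch of one common way to inject the condition into the generator: the label is embedded and concatenated with the noise vector. The 10 class labels and the embedding size are hypothetical choices; cGANs can also use other fusion schemes:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10   # hypothetical number of class labels
LATENT_DIM = 100

class ConditionalGenerator(nn.Module):
    """Generator conditioned on a class label via a learned embedding."""
    def __init__(self, data_dim=784):
        super().__init__()
        self.label_embedding = nn.Embedding(NUM_CLASSES, NUM_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + NUM_CLASSES, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # labels: LongTensor of shape (N,); embed and concatenate with noise
        cond = self.label_embedding(labels)
        return self.net(torch.cat([z, cond], dim=1))
```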
Deep Convolutional GAN (DCGAN)
DCGANs replace the fully connected layers in the original GANs with convolutional and transposed convolutional layers, making them more effective for generating images. They also introduce architectural guidelines like removing fully connected layers, using batch normalization, and employing ReLU/LeakyReLU activations.
- Use cases: photo-realistic image generation, learning visual representations, unsupervised feature learning.
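The following sketch shows a DCGAN-style generator that upsamples a latent vector to a 64x64 RGB image with transposed convolutions and batch normalization. The channel widths and the 64x64 target resolution are illustrative assumptions:

```python
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Upsamples a latent vector to a 64x64 RGB image, DCGAN-style."""
    def __init__(self, latent_dim=100, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            # latent vector treated as a 1x1 "image" with latent_dim channels
            nn.ConvTranspose2d(latent_dim, feat * 8, 4, 1, 0, bias=False),  # -> 4x4
            nn.BatchNorm2d(feat * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),    # -> 8x8
            nn.BatchNorm2d(feat * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),    # -> 16x16
            nn.BatchNorm2d(feat * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),        # -> 32x32
            nn.BatchNorm2d(feat),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat, 3, 4, 2, 1, bias=False),               # -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        # reshape (N, latent_dim) noise into (N, latent_dim, 1, 1) feature maps
        return self.net(z.view(z.size(0), -1, 1, 1))
```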
CycleGAN
CycleGANs address the problem of unpaired image-to-image translation. Unlike other models that require paired datasets (e.g., the same photo in two different styles), CycleGANs can learn mappings between two domains without paired examples. They introduce two generators and two discriminators, each responsible for mapping in one direction (e.g., photos to paintings and vice versa), and enforce a cycle-consistency loss to ensure that translating from one domain and back returns the original image. This loss is key to preserving content and structure.
Cycle-Consistency Loss ensures:

$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$$

where:
- $G$ maps images from domain A to domain B;
- $F$ maps images from domain B to domain A;
- $\lVert \cdot \rVert_1$ is the L1 norm used to measure reconstruction error.
- Use cases: photo-to-artwork conversion, horse-to-zebra translation, voice conversion between speakers.
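The cycle-consistency loss above translates almost directly into code. A sketch in PyTorch, where `G` and `F_` stand for the two generators named in the formula, and the weight `lam=10.0` is an assumption following the value commonly used in the CycleGAN paper:

```python
import torch

def cycle_consistency_loss(G, F_, real_a, real_b, lam=10.0):
    """L1 reconstruction error after a round trip through both generators.

    G maps domain A -> B; F_ maps domain B -> A.
    lam weights the cycle term relative to the adversarial losses (assumed 10.0).
    """
    forward_cycle = torch.mean(torch.abs(F_(G(real_a)) - real_a))   # A -> B -> A
    backward_cycle = torch.mean(torch.abs(G(F_(real_b)) - real_b))  # B -> A -> B
    return lam * (forward_cycle + backward_cycle)
```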
StyleGAN
StyleGAN, developed by NVIDIA, introduces style-based control into the generator. Instead of feeding a noise vector directly to the generator, it passes through a mapping network to produce "style vectors" that influence each layer of the generator. This enables fine control over visual features such as hair color, facial expressions, or lighting.
Notable innovations:
- Style mixing: combines multiple latent codes to mix visual styles across layers;
- Adaptive Instance Normalization (AdaIN): controls the statistics of feature maps in the generator;
- Progressive growing: training starts at low resolution and increases over time.
- Use cases: ultra high-resolution image generation (e.g., faces), visual attribute control, art generation.
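AdaIN itself is simple enough to sketch in a few lines: it normalizes each feature map's per-channel statistics, then replaces them with a style-derived scale and bias. The learned affine layer that predicts these parameters from the style vector is omitted here as an assumption of the sketch:

```python
import torch

def adain(content, style_scale, style_bias, eps=1e-5):
    """Adaptive Instance Normalization.

    content: feature maps of shape (N, C, H, W);
    style_scale, style_bias: per-channel style parameters of shape (N, C, 1, 1),
    typically produced from the style vector by a learned affine layer (not shown).
    """
    mean = content.mean(dim=(2, 3), keepdim=True)       # per-sample, per-channel mean
    std = content.std(dim=(2, 3), keepdim=True) + eps   # per-sample, per-channel std
    return style_scale * (content - mean) / std + style_bias
```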
Comparison: GANs vs VAEs
- Objective: GANs optimize an adversarial minimax game; VAEs maximize a variational lower bound (the ELBO) on the data likelihood.
- Sample quality: GANs typically produce sharper, more realistic samples; VAE outputs are often blurrier.
- Latent space: VAEs learn an explicit, structured latent space with an encoder; standard GANs have no built-in encoder.
- Training: VAE training is generally stable; GAN training can suffer from mode collapse and instability.
GANs are a powerful class of generative models capable of producing highly realistic data through adversarial training. At their core lies a minimax game between two networks, with adversarial losses iteratively improving both components. A solid grasp of the basic architecture and loss functions, of key variants such as cGAN, DCGAN, CycleGAN, and StyleGAN, and of the contrast with models like VAEs gives practitioners the foundation needed for applications such as image generation, video synthesis, and data augmentation.
Review Questions
1. What are the two core components of a basic GAN architecture, and what role does each play?
2. What is the goal of the minimax game in GANs?
3. What are the key differences between GANs and VAEs?