
Wasserstein GANs (WGAN)

Understanding how to measure the distance between two probability distributions is central to the success of GANs. The original GAN formulation uses the Jensen-Shannon (JS) divergence, but this can cause problems when the distributions do not overlap, leading to vanishing gradients and unstable training. Wasserstein GANs (WGANs) address these issues by introducing the Wasserstein distance (also called Earth Mover's distance) as a new way to quantify how different the generated data distribution is from the real data distribution. The Wasserstein distance has several advantages:

  • It provides meaningful gradients even when the two distributions have no overlap (see the numerical sketch after this list);
  • It leads to more stable and robust GAN training.
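
A quick numerical illustration of the first point: a minimal sketch assuming SciPy is available (the point-mass distributions and the support grid are illustrative choices, not from the lesson). When two point masses never overlap, the JS divergence stays constant no matter how far apart they are, so it carries no gradient signal, while the Wasserstein distance grows with the separation:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

support = np.arange(11)  # discrete support {0, 1, ..., 10}

for theta in [1, 2, 5, 10]:
    p_r = np.zeros(11); p_r[0] = 1.0      # "real" distribution: all mass at x = 0
    p_g = np.zeros(11); p_g[theta] = 1.0  # "generated" distribution: all mass at x = theta

    js = jensenshannon(p_r, p_g, base=2) ** 2  # JS divergence (squared JS distance)
    w = wasserstein_distance(support, support, p_r, p_g)

    print(f"theta={theta:2d}  JS={js:.3f}  Wasserstein={w:.3f}")
```

For every separation, JS stays pinned at 1.0 while the Wasserstein distance equals theta, which is exactly the meaningful-gradient behavior the bullet describes.
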
Definition

Wasserstein loss: the loss function in WGANs is based on the Wasserstein distance, which measures the minimum cost of transporting probability mass to transform one distribution into another.

Definition

Lipschitz constraint: to compute the Wasserstein distance, the discriminator (called the critic in WGANs) must be a 1-Lipschitz function. This is typically enforced by weight clipping or other regularization techniques.
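
For reference, a function $f$ is 1-Lipschitz when its output can change no faster than its input:

$$|f(x_1) - f(x_2)| \le \|x_1 - x_2\| \quad \text{for all } x_1, x_2$$

Weight clipping enforces this only approximately; later variants such as WGAN-GP use a gradient penalty instead.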

The mathematical formulation of the Wasserstein distance between the real data distribution $P_r$ and the generated data distribution $P_g$ is:

$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\|x - y\|\right]$$

Here, $\Pi(P_r, P_g)$ denotes the set of all joint distributions $\gamma(x, y)$ whose marginals are $P_r$ and $P_g$, and $\|x - y\|$ is the cost of transporting a unit of probability mass from $x$ to $y$. This formulation captures the idea of the minimum effort required to transform one distribution into another, making it a powerful tool for training GANs.
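
Optimizing this infimum over all joint distributions directly is intractable. WGAN instead relies on the Kantorovich-Rubinstein duality, which rewrites the same distance as a supremum over 1-Lipschitz functions $f$; this is the quantity the critic learns to approximate:

$$W(P_r, P_g) = \sup_{\|f\|_L \le 1} \left( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \right)$$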

Conceptually, WGAN modifies the GAN training process in several key ways. Instead of using a discriminator that outputs probabilities, WGAN uses a critic that scores real and generated samples. The critic is trained to maximize the difference between its average output on real samples and its average output on generated samples. This difference approximates the Wasserstein distance between the two distributions. To ensure the critic is a 1-Lipschitz function, its weights are clipped to a small range after each gradient update. As a result, the generator is trained to minimize the Wasserstein distance, leading to more stable gradients and improved training dynamics compared to the original GAN framework.
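
The loop below is a minimal PyTorch sketch of one WGAN iteration. The tiny MLP critic and generator, the random stand-in batch, and the batch size are illustrative placeholders; RMSprop with a small learning rate and a clip range of 0.01 follow the original WGAN paper.

```python
import torch
from torch import nn

latent_dim, clip_value, n_critic = 100, 0.01, 5

# Illustrative placeholder networks (any critic/generator architecture would do).
critic = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))

opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

real_batch = torch.randn(64, 784)  # stand-in for a batch of real data

# Critic: maximize E[critic(real)] - E[critic(fake)],
# several critic steps per generator step.
for _ in range(n_critic):
    z = torch.randn(64, latent_dim)
    fake = generator(z).detach()  # do not backprop into the generator here
    loss_c = -(critic(real_batch).mean() - critic(fake).mean())  # negate: optimizers minimize
    opt_c.zero_grad()
    loss_c.backward()
    opt_c.step()
    for p in critic.parameters():  # enforce the 1-Lipschitz constraint by weight clipping
        p.data.clamp_(-clip_value, clip_value)

# Generator: minimize the critic's estimate of the Wasserstein distance.
z = torch.randn(64, latent_dim)
loss_g = -critic(generator(z)).mean()  # push critic scores on fakes upward
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```

Training the critic several steps per generator update keeps its score gap a reasonable estimate of the Wasserstein distance before the generator moves.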

