Wasserstein GANs (WGAN)

Understanding how to measure the distance between two probability distributions is central to the success of GANs. The original GAN formulation uses the Jensen-Shannon (JS) divergence, but this can cause problems when the distributions do not overlap, leading to vanishing gradients and unstable training. Wasserstein GANs (WGANs) address these issues by introducing the Wasserstein distance (also called Earth Mover's distance) as a new way to quantify how different the generated data distribution is from the real data distribution. The Wasserstein distance has several advantages:

  • It provides meaningful gradients even when the two distributions have no overlap;
  • It leads to more stable and robust GAN training.
Definition

Wasserstein loss: the loss function in WGANs is based on the Wasserstein distance, which measures the minimum cost of transporting probability mass to transform one distribution into another.

Definition

Lipschitz constraint: to compute the Wasserstein distance, the discriminator (called the critic in WGANs) must be a 1-Lipschitz function. This is typically enforced by weight clipping or other regularization techniques.
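
To make the constraint concrete, below is a minimal sketch of how weight clipping is usually applied after each critic update (the tiny critic network and the clipping bound of 0.01 are illustrative assumptions, not fixed parts of the method):

import torch
import torch.nn as nn

# A toy critic, purely for illustration; the layer sizes are arbitrary.
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

# Weight clipping: after every critic update, clamp each parameter into [-c, c]
# so the critic stays (roughly) 1-Lipschitz. c = 0.01 follows the original
# WGAN paper, but it is a tunable hyperparameter.
c = 0.01
for p in critic.parameters():
    p.data.clamp_(-c, c)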

The mathematical formulation of the Wasserstein distance between the real data distribution $P_r$ and the generated data distribution $P_g$ is:

W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\|x - y\|\right]

Here, $\Pi(P_r, P_g)$ denotes the set of all joint distributions $\gamma(x, y)$ whose marginals are $P_r$ and $P_g$, and $\|x - y\|$ is the cost of transporting a unit of probability mass from $x$ to $y$. This formulation captures the idea of the minimum effort required to transform one distribution into another, making it a powerful tool for training GANs.
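
For intuition, the one-dimensional case can be computed directly. The sketch below uses SciPy's wasserstein_distance on made-up samples and shows that the distance stays finite and informative even when the two sample sets barely overlap, which is exactly where the JS divergence becomes uninformative:

import numpy as np
from scipy.stats import wasserstein_distance

# Two 1-D sample sets with essentially no overlap (illustrative data).
real_samples = np.random.normal(loc=0.0, scale=1.0, size=1000)
fake_samples = np.random.normal(loc=5.0, scale=1.0, size=1000)

# Earth Mover's distance between the empirical distributions: approximately
# the gap between the means (about 5), rather than a saturated constant.
print(wasserstein_distance(real_samples, fake_samples))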

Conceptually, WGAN modifies the GAN training process in several key ways. Instead of using a discriminator that outputs probabilities, WGAN uses a critic that scores real and generated samples. The critic is trained to maximize the difference between its average output on real samples and its average output on generated samples. This difference approximates the Wasserstein distance between the two distributions. To ensure the critic is a 1-Lipschitz function, its weights are clipped to a small range after each gradient update. As a result, the generator is trained to minimize the Wasserstein distance, leading to more stable gradients and improved training dynamics compared to the original GAN framework.
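
The following is a minimal PyTorch-style sketch of that procedure (the toy data, network sizes, optimizer, learning rate, clipping bound, and number of critic steps per generator step are all illustrative assumptions):

import torch
import torch.nn as nn

# Toy 1-D "real" data and tiny networks, purely for illustration.
real_data = torch.randn(1024, 1) * 0.5 + 3.0
critic = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
clip, n_critic, batch = 0.01, 5, 64

for step in range(200):
    # Train the critic several times per generator step.
    for _ in range(n_critic):
        real = real_data[torch.randint(0, len(real_data), (batch,))]
        fake = generator(torch.randn(batch, 8)).detach()
        # The critic maximizes E[critic(real)] - E[critic(fake)],
        # i.e. it minimizes the negated difference.
        loss_c = critic(fake).mean() - critic(real).mean()
        opt_c.zero_grad()
        loss_c.backward()
        opt_c.step()
        # Enforce the 1-Lipschitz constraint by weight clipping.
        for p in critic.parameters():
            p.data.clamp_(-clip, clip)

    # Train the generator to raise the critic's score on generated samples,
    # which pushes the estimated Wasserstein distance down.
    fake = generator(torch.randn(batch, 8))
    loss_g = -critic(fake).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

In practice, the critic loss in such a loop tends to track sample quality as training progresses, which is one of the practical benefits of the Wasserstein formulation.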


