Wasserstein GANs (WGAN)
Understanding how to measure the distance between two probability distributions is central to the success of GANs. The original GAN formulation uses the Jensen-Shannon (JS) divergence, but this can cause problems when the distributions do not overlap, leading to vanishing gradients and unstable training. Wasserstein GANs (WGANs) address these issues by introducing the Wasserstein distance (also called Earth Mover's distance) as a new way to quantify how different the generated data distribution is from the real data distribution. The Wasserstein distance has several advantages:
- It provides meaningful gradients even when the two distributions have no overlap (see the numerical sketch after this list);
- It leads to more stable and robust GAN training.
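To see this difference in behavior, here is a minimal numerical sketch (not part of the original lesson) comparing the two quantities as the generated distribution moves farther from the real one. It assumes NumPy and SciPy are installed and uses `scipy.stats.wasserstein_distance` and `scipy.spatial.distance.jensenshannon`:

```python
# Compare JS divergence and Wasserstein distance for nearly non-overlapping
# distributions: JS saturates, Wasserstein keeps reflecting how far apart they are.
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=0.1, size=10_000)  # "real" samples near 0

for shift in [1.0, 2.0, 4.0]:
    fake = rng.normal(loc=shift, scale=0.1, size=10_000)  # "generated" samples far away

    # Histogram both sample sets on a common grid to get probability vectors for JS.
    bins = np.linspace(-1, 5, 200)
    p, _ = np.histogram(real, bins=bins, density=True)
    q, _ = np.histogram(fake, bins=bins, density=True)
    js = jensenshannon(p, q) ** 2          # squared JS distance = JS divergence

    w = wasserstein_distance(real, fake)   # 1-D earth mover's distance
    print(f"shift={shift}: JS ~ {js:.3f} (saturates near log 2), Wasserstein ~ {w:.3f}")
```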
Wasserstein loss: the loss function in WGANs is based on the Wasserstein distance, which measures the minimum cost of transporting probability mass to transform one distribution into another.
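In code, the Wasserstein loss reduces to a difference of mean critic scores. The following is a minimal PyTorch sketch; the function names and the assumption that raw critic scores are passed in are illustrative, not taken from the original text:

```python
import torch

def critic_loss(critic_real: torch.Tensor, critic_fake: torch.Tensor) -> torch.Tensor:
    # The critic maximizes E[D(real)] - E[D(fake)]; minimizing the negative is equivalent.
    return -(critic_real.mean() - critic_fake.mean())

def generator_loss(critic_fake: torch.Tensor) -> torch.Tensor:
    # The generator tries to raise the critic's average score on generated samples.
    return -critic_fake.mean()
```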
Lipschitz constraint: to compute the Wasserstein distance, the discriminator (called the critic in WGANs) must be a 1-Lipschitz function. This is typically enforced by weight clipping or other regularization techniques.
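Below is a minimal sketch of the weight-clipping approach, assuming a PyTorch critic and the clip value 0.01 used in the original WGAN paper:

```python
import torch.nn as nn

def clip_critic_weights(critic: nn.Module, c: float = 0.01) -> None:
    # After each critic update, clamp every parameter to [-c, c] so the critic
    # stays (approximately) 1-Lipschitz.
    for p in critic.parameters():
        p.data.clamp_(-c, c)
```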
The mathematical formulation of the Wasserstein distance between the real data distribution Pr and the generated data distribution Pg is:
$$
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]
$$

Here, $\Pi(P_r, P_g)$ denotes the set of all joint distributions $\gamma(x, y)$ whose marginals are $P_r$ and $P_g$, and $\lVert x - y \rVert$ is the cost of transporting a unit of probability mass from $x$ to $y$. This formulation captures the idea of the minimum effort required to transform one distribution into another, making it a powerful tool for training GANs.
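To make the definition concrete, here is a small worked example (not from the original lesson): take $P_r = \delta_0$, a point mass at $0$, and $P_g = \delta_\theta$, a point mass at $\theta$. The only coupling in $\Pi(P_r, P_g)$ moves the single unit of mass from $0$ to $\theta$, so

$$
W(P_r, P_g) = \lVert 0 - \theta \rVert = |\theta|,
$$

which shrinks smoothly as the generator moves $\theta$ toward $0$, whereas the JS divergence stays fixed at $\log 2$ for every $\theta \neq 0$ and therefore provides no useful gradient.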
Conceptually, WGAN modifies the GAN training process in several key ways. Instead of using a discriminator that outputs probabilities, WGAN uses a critic that scores real and generated samples. The critic is trained to maximize the difference between its average output on real samples and its average output on generated samples; this difference approximates the Wasserstein distance between the two distributions. To keep the critic a 1-Lipschitz function, its weights are clipped to a small range after each gradient update. The generator, in turn, is trained to minimize the approximated Wasserstein distance, which yields more stable gradients and better training dynamics than the original GAN framework.
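Putting these pieces together, here is a compact sketch of the training loop described above. The `generator`, `critic`, and `real_loader` objects are assumed to be defined elsewhere and `real_loader` is assumed to yield batches of real samples; the hyperparameters (RMSprop with learning rate 5e-5, clip value 0.01, five critic updates per generator update) follow the original WGAN paper rather than this lesson:

```python
import torch

def train_wgan(generator, critic, real_loader, z_dim=100, n_critic=5,
               clip_value=0.01, lr=5e-5, device="cpu"):
    opt_g = torch.optim.RMSprop(generator.parameters(), lr=lr)
    opt_c = torch.optim.RMSprop(critic.parameters(), lr=lr)

    for step, real in enumerate(real_loader):
        real = real.to(device)

        # 1) Critic update: maximize E[D(real)] - E[D(fake)].
        z = torch.randn(real.size(0), z_dim, device=device)
        fake = generator(z).detach()  # do not backprop into the generator here
        loss_c = -(critic(real).mean() - critic(fake).mean())
        opt_c.zero_grad()
        loss_c.backward()
        opt_c.step()

        # 2) Enforce the 1-Lipschitz constraint by clipping critic weights.
        for p in critic.parameters():
            p.data.clamp_(-clip_value, clip_value)

        # 3) Generator update every n_critic steps: minimize -E[D(G(z))].
        if step % n_critic == 0:
            z = torch.randn(real.size(0), z_dim, device=device)
            loss_g = -critic(generator(z)).mean()
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()
```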