
Jensen–Shannon Divergence and GANs

The Jensen–Shannon (JS) divergence is a fundamental concept in information theory and machine learning, especially in the context of generative adversarial networks (GANs). It builds upon the Kullback–Leibler (KL) divergence by introducing a symmetrized and smoothed measure of the difference between two probability distributions. The formula for the Jensen–Shannon divergence is:

JS(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)

where M = \frac{1}{2}(P + Q) is the mixture distribution and D_{KL} denotes the KL divergence.
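
Before moving to continuous samples, it can help to evaluate the formula once on a tiny discrete example. The sketch below is illustrative only: the arrays p and q are made up for this example, and the cross-check assumes SciPy is available (scipy.spatial.distance.jensenshannon returns the square root of the JS divergence, so it is squared for comparison).

import numpy as np
from scipy.spatial.distance import jensenshannon

# Two small, made-up discrete distributions over the same three outcomes
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
m = 0.5 * (p + q)

def kl(a, b):
    """KL divergence D_KL(a || b) for discrete distributions (natural log)."""
    return np.sum(a * np.log(a / b))

js = 0.5 * kl(p, m) + 0.5 * kl(q, m)

print("JS(P || Q) from the formula:", js)
# SciPy returns the JS *distance*, i.e. the square root of the divergence
print("SciPy cross-check:          ", jensenshannon(p, q) ** 2)

With the formula verified on a toy example, the next snippet sets up the continuous distributions used throughout the rest of this chapter.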

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic distributions:
# P = Gaussian centered at 0 (real data)
# Q = Gaussian centered at 2 (generated data)
P_samples = rng.normal(loc=0, scale=1.0, size=2000)
Q_samples = rng.normal(loc=2, scale=1.0, size=2000)

# Mixture distribution M = 0.5 * (P + Q):
# pooling equally many samples from P and Q is equivalent to sampling the equal-weight mixture
M_samples = np.concatenate([P_samples, Q_samples])

plt.figure(figsize=(10, 5))
sns.kdeplot(P_samples, fill=True, color="#1f77b4", alpha=0.6, label="P (Real Distribution)")
sns.kdeplot(Q_samples, fill=True, color="#ff7f0e", alpha=0.6, label="Q (Generated Distribution)")
sns.kdeplot(M_samples, fill=True, color="#2ca02c", alpha=0.5, label="M = 0.5(P+Q)")
plt.title("JS Divergence: Comparing P, Q, and the Mixture M")
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()

This plot maps directly onto the GAN setting: P is the real data distribution, Q is the generated distribution, and M is their equal-weight mixture.

Note

Jensen–Shannon divergence is symmetric and bounded, making it suitable for adversarial learning.

Symmetry means that JS(P \| Q) = JS(Q \| P), unlike KL divergence, which is asymmetric. The JS divergence is also always bounded between 0 and \log 2 (or 1, if base-2 logarithms are used), making it numerically stable and easier to interpret in optimization problems.

# Discretize the samples into bins to approximate JS(P || Q)
bins = np.linspace(-5, 7, 300)
P_hist, _ = np.histogram(P_samples, bins=bins, density=True)
Q_hist, _ = np.histogram(Q_samples, bins=bins, density=True)
M_hist, _ = np.histogram(M_samples, bins=bins, density=True)

# Normalize to probability masses per bin and avoid log(0)
eps = 1e-12
P_hist = P_hist / P_hist.sum() + eps
Q_hist = Q_hist / Q_hist.sum() + eps
M_hist = M_hist / M_hist.sum() + eps

KL_PM = np.sum(P_hist * np.log(P_hist / M_hist))
KL_QM = np.sum(Q_hist * np.log(Q_hist / M_hist))
JS = 0.5 * KL_PM + 0.5 * KL_QM

print("KL(P || M):", KL_PM)
print("KL(Q || M):", KL_QM)
print("JS(P || Q):", JS)
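
Continuing from the histograms above, the short sketch below (illustrative only) makes both properties concrete: swapping P and Q merely swaps the two KL terms taken against the same mixture M, and a pair of distributions with completely disjoint supports reaches the upper bound of \log 2.

# Symmetry: swapping P and Q only swaps the two KL-against-M terms,
# so the sum is unchanged
JS_QP = 0.5 * np.sum(Q_hist * np.log(Q_hist / M_hist)) \
      + 0.5 * np.sum(P_hist * np.log(P_hist / M_hist))
print("JS(P || Q):", JS)
print("JS(Q || P):", JS_QP)

# Boundedness: distributions with completely disjoint supports
# hit the ceiling of log 2 (in nats)
p_disjoint = np.array([0.5, 0.5, 0.0, 0.0]) + eps
q_disjoint = np.array([0.0, 0.0, 0.5, 0.5]) + eps
m_disjoint = 0.5 * (p_disjoint + q_disjoint)
JS_max = 0.5 * np.sum(p_disjoint * np.log(p_disjoint / m_disjoint)) \
       + 0.5 * np.sum(q_disjoint * np.log(q_disjoint / m_disjoint))
print("JS with disjoint supports:", JS_max)
print("log(2):                   ", np.log(2))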

In the context of GANs, the generator and discriminator are engaged in a minimax game: the generator tries to produce samples that are indistinguishable from the real data, while the discriminator tries to tell them apart. The original GAN formulation uses the JS divergence as the theoretical basis for its loss function: training the generator against an optimal discriminator amounts to minimizing the JS divergence between the real and generated distributions. This choice is motivated by the fact that JS divergence provides a meaningful, symmetric measure of the overlap between the two distributions. If they do not overlap at all, the JS divergence saturates at its maximum value; as the generator improves and the distributions begin to overlap, the divergence decreases, guiding the model towards better performance.
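
This connection can be made quantitative. For a fixed generator, the optimal discriminator is D^*(x) = \frac{p(x)}{p(x) + q(x)}, and substituting it into the GAN value function gives V(D^*, G) = 2 \, JS(P \| Q) - 2\log 2. The sketch below is an illustrative check of that identity, reusing the binned P_hist and Q_hist from the earlier snippet rather than a trained model.

# Recompute JS using the exact mixture M = 0.5 * (P + Q) over the same bins
M_exact = 0.5 * (P_hist + Q_hist)
JS_exact = 0.5 * np.sum(P_hist * np.log(P_hist / M_exact)) \
         + 0.5 * np.sum(Q_hist * np.log(Q_hist / M_exact))

# Optimal discriminator for a fixed generator: D*(x) = p(x) / (p(x) + q(x))
D_star = P_hist / (P_hist + Q_hist)

# GAN value function at the optimal discriminator:
# V(D*, G) = E_P[log D*(x)] + E_Q[log(1 - D*(x))]
V = np.sum(P_hist * np.log(D_star)) + np.sum(Q_hist * np.log(1.0 - D_star))

# Theory: V(D*, G) = 2 * JS(P || Q) - 2 * log 2
print("V(D*, G):            ", V)
print("2 * JS_exact - 2ln2: ", 2.0 * JS_exact - 2.0 * np.log(2))

When the two printed numbers agree, it confirms numerically that training the generator against an optimal discriminator is, up to the constant 2\log 2, the same as minimizing the JS divergence.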

However, the bounded and symmetric nature of JS divergence can also lead to unique training dynamics. When the real and generated data distributions have little or no overlap, the gradient provided by the JS divergence can vanish, making it difficult for the generator to improve. This phenomenon is known as the vanishing gradient problem in GANs. Despite this limitation, the JS divergence remains a foundational concept in understanding how GANs measure the distance between probability distributions and why certain loss functions are chosen for adversarial learning.
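
One way to see this saturation numerically is to push the generated distribution progressively farther from the real one and watch the binned JS estimate climb towards its ceiling of \log 2 \approx 0.693, which is exactly the regime in which the gradient signal dies out. The sketch below is illustrative only; it reuses np and the rng from the first snippet, and the list of shifts is made up for this example.

# Binned JS estimate between a fixed "real" Gaussian and a shifted "generated" one
def binned_js(p_samples, q_samples, bins):
    p, _ = np.histogram(p_samples, bins=bins, density=True)
    q, _ = np.histogram(q_samples, bins=bins, density=True)
    eps = 1e-12
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

shifts = [0.5, 1, 2, 4, 8, 16]          # hypothetical separations between P and Q
wide_bins = np.linspace(-6, 22, 600)    # wide enough to cover every shifted Q
real = rng.normal(loc=0.0, scale=1.0, size=20000)

for shift in shifts:
    fake = rng.normal(loc=shift, scale=1.0, size=20000)
    js_value = binned_js(real, fake, wide_bins)
    print(f"shift = {shift:>4}:  JS = {js_value:.4f}   (log 2 = {np.log(2):.4f})")

Once the two Gaussians stop overlapping, the estimate sticks at roughly 0.693 no matter how much farther Q drifts, so the loss surface becomes flat and the generator receives almost no useful gradient.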
