Score Matching, DSM, and their connection to Diffusion
Score matching is a fundamental technique for learning generative models by estimating the gradient of the log-density — known as the score function — of a data distribution. Instead of directly modeling the probability density, score matching seeks to find a function that closely approximates the gradient of the log-probability with respect to the data. This approach is particularly valuable when the normalizing constant of the distribution is intractable, as is often the case in high-dimensional generative modeling.
Denoising score matching (DSM) builds on this idea by introducing controlled noise into the data and training a model to predict the score of the noisy data distribution. DSM leverages the insight that learning to undo noise is closely related to learning the score function itself. This makes DSM especially well suited for diffusion models, where the generative process is defined by gradually adding noise to data and then learning to reverse this process.
To see how DSM connects to diffusion models, consider the mathematical formulation. In DSM, you first corrupt data samples x₀ with Gaussian noise to obtain noisy samples x̃:

x̃ = x₀ + σ·ε,  where ε ∼ N(0, I)

and σ is the noise standard deviation. The goal is to train a neural network sθ(x̃, σ) to estimate the score function of the noisy data, that is, the gradient of the log-probability density with respect to x̃:
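The corruption step can be sketched in a few lines of NumPy (the function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def corrupt(x0, sigma, rng=None):
    """Corrupt clean samples x0 with isotropic Gaussian noise of std sigma.

    Returns both the noisy samples x_tilde and the noise eps, since the
    DSM target used later is built from (x_tilde - x0) = sigma * eps.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x0.shape)   # eps ~ N(0, I)
    x_tilde = x0 + sigma * eps            # x_tilde = x0 + sigma * eps
    return x_tilde, eps
```

Returning `eps` alongside `x_tilde` is a small convenience: the training target below depends only on the noise that was actually added.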
∇x̃ log qσ(x̃)

The DSM loss function encourages the model to predict the true score by minimizing the expected squared error between the model output and the true score:
L_DSM(θ) = E_{x₀,ε} ‖sθ(x̃, σ) − ∇x̃ log qσ(x̃)‖²

However, the true score ∇x̃ log qσ(x̃) is generally unknown. Fortunately, the key result of denoising score matching is that the marginal score inside the expectation can be replaced by the score of the conditional distribution qσ(x̃ | x₀) without changing the minimizer, and for Gaussian noise this conditional score has a simple closed form:
∇x̃ log qσ(x̃ | x₀) = −(x̃ − x₀)/σ²

This leads to a practical DSM loss:
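This identity follows directly from the Gaussian form of the conditional density; a short derivation:

```latex
q_\sigma(\tilde{x} \mid x_0)
  = \frac{1}{(2\pi\sigma^2)^{d/2}}
    \exp\!\left(-\frac{\lVert \tilde{x} - x_0 \rVert^2}{2\sigma^2}\right)
\;\Longrightarrow\;
\log q_\sigma(\tilde{x} \mid x_0)
  = -\frac{\lVert \tilde{x} - x_0 \rVert^2}{2\sigma^2} + \text{const}
\;\Longrightarrow\;
\nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x_0)
  = -\frac{\tilde{x} - x_0}{\sigma^2}.
```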
L_DSM(θ) = E_{x₀,ε} ‖sθ(x̃, σ) + (x̃ − x₀)/σ²‖²

Since x̃ − x₀ = σε, the target score is simply −ε/σ, so predicting the score is equivalent, up to a scale factor, to predicting the added noise ε. This is why the loss closely resembles the training objective in diffusion models, where the model is trained to predict either the noise or the original data from a noisy sample. In fact, the diffusion model's training objective can be interpreted as a form of denoising score matching, where the model learns to estimate the score of the data at various noise levels, corresponding to different time steps in the diffusion process.
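A minimal NumPy sketch of the practical DSM loss, using a toy analytic "score model" as a stand-in for the neural network sθ (all names are illustrative; a real model would be a trained network):

```python
import numpy as np

def dsm_loss(score_fn, x0, sigma, rng=None):
    """Monte Carlo estimate of the practical DSM loss:
    E[ || s_theta(x_tilde, sigma) + (x_tilde - x0) / sigma^2 ||^2 ].
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x0.shape)       # eps ~ N(0, I)
    x_tilde = x0 + sigma * eps                # corrupt the data
    target = -(x_tilde - x0) / sigma**2       # conditional score = -eps/sigma
    residual = score_fn(x_tilde, sigma) - target
    return np.mean(np.sum(residual**2, axis=-1))

# Toy "model": if the clean data is standard normal, the noisy data is
# N(0, (1 + sigma^2) I), whose exact score is -x / (1 + sigma^2).
def toy_score(x_tilde, sigma):
    return -x_tilde / (1.0 + sigma**2)
```

Because `toy_score` is the true marginal score for standard-normal data, it attains a lower DSM loss than, say, the constant-zero model, which is exactly the behavior the objective rewards.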
To make this connection more concrete, imagine you have a dataset of images. You add Gaussian noise to each image to create a noisy version. The task is to train a model that, given the noisy image, can predict the direction in which the original image lies—that is, the gradient pointing back to the clean image. This direction is precisely the score function for the noisy data distribution. By learning to estimate this score, the model gains the ability to denoise, which is the core mechanism behind the reverse diffusion process. In diffusion models, this learned score function guides the generation of new samples by iteratively moving noisy data towards high-probability regions of the data distribution, effectively reconstructing realistic images from pure noise.
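One common way to turn a learned score into the iterative sampling procedure described above is (unadjusted) Langevin dynamics. A minimal single-noise-level sketch, with illustrative names and a generic `score_fn` standing in for the trained network:

```python
import numpy as np

def langevin_sample(score_fn, sigma, shape, n_steps=200, step_size=0.01, rng=None):
    """Unadjusted Langevin dynamics: repeatedly nudge samples along the
    learned score (uphill in log-density) with a small noise injection:
        x <- x + step_size * s_theta(x, sigma) + sqrt(2 * step_size) * z
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(shape)            # start from pure noise
    for _ in range(n_steps):
        z = rng.standard_normal(shape)
        x = x + step_size * score_fn(x, sigma) + np.sqrt(2.0 * step_size) * z
    return x
```

Practical score-based samplers anneal σ from large to small across steps; this sketch fixes a single noise level to keep the core update visible.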