Loss Function

In training a neural network, we need a way to measure how well our model is performing. This is done using a loss function, which quantifies the difference between the predicted outputs and the actual target values. The goal of training is to minimize this loss, making our predictions as close to the actual values as possible.

One of the most commonly used loss functions for binary classification is the cross-entropy loss, which works well with models that output probabilities.

Derivation of Cross-Entropy Loss

To understand cross-entropy loss, we start with the maximum likelihood principle. In a binary classification problem, the goal is to train a model that estimates the probability that a given input belongs to class 1. The actual label y can be either 0 or 1.

A good model should maximize the probability of correctly predicting all training examples. This means we want to maximize the likelihood function, which represents the probability of seeing the observed data given the model's predictions.

For a single training example, the likelihood of the observed label, given the model's predicted probability ŷ, can be written as:

P(y|x) = ŷ^y · (1 - ŷ)^(1 - y)

This expression simply means:

  • If y = 1, then P(y|x) = ŷ, meaning we want to maximize ŷ (the probability assigned to class 1);
  • If y = 0, then P(y|x) = 1 - ŷ, meaning we want to maximize 1 - ŷ (the probability assigned to class 0).
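
As a quick worked case (with ŷ = 0.8 chosen purely for illustration): if y = 1, the likelihood is 0.8^1 · 0.2^0 = 0.8, and if y = 0, it is 0.8^0 · 0.2^1 = 0.2, which in both cases is exactly the probability the model assigned to the observed label.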

To make optimization easier, we take the log-likelihood instead of the likelihood itself (since logarithms turn products into sums, making differentiation simpler):

log P(y|x) = y · log(ŷ) + (1 - y) · log(1 - ŷ)

Since the goal is maximization, we define the loss function as the negative log-likelihood, which we want to minimize:

L = -[y · log(ŷ) + (1 - y) · log(1 - ŷ)]

This is the binary cross-entropy loss function, commonly used for classification problems.
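
Below is a minimal sketch of this formula in Python; the function name bce_single and the eps clipping constant are assumptions added here for numerical stability, not part of the course code:

    import math

    def bce_single(y, y_hat, eps=1e-12):
        """Binary cross-entropy loss for a single training example."""
        # Keep y_hat strictly inside (0, 1) so log() is always defined.
        y_hat = min(max(y_hat, eps), 1 - eps)
        return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

    print(bce_single(1, 0.9))  # ~0.105: confident and correct, low loss
    print(bce_single(0, 0.9))  # ~2.303: confident but wrong, high loss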

Why This Formula?

Cross-entropy loss has a clear intuitive interpretation:

  • If y = 1, the loss simplifies to -log(ŷ), meaning the loss is low when ŷ is close to 1 and very high when ŷ is close to 0;
  • If y = 0, the loss simplifies to -log(1 - ŷ), meaning the loss is low when ŷ is close to 0 and very high when it is close to 1.

Since log(x) tends to negative infinity as x approaches zero, confident but incorrect predictions are penalized extremely heavily, encouraging the model to make confident, correct predictions.
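
To make the penalty concrete (using the natural logarithm and arbitrary example probabilities): with y = 1, a prediction of ŷ = 0.9 gives a loss of about 0.105, ŷ = 0.5 about 0.693, ŷ = 0.1 about 2.303, and ŷ = 0.01 about 4.605. The loss grows rapidly as the predicted probability of the correct class shrinks.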

If multiple examples are passed during forward propagation, the training examples are treated as independent, so the likelihood of the whole batch is the product of the per-example likelihoods. Taking the negative log turns this product into a sum, and the total loss is computed as the average loss across all examples:

L = -(1/N) · Σᵢ [yᵢ · log(ŷᵢ) + (1 - yᵢ) · log(1 - ŷᵢ)]

where N is the number of training samples, yᵢ is the label of the i-th example, ŷᵢ is its predicted probability, and the sum runs over all N examples.
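
A vectorized sketch of this averaged loss, assuming NumPy arrays of labels and predicted probabilities (the function name binary_cross_entropy and the sample values below are illustrative, not taken from the course code):

    import numpy as np

    def binary_cross_entropy(y_true, y_pred, eps=1e-12):
        """Average binary cross-entropy over a batch of predictions."""
        # Clip predictions away from 0 and 1 so np.log never receives 0.
        y_pred = np.clip(y_pred, eps, 1 - eps)
        return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

    # Illustrative labels and predicted probabilities
    y_true = np.array([1, 0, 1])
    y_pred = np.array([0.9, 0.2, 0.6])
    print(binary_cross_entropy(y_true, y_pred))  # ~0.28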
