Learn Finding the Parameters | Logistic Regression
Classification with Python

Finding the Parameters

Logistic Regression only requires the computer to learn the best parameters $\beta$. To do that, we need to define what "best parameters" means. Recall how the model works: it predicts $p$, the probability of belonging to class 1:

$$p = \sigma(z) = \sigma(\beta_0 + \beta_1 x_1 + \dots)$$

Where

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
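The two formulas above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `sigmoid` and `predict_proba`, the parameter values, and the sample instance are all made up for the example and are not part of the course material.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta):
    """Probability of class 1 for one instance.

    x    -- list of feature values [x1, x2, ...]
    beta -- list of parameters [beta0, beta1, ...]; beta[0] is the intercept
    """
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)

# Hypothetical parameters and a single instance with two features
beta = [-1.0, 2.0, 0.5]
x = [1.0, -2.0]

# z = -1 + 2*1 + 0.5*(-2) = 0, and sigmoid(0) = 0.5
p = predict_proba(x, beta)
```

Larger values of z push p toward 1 and smaller values push it toward 0, which is why the choice of parameters determines how confident the model is for each instance.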

A model with good parameters predicts a high $p$ (close to 1) for instances that are actually of class 1 and a low $p$ (close to 0) for instances whose actual class is 0.

To measure how good or bad the model is, we use a cost function. In linear regression, we used MSE (mean squared error) as the cost function. This time, a different function is used:

$$\text{Binary Cross-Entropy Loss} = -\left(y \log(p) + (1 - y) \log(1 - p)\right)$$

Here $p$ represents the probability of belonging to class 1, as predicted by the model, while $y$ denotes the actual target value.

This function not only penalizes incorrect predictions but also accounts for the model's confidence in them. As illustrated in the image above, when the value of $p$ closely matches $y$ (the actual target), the cost remains small, indicating that the model confidently selected the correct class. Conversely, if the prediction is incorrect, the cost grows without bound as the model's confidence in the incorrect class approaches 1.

In the context of binary classification with a sigmoid function, the cost function used is specifically called binary cross-entropy loss, which was shown above. It's important to note that there is also a general form known as cross-entropy loss (or categorical cross-entropy) used for multi-class classification problems.
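To make the effect of confidence concrete, here is a minimal sketch that assumes the standard form of binary cross-entropy for a single instance, -(y log(p) + (1 - y) log(1 - p)). The function name and the three example predictions are chosen for illustration only.

```python
import math

def binary_cross_entropy(y, p):
    """Loss for one instance.

    y -- actual class (0 or 1)
    p -- predicted probability of class 1, strictly between 0 and 1
    """
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# The actual class is 1 in every case; only the prediction changes
confident_right = binary_cross_entropy(1, 0.99)   # small loss
unsure = binary_cross_entropy(1, 0.5)             # equals log(2)
confident_wrong = binary_cross_entropy(1, 0.01)   # large loss
```

The loss for the confidently wrong prediction is far larger than for the unsure one, and it keeps growing as p moves toward 0. Note that p must stay strictly inside (0, 1), since log(0) is undefined.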

The categorical cross-entropy loss for a single training instance is calculated as follows:

$$\text{Categorical Cross-Entropy Loss} = -\sum_{i=1}^{C} y_i \log(p_i)$$

Where

  • $C$ is the number of classes;
  • $y_i$ is the actual target value (1 if the class is the correct class, 0 otherwise);
  • $p_i$ is the predicted probability of the instance belonging to class $i$.
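The categorical formula can be checked on a single instance with a short sketch. The function name, the one-hot target, and the predicted probabilities below are illustrative assumptions, not values from the lesson.

```python
import math

def categorical_cross_entropy(y, p):
    """Loss for one instance.

    y -- one-hot list of actual targets, one entry per class
    p -- list of predicted probabilities, one entry per class
    """
    # Terms with y_i = 0 contribute nothing, so they are skipped
    # (this also avoids evaluating log(0) for classes predicted at 0)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi != 0)

# Three classes; the instance actually belongs to the second one
y = [0, 1, 0]
p = [0.2, 0.7, 0.1]

# Only the correct class survives the sum, so the loss is -log(0.7)
loss = categorical_cross_entropy(y, p)
```

Because the target is one-hot, the sum collapses to the negative log of the probability assigned to the correct class. With two classes this gives the same value as the binary cross-entropy.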

We calculate the loss function for each training instance and take the average. This average is called the cost function. Logistic Regression finds the parameters $\beta$ that minimize the cost function.
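The text does not say how the minimization is carried out. One common approach is gradient descent, sketched below in plain Python under that assumption. The dataset, learning rate, number of iterations, and function names are all invented for the example, and library implementations typically use more sophisticated solvers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(X, y, beta):
    """Average binary cross-entropy over all training instances."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
        total += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return total / len(X)

def gradient_step(X, y, beta, lr):
    """One gradient-descent update; the gradient is the mean of (p - y) * x."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
        error = p - yi
        grad[0] += error                 # intercept sees a constant feature of 1
        for j, v in enumerate(xi):
            grad[j + 1] += error * v
    return [b - lr * g / len(X) for b, g in zip(beta, grad)]

# Tiny made-up dataset with one feature; the classes overlap,
# so the best parameters are finite
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

beta = [0.0, 0.0]
initial_cost = cost(X, y, beta)          # log(2): every prediction is 0.5
for _ in range(2000):
    beta = gradient_step(X, y, beta, lr=0.1)
final_cost = cost(X, y, beta)
```

Starting from all-zero parameters, every prediction is 0.5 and the cost equals log(2). Each update moves the parameters in the direction that lowers the average loss, so `final_cost` ends up below `initial_cost`.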
