Classification with Python
What is Logistic Regression
Logistic Regression is actually a classification algorithm, despite the word "Regression" in its name.
It gets its name because it's based on Linear Regression, but it uses a logistic (sigmoid) function to convert the output into probabilities, allowing it to classify data into categories instead of predicting continuous values.
Suppose you want to predict whether a person will default on a first loan (no credit history available).
In Linear Regression, we build an equation to predict numerical values. We can use the same equation to calculate a "reliability score" that accounts for features such as income, duration of current employment, and debt-to-income ratio. A higher reliability score means a lower chance of default.
$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
Here, $x_1, \dots, x_n$ are the features and $z$ is the resulting score.
The β values are the parameters that the model needs to learn. During training, the computer adjusts these values to make better predictions. It does this by minimizing the difference between the predicted results and the actual labels; this difference is measured by the loss function.
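As a minimal sketch of the score calculation described above, the snippet below computes a weighted sum of features plus an intercept. The function name `reliability_score`, the feature values, and the β values are all hypothetical and chosen only for illustration; a real model would learn the parameters from data.

```python
def reliability_score(features, betas, intercept):
    """Linear score: intercept + beta_1*x_1 + ... + beta_n*x_n."""
    if len(features) != len(betas):
        raise ValueError("features and betas must have the same length")
    return intercept + sum(b * x for b, x in zip(betas, features))


# Hypothetical applicant: income (in tens of thousands),
# years in current job, debt-to-income ratio.
applicant = [5.0, 3.0, 0.4]

# Hypothetical parameters (not learned from any data).
betas = [0.3, 0.5, -4.0]
intercept = -1.0

score = reliability_score(applicant, betas, intercept)
print(f"score = {score:.2f}")
```

In this sketch, the negative coefficient on the debt-to-income ratio pulls the score down, while income and time in the current job push it up.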
Before the model's raw output can become a class label (0 or 1), it has to be converted into a probability. Logistic Regression does this with a sigmoid function, which takes any real-valued number and squeezes it into the range between 0 and 1, making it interpretable as a probability.
The sigmoid function is defined as:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Here, $z$ is the score (also called the logit) that we previously calculated.
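The sigmoid can be sketched in a few lines of plain Python. This version is one possible implementation, not the only one: it branches on the sign of the input so that `math.exp` is never called with a large positive argument, which would otherwise raise an `OverflowError` for very negative scores.

```python
import math


def sigmoid(z):
    """Map any real-valued score z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form
    # e^z / (1 + e^z) so the exponent passed to math.exp stays negative.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Large negative scores map close to 0, large positive scores close to 1.
for z in (-5.0, 0.0, 5.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```

A score of 0 maps to exactly 0.5, and the function is symmetric in the sense that `sigmoid(-z)` equals `1 - sigmoid(z)`.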
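Given two classes, 1 (the person will default on a first loan) and 0 (the person won't default), applying the sigmoid to the score gives the probability that the instance belongs to class 1. Note that the sign convention matters: for the sigmoid output to be the probability of default, a higher score must mean a higher chance of default, which is the opposite direction of the reliability score described earlier.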
To make a final decision (0 or 1), we compare the probability to a threshold — usually 0.5:
- If the probability is greater than 0.5, we predict 1;
- If it's less than or equal to 0.5, we predict 0.
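The decision rule above can be sketched as follows. The function name `predict_class` and the example scores are hypothetical, and the score `z` is assumed to be oriented so that larger values mean class 1 is more likely.

```python
import math


def predict_class(z, threshold=0.5):
    """Return (probability of class 1, predicted label) for a score z."""
    # Plain sigmoid; adequate for moderate scores like the ones used here.
    probability = 1.0 / (1.0 + math.exp(-z))
    # Strictly greater than the threshold predicts 1, otherwise 0.
    label = 1 if probability > threshold else 0
    return probability, label


# Hypothetical scores, chosen only for illustration.
for z in (-2.0, 0.0, 1.5):
    probability, label = predict_class(z)
    print(f"z={z:+.1f}  P(class 1)={probability:.3f}  predicted={label}")
```

A score of exactly 0 gives a probability of 0.5, which falls on the "less than or equal" side of the rule and is therefore classified as 0. Passing a different `threshold` makes the classifier more or less conservative about predicting class 1.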