Overfitting and Regularization
As demonstrated in the previous chapter, using PolynomialFeatures, you can create a complex decision boundary. Second-degree polynomial features can even produce the boundaries shown in the image below:
And that is only degree two. A higher degree can yield even more complex shapes, but this comes with a problem: the decision boundary built by Logistic Regression may become too complicated, causing the model to overfit.
Overfitting is when the model, instead of learning general patterns in the data, builds a very complex decision boundary to handle every training instance. As a result, it performs poorly on data it has never seen, even though performing well on unseen data is the primary task of a Machine Learning model.
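Here is a minimal sketch of this setup. The make_moons toy dataset is an assumption for illustration; the previous chapter's data would behave the same way. Comparing training and test accuracy is the usual way to spot overfitting:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Toy non-linear dataset (an assumption for illustration)
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Expand the two features into degree-2 polynomial terms
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

model = LogisticRegression().fit(X_train_poly, y_train)
print('train accuracy:', model.score(X_train_poly, y_train))
print('test accuracy: ', model.score(X_test_poly, y_test))
```

A large gap between the two scores is the typical symptom: the boundary fits the training points too closely to generalize.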
Regularization tackles the problem of overfitting. In fact, ℓ2 regularization is used in the LogisticRegression class by default, but you need to configure how strongly the model should be regularized. This is controlled by the C parameter:
- greater C – weaker regularization, more overfitting;
- lower C – stronger regularization, less overfitting (but possibly underfitting).
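The sketch below illustrates this trade-off by varying C, again on the toy make_moons data (an assumption for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Lower C -> stronger regularization; higher C -> weaker regularization
for C in (0.01, 1, 100):
    model = LogisticRegression(C=C).fit(X_train_poly, y_train)
    print(f'C={C}: train={model.score(X_train_poly, y_train):.2f}, '
          f'test={model.score(X_test_poly, y_test):.2f}')
```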
Which values of C will result in a good model depends on the dataset, so it is better to choose it using GridSearchCV.
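A minimal sketch, assuming a small grid of candidate C values (the grid itself is an illustration; pick one that suits your dataset):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Candidate C values (chosen for illustration), with 5-fold cross-validation
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(LogisticRegression(), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)  # the C value that scored best
print(grid.best_score_)   # its mean cross-validated accuracy
```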
The LogisticRegression class includes regularization by default, so you should either remove regularization (by setting penalty=None) or scale the data (e.g., using StandardScaler). Scaling matters because the penalty treats all coefficients equally, so features on very different scales (such as polynomial terms) are regularized unevenly.
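Both options might look like this in a pipeline. Note that penalty=None requires scikit-learn 1.2 or newer (older versions used penalty='none'):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Option 1: keep the default l2 penalty, but scale the polynomial features
scaled = make_pipeline(PolynomialFeatures(degree=2),
                       StandardScaler(),
                       LogisticRegression())
scaled.fit(X, y)

# Option 2: switch regularization off entirely (scikit-learn >= 1.2)
unregularized = make_pipeline(PolynomialFeatures(degree=2),
                              LogisticRegression(penalty=None))
unregularized.fit(X, y)
```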