Classification with Python
Decision Boundary
Let's plot the results of Logistic Regression. Consider the following two-feature example:
Once we build a Logistic Regression model, we can plot its decision boundary. It divides the feature space into regions: any new instance falling in a class's region is predicted as that class. For example, here is the decision boundary of Logistic Regression applied to the data above:
We can see that the line perfectly separates the two classes here. When that happens, the dataset is called linearly separable. However, that is not always the case. What if the dataset looked like this:
Above is the decision boundary for a slightly different dataset. Here the data is not linearly separable, so the predictions made by Logistic Regression are imperfect. Unfortunately, by default, Logistic Regression cannot produce more complex decision boundaries, so this is the best prediction we can get.
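Since the lesson's datasets are not shown here, the following sketch uses a synthetic two-feature dataset (generated with `make_classification`, an assumption) to show how such a decision boundary plot is typically produced: fit the model, predict on a grid covering the feature space, and color each class's region.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature dataset standing in for the lesson's data
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=42)
model = LogisticRegression().fit(X, y)

# Predict the class on a dense grid covering the feature space
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Color each class's region and overlay the training points
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
plt.savefig("decision_boundary.png")
```

Because Logistic Regression's boundary is the line where z = 0, the two colored regions meet along a straight line.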
But remember that Logistic Regression is derived from Linear Regression, which has a solution to the problem of the model being too simple: Polynomial Regression. We can use its equation for calculating z to get a more complex decision boundary shape:
Just like in Polynomial Regression, we can use the PolynomialFeatures transformer to add polynomial terms to our features, which helps the model learn more complex patterns.
```python
X_poly = PolynomialFeatures(2, include_bias=False).fit_transform(X)
```

This line transforms the original input features in X by adding:
- squared terms (e.g., x₁², x₂²)
- interaction terms (e.g., x₁x₂, if there are multiple features)
For example, if X originally has two features, x₁ and x₂, then after applying PolynomialFeatures(2, include_bias=False), you get: x₁, x₂, x₁², x₁x₂, x₂².
This allows models like Logistic Regression to capture non-linear relationships and produce more flexible, curved decision boundaries. However, increasing the degree too much can lead to a model that fits the training data too well — a problem known as overfitting. That's why we usually try smaller degrees first and evaluate the model carefully.
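One way to try smaller degrees first and evaluate carefully, as suggested above, is to chain PolynomialFeatures and LogisticRegression in a pipeline and compare degrees with cross-validation. This sketch uses `make_moons` as a stand-in for a non-linearly-separable dataset (an assumption, since the lesson's data is not available here):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy non-linearly-separable data standing in for the lesson's dataset
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)

results = {}
for degree in (1, 2, 3):
    # Pipeline: add polynomial terms, then fit Logistic Regression
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                          LogisticRegression(max_iter=1000))
    results[degree] = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree={degree}: mean CV accuracy = {results[degree]:.3f}")
```

On this data, degree 1 (a straight-line boundary) scores noticeably lower than the curved boundaries of higher degrees, while cross-validation guards against picking a degree that merely overfits the training set.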