Decision Boundary

Let's plot the results of Logistic Regression. Consider the following two-feature example:

Once we build a Logistic Regression model, we can plot its decision boundary. It shows the region of each class, where new instances are predicted as that class. For example, here is the decision boundary of Logistic Regression applied to the data above:

We can see that the line perfectly separates the two classes here. When that happens, the dataset is called linearly separable. However, that is not always the case. What if the dataset looked like this:

Above is the decision boundary for a slightly different dataset. Here the data is not linearly separable, so the predictions made by Logistic Regression are imperfect. Unfortunately, by default, Logistic Regression cannot produce more complex decision boundaries, so this is the best prediction we can get.
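Decision boundary plots like the ones above can be drawn by classifying every point of a dense grid over the feature space and coloring each region by its predicted class. Here is a minimal sketch using scikit-learn and matplotlib, assuming X is an (n, 2) array with the two features and y holds the class labels:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Assumed: X is an (n, 2) feature array, y contains the class labels
model = LogisticRegression().fit(X, y)

# Build a dense grid covering the feature space
x1_grid, x2_grid = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
)
grid = np.c_[x1_grid.ravel(), x2_grid.ravel()]

# Predict a class for every grid point and reshape back to the grid
preds = model.predict(grid).reshape(x1_grid.shape)

# Color each predicted region and overlay the training points
plt.contourf(x1_grid, x2_grid, preds, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()
```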

But remember that Logistic Regression is derived from Linear Regression, which has a remedy for a model that is too simple: Polynomial Regression. We can use its equation for calculating z to get a more complex decision boundary shape:

z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_1 x_2 + \beta_5 x_2^2

Just like in Polynomial Regression, we can use the PolynomialFeatures transformer to add polynomial terms to our features — this helps the model learn more complex patterns.

```python
from sklearn.preprocessing import PolynomialFeatures

# Expand X with squared and interaction terms (degree 2, no constant column)
X_poly = PolynomialFeatures(2, include_bias=False).fit_transform(X)
```

This line transforms the original input features in X by adding:

  • squared terms (e.g., x^2)
  • interaction terms (e.g., x_1 \cdot x_2, if there are multiple features)

For example, if X originally has two features, [x_1, x_2], then after applying PolynomialFeatures(2, include_bias=False), you get: [x_1, x_2, x_1^2, x_1 x_2, x_2^2]
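As a quick check with made-up numbers (a minimal sketch, not part of the course dataset), you can inspect the expanded columns directly:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy input with two features, x1 = 2 and x2 = 3
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(2, include_bias=False)
print(poly.fit_transform(X))
# [[2. 3. 4. 6. 9.]]  ->  x1, x2, x1^2, x1*x2, x2^2

print(poly.get_feature_names_out(["x1", "x2"]))
# ['x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']
```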

This allows models like Logistic Regression to capture non-linear relationships and produce more flexible, curved decision boundaries. However, increasing the degree too much can lead to a model that fits the training data too well — a problem known as overfitting. That's why we usually try smaller degrees first and evaluate the model carefully.
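As a rough sketch of that workflow, assuming X and y hold the two features and class labels from the plots above, you can combine PolynomialFeatures and LogisticRegression in a pipeline and compare a few small degrees with cross-validation:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumed: X holds the two features, y the class labels
for degree in (1, 2, 3):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        LogisticRegression(max_iter=1000),
    )
    # Mean validation accuracy over 5 folds; pick the smallest degree
    # that scores well to avoid overfitting
    scores = cross_val_score(model, X, y, cv=5)
    print(f"degree={degree}: mean accuracy={scores.mean():.3f}")
```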
