Feature Selection and Regularization Techniques

Overfitting and Model Complexity

Understanding how your model performs on new, unseen data is a core challenge in supervised learning. Two concepts that often come up are overfitting and underfitting. Overfitting happens when your model learns not only the underlying pattern in the training data but also the noise—meaning it performs very well on the training set but poorly on new data. Underfitting is the opposite: your model is too simple to capture the underlying structure, resulting in poor performance on both training and test data.

This leads to the bias–variance tradeoff. Bias refers to errors introduced by approximating a real-world problem with a simplified model. Variance is the error introduced by sensitivity to small fluctuations in the training set. A model with high bias pays little attention to the training data and oversimplifies the problem (underfitting). A model with high variance pays too much attention to the training data and does not generalize well (overfitting). Striking the right balance between bias and variance is essential for building models that generalize.
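For squared-error loss, this tradeoff can be written out exactly. As a sketch, assume the data come from y = f(x) + ε with noise variance σ²; then the expected prediction error of a fitted model f̂ at a point x decomposes as:

\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}

Increasing model complexity typically shrinks the bias term while inflating the variance term, which is why neither extreme generalizes well.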

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Generate synthetic data with a nonlinear ground truth,
# so a straight line genuinely underfits
np.random.seed(0)
X = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * X) + np.random.normal(0, 0.15, size=X.shape)

# Reshape X for sklearn
X = X.reshape(-1, 1)

# Fit linear regression (degree 1)
poly1 = PolynomialFeatures(degree=1)
X_poly1 = poly1.fit_transform(X)
model1 = LinearRegression().fit(X_poly1, y)
y_pred1 = model1.predict(X_poly1)

# Fit polynomial regression (degree 15 - very complex)
poly15 = PolynomialFeatures(degree=15)
X_poly15 = poly15.fit_transform(X)
model15 = LinearRegression().fit(X_poly15, y)
y_pred15 = model15.predict(X_poly15)

# Plot results
plt.figure(figsize=(10, 5))
plt.scatter(X, y, color='black', label='Data')
plt.plot(X, y_pred1, color='blue', label='Degree 1 (Underfit)')
plt.plot(X, y_pred15, color='red', linestyle='--', label='Degree 15 (Overfit)')
plt.legend()
plt.title('Polynomial Regression: Underfitting vs Overfitting')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

When you increase the complexity of your model, such as by raising the polynomial degree in regression, you give the model more flexibility to fit the training data. In the code above, the degree 1 polynomial (a straight line) cannot capture the curved pattern in the data, resulting in underfitting. The degree 15 polynomial, on the other hand, fits the training data almost perfectly, noise included, leading to overfitting. This model will likely perform poorly on new data because it has learned patterns that do not generalize. The key is to choose a model that is complex enough to capture the underlying trend, but not so complex that it memorizes noise.

This is why controlling model complexity is so important for generalization. You want your model to perform well on both the training data and unseen data. As you saw in the previous example, too simple a model leads to high bias and underfitting, while too complex a model leads to high variance and overfitting.
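One way to see this numerically is to hold out part of the data and track training versus test error as the polynomial degree grows. Below is a minimal sketch along those lines; the 40-point sample, 50/50 split, and the degrees 1, 3, and 15 are illustrative choices, not part of the example above.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Same kind of noisy sinusoidal data as in the plot above
np.random.seed(0)
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.normal(0, 0.15, size=40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

for degree in [1, 3, 15]:
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    train_mse = mean_squared_error(y_train, model.predict(poly.transform(X_train)))
    test_mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
    print(f"degree {degree:2d}: train MSE {train_mse:.4f} | test MSE {test_mse:.4f}")

The telltale signature of overfitting is a training error that keeps falling while the test error climbs; the degree where test error bottoms out marks the sweet spot between bias and variance.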

Definition

Regularization is a set of techniques used to control model complexity by adding a penalty to large parameter values in a model. By discouraging overly complex models, regularization helps prevent overfitting and improves the model's ability to generalize to new data.
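To make this concrete, here is a minimal sketch of L2 regularization (ridge regression) applied to the same degree-15 features as earlier. scikit-learn's Ridge adds a penalty proportional to the sum of squared coefficients; the penalty strength alpha=0.01 is an illustrative choice, not a tuned value.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Same noisy sinusoidal data and degree-15 features as before
np.random.seed(0)
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.normal(0, 0.15, size=20)
X_poly = PolynomialFeatures(degree=15).fit_transform(X)

plain = LinearRegression().fit(X_poly, y)
ridge = Ridge(alpha=0.01).fit(X_poly, y)  # alpha controls the penalty strength

# The penalty shrinks the coefficients, taming the wildly oscillating fit
print("largest |coefficient|, no penalty:", np.abs(plain.coef_).max())
print("largest |coefficient|, ridge     :", np.abs(ridge.coef_).max())

Larger alpha means stronger shrinkage: alpha=0 recovers ordinary least squares, while a very large alpha pushes the model back toward underfitting, so the penalty strength is itself a knob on the bias–variance tradeoff.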

