Building the Polynomial Regression | Polynomial Regression
Linear Regression for ML
course content

1. Simple Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression
4. Evaluating and Comparing Models

Building the Polynomial Regression

Loading the file

For this chapter, we have a file named poly.csv.
Let's load the file and look at the contents.

```python
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

print(df.head(5))
```

So the dataset has one feature (the Feature column) and the target (the Target column). Let's build a scatter plot.

```python
import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

X = df['Feature']
y = df['Target']

plt.scatter(X, y)  # Plot each (feature, target) pair as a point
plt.show()
```

It is hard to imagine a straight line fitting this data well. So let's build a Polynomial Regression!

Building a Polynomial Regression

As mentioned earlier, we can preprocess X for Polynomial Regression with the PolynomialFeatures class: PolynomialFeatures(n, include_bias=False).fit_transform(X).

The remaining steps are the same as for Simple or Multiple Linear Regression:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # Import PolynomialFeatures class

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

n = 2  # Degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame (2-D, as scikit-learn expects)
y = df['Target']  # Assign y

X_poly = PolynomialFeatures(n, include_bias=False).fit_transform(X)  # Preprocess X
regression_model = LinearRegression().fit(X_poly, y)  # Initialize and train the model

X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-D array of new feature values
X_new_poly = PolynomialFeatures(n, include_bias=False).fit_transform(X_new)  # Transform X_new for predict()
y_pred = regression_model.predict(X_new_poly)

plt.scatter(X, y)  # Build a scatter plot
plt.plot(X_new, y_pred)  # Build a Polynomial Regression graph
plt.show()
```
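Why do the other steps stay the same? After the transformation, Polynomial Regression is simply Multiple Linear Regression whose features are x and x^2. With n = 2, the model is therefore y_hat = b0 + b1*x + b2*x^2. The minimal sketch below illustrates this. It assumes noise-free synthetic data generated from the hypothetical curve y = 1 + 2x - 3x^2, not the poly.csv data. LinearRegression should recover these coefficients through its intercept_ and coef_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data from y = 1 + 2x - 3x^2 (not the poly.csv data)
x = np.linspace(0, 1.4, 50).reshape(-1, 1)
y = 1 + 2 * x[:, 0] - 3 * x[:, 0] ** 2

x_poly = PolynomialFeatures(2, include_bias=False).fit_transform(x)  # columns: x, x^2
model = LinearRegression().fit(x_poly, y)

print(model.intercept_)  # approximately 1
print(model.coef_)       # approximately [ 2. -3.]
```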

Note

New instances must be preprocessed the same way as X before being passed to the .predict() method, so you need to apply PolynomialFeatures to X_new too!
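The Note can be illustrated with a short sketch on hypothetical synthetic data; X_train, y_train and the random values are assumptions, not poly.csv. The model is trained on two columns (x and x^2). Passing the raw one-column X_new to .predict() therefore raises a ValueError about the number of features. The sketch also shows a slightly tidier alternative to creating a second PolynomialFeatures object for X_new, as the earlier code does: fit one transformer on the training data and reuse its .transform() method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic training data (not poly.csv)
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1.4, size=(30, 1))
y_train = 1 + 2 * X_train[:, 0] - 3 * X_train[:, 0] ** 2 + rng.normal(0, 0.05, size=30)

poly = PolynomialFeatures(2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)  # fit once on the training data
model = LinearRegression().fit(X_train_poly, y_train)

X_new = np.linspace(-0.1, 1.5, 5).reshape(-1, 1)

# Forgetting the transformation: the model expects 2 columns but gets 1
try:
    model.predict(X_new)
except ValueError as err:
    print('ValueError:', err)

# Correct: reuse the already fitted transformer on the new instances
y_pred = model.predict(poly.transform(X_new))
print(y_pred.shape)  # (5,)
```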

Transforming X and X_new separately works, but there is a better way to do it using Pipelines (covered in the ML introduction with scikit-learn course).

In short, a pipeline chains all preprocessing steps (PolynomialFeatures in this case) and the model (LinearRegression in our scenario) into a single object. Calling .fit() on the pipeline fits every step in order. Calling .predict() passes new instances through the same preprocessing before the model predicts, so they are always treated exactly like the training set. Pipelines also make the code shorter and more convenient. Here is the code using a pipeline:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # Import PolynomialFeatures class
from sklearn.pipeline import make_pipeline

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

n = 2  # Degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame
y = df['Target']  # Assign y

polynomial_regression = make_pipeline(  # Initialize the pipeline of a model
    PolynomialFeatures(n, include_bias=False),
    LinearRegression())
polynomial_regression.fit(X, y)  # Train the model using the pipeline

X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-D array of new feature values
y_pred = polynomial_regression.predict(X_new)  # Predict using the pipeline (X_new is transformed automatically)

plt.scatter(X, y)  # Build a scatter plot
plt.plot(X_new, y_pred)  # Build a Polynomial Regression graph
plt.show()
```
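To check that the pipeline does the same work as the manual version, the sketch below compares the two on hypothetical synthetic data (not poly.csv). make_pipeline names each step after its class in lowercase, so the fitted steps can be accessed through named_steps. Calling .predict() on the pipeline applies PolynomialFeatures' transform and then LinearRegression's predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data (not poly.csv)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1.4, size=(40, 1))
y = 0.5 - X[:, 0] + 2 * X[:, 0] ** 2 + rng.normal(0, 0.05, size=40)
X_new = np.linspace(-0.1, 1.5, 10).reshape(-1, 1)

n = 2
pipe = make_pipeline(PolynomialFeatures(n, include_bias=False), LinearRegression())
pipe.fit(X, y)

# Manual version: transform explicitly, then fit and predict
poly = PolynomialFeatures(n, include_bias=False)
manual_model = LinearRegression().fit(poly.fit_transform(X), y)
manual_pred = manual_model.predict(poly.transform(X_new))

print(list(pipe.named_steps))                         # ['polynomialfeatures', 'linearregression']
print(np.allclose(pipe.predict(X_new), manual_pred))  # True
```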

Feel free to experiment with the value of n, the degree of the polynomial. You will see how the plot changes as the degree of the Polynomial Regression changes.
Pay particular attention to the predictions for feature values below 0 or above 1.4: they can differ significantly from one degree to another. That is the subject of the next chapter.
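One way to run this experiment without editing n by hand is to loop over several degrees. The sketch below uses hypothetical synthetic data sampled only from the interval [0, 1.4]; this is an assumption, not the poly.csv data. It prints each model's predictions at x = -0.1, 0.7 and 1.5, so you can compare how much the degrees disagree inside and outside the data range. The exact numbers depend on the random data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data, sampled only from [0, 1.4]
rng = np.random.default_rng(42)
X = rng.uniform(0, 1.4, size=(30, 1))
y = 1 + 2 * X[:, 0] - 3 * X[:, 0] ** 2 + rng.normal(0, 0.1, size=30)

X_check = np.array([[-0.1], [0.7], [1.5]])  # below, inside, and above the data range

predictions = {}
for n in [1, 2, 5, 10]:
    model = make_pipeline(PolynomialFeatures(n, include_bias=False), LinearRegression())
    model.fit(X, y)
    predictions[n] = model.predict(X_check)
    print(f'n={n:>2}:', np.round(predictions[n], 2))
```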

Section 3. Chapter 4