Building the Polynomial Regression
Loading the file
For this chapter, we have a file named poly.csv.
Let's load the file and look at the contents.
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

print(df.head(5))
So here we have one feature and the target. Let's build a scatter plot.
import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

X = df['Feature']
y = df['Target']

plt.scatter(X, y)
plt.show()  # Display the plot
It is hard to imagine a straight line fitting this data well. So let's build a Polynomial Regression!
Building a Polynomial Regression
As mentioned in the previous example, we can preprocess X to perform Polynomial Regression using the following code:
X = PolynomialFeatures(n, include_bias=False).fit_transform(X)
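To make the transformation concrete, here is a minimal standalone sketch (the sample values are made up for illustration). For n = 2 with include_bias=False, each value x is expanded into the pair [x, x²]:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[1], [2], [3]])  # hypothetical feature column
X_demo_poly = PolynomialFeatures(2, include_bias=False).fit_transform(X_demo)
print(X_demo_poly)
# [[1. 1.]
#  [2. 4.]
#  [3. 9.]]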
The other steps are the same as for Simple or Multiple Linear Regression:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # Import the PolynomialFeatures class

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

n = 2  # The degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame
y = df['Target']  # Assign y

X_poly = PolynomialFeatures(n, include_bias=False).fit_transform(X)  # Preprocess X
regression_model = LinearRegression().fit(X_poly, y)  # Initialize and train the model

X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-d array of new feature values
X_new_poly = PolynomialFeatures(n, include_bias=False).fit_transform(X_new)  # Transform X_new for the predict() method
y_pred = regression_model.predict(X_new_poly)

plt.scatter(X, y)  # Build a scatterplot
plt.plot(X_new, y_pred)  # Plot the Polynomial Regression curve
plt.show()  # Display the plot
Note
New instances must be preprocessed the same way as X before being passed to the predict() method. So you need to apply PolynomialFeatures to X_new too!
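If you forget this step, scikit-learn refuses to predict, since the model was trained on two columns (x and x²) while the raw X_new has only one. Here is a minimal sketch with made-up data showing the error:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.0], [0.5], [1.0], [1.5]])  # hypothetical feature values
y = np.array([0.1, 0.4, 1.2, 2.7])          # hypothetical targets

X_poly = PolynomialFeatures(2, include_bias=False).fit_transform(X)
model = LinearRegression().fit(X_poly, y)  # The model now expects 2 input columns

X_new = np.array([[2.0]])
try:
    model.predict(X_new)  # Forgot to transform X_new: only 1 column
except ValueError as err:
    print(err)  # Complains that X has 1 feature while the model expects 2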
The full example above works, but there is a better way to do it using pipelines (covered in the ML introduction with scikit-learn course).
In short, pipelines act as containers for all preprocessing steps, such as PolynomialFeatures in this case, and for the model, which is LinearRegression in our scenario.
They ensure that new instances are preprocessed in exactly the same way as the training set, and they make the code more convenient.
Here is the same code rewritten with a pipeline:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # Import the PolynomialFeatures class
from sklearn.pipeline import make_pipeline

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

n = 2  # The degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame
y = df['Target']  # Assign y

polynomial_regression = make_pipeline(  # Chain the preprocessing step and the model
    PolynomialFeatures(n, include_bias=False),
    LinearRegression())
polynomial_regression.fit(X, y)  # Train the model using the pipeline

X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-d array of new feature values
y_pred = polynomial_regression.predict(X_new)  # Predict using the pipeline

plt.scatter(X, y)  # Build a scatterplot
plt.plot(X_new, y_pred)  # Plot the Polynomial Regression curve
plt.show()  # Display the plot
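Once the pipeline is fitted, predicting for a new value is a one-liner, because the PolynomialFeatures transform is applied automatically inside predict(). Continuing from the variables above (the value 0.5 is just an illustration):

print(polynomial_regression.predict([[0.5]]))  # Transformed and predicted in one call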
Feel free to experiment with the value of n (the degree of the polynomial, set in the line n = 2). You will see how the plot changes with the degree of the polynomial regression.
If you look closely, you may notice that the predictions differ dramatically for feature values below 0 or above 1.4. That is the subject of the next chapter.