Building the Polynomial Regression
Loading the file
For this chapter, we have a file named poly.csv.
Let's load the file and look at the contents.
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

print(df.head(5))
So here we have one feature and the target. Let's build a scatter plot.
import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

X = df['Feature']
y = df['Target']

plt.scatter(X, y)
plt.show()  # Display the plot
It is hard to imagine a straight line fitting this data well. So let's build a Polynomial Regression!
Building a Polynomial Regression
As mentioned in the previous example, we can preprocess X to perform Polynomial Regression using the following code:
X = PolynomialFeatures(n, include_bias=False).fit_transform(X)
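To make the transformation concrete, here is a minimal standalone sketch (the sample values are made up for illustration). For n = 2 with include_bias=False, each value x is expanded into the pair [x, x²]:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[1], [2], [3]])  # hypothetical feature column
X_demo_poly = PolynomialFeatures(2, include_bias=False).fit_transform(X_demo)
print(X_demo_poly)
# [[1. 1.]
#  [2. 4.]
#  [3. 9.]]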
The other steps are the same as for Simple or Multiple Linear Regression:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # Import the PolynomialFeatures class

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

n = 2  # The degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame
y = df['Target']  # Assign y

X_poly = PolynomialFeatures(n, include_bias=False).fit_transform(X)  # Preprocess X
regression_model = LinearRegression().fit(X_poly, y)  # Initialize and train the model

X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-d array of new feature values
X_new_poly = PolynomialFeatures(n, include_bias=False).fit_transform(X_new)  # Transform X_new for the predict() method
y_pred = regression_model.predict(X_new_poly)

plt.scatter(X, y)  # Build a scatterplot
plt.plot(X_new, y_pred)  # Plot the Polynomial Regression curve
plt.show()  # Display the plot
Note
New instances must be preprocessed the same way as X before being passed to the predict() method. So you need to apply PolynomialFeatures to X_new too!
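If you forget this step, scikit-learn refuses to predict, since the model was trained on two columns (x and x²) while the raw X_new has only one. Here is a minimal sketch with made-up data showing the error:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.0], [0.5], [1.0], [1.5]])  # hypothetical feature values
y = np.array([0.1, 0.4, 1.2, 2.7])          # hypothetical targets

X_poly = PolynomialFeatures(2, include_bias=False).fit_transform(X)
model = LinearRegression().fit(X_poly, y)  # The model now expects 2 input columns

X_new = np.array([[2.0]])
try:
    model.predict(X_new)  # Forgot to transform X_new: only 1 column
except ValueError as err:
    print(err)  # Complains that X has 1 feature while the model expects 2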
The full example above works, but there is a better way to do it using pipelines (covered in the ML introduction with scikit-learn course).
In short, pipelines act as containers for all preprocessing steps, such as PolynomialFeatures in this case, and for the model, which is LinearRegression in our scenario.
They ensure that new instances are preprocessed in exactly the same way as the training set, and they make the code more convenient.
Here is the same code rewritten with a pipeline:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # Import the PolynomialFeatures class
from sklearn.pipeline import make_pipeline

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

n = 2  # The degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame
y = df['Target']  # Assign y

polynomial_regression = make_pipeline(  # Chain the preprocessing step and the model
    PolynomialFeatures(n, include_bias=False),
    LinearRegression())
polynomial_regression.fit(X, y)  # Train the model using the pipeline

X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-d array of new feature values
y_pred = polynomial_regression.predict(X_new)  # Predict using the pipeline

plt.scatter(X, y)  # Build a scatterplot
plt.plot(X_new, y_pred)  # Plot the Polynomial Regression curve
plt.show()  # Display the plot
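Once the pipeline is fitted, predicting for a new value is a one-liner, because the PolynomialFeatures transform is applied automatically inside predict(). Continuing from the variables above (the value 0.5 is just an illustration):

print(polynomial_regression.predict([[0.5]]))  # Transformed and predicted in one call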
Feel free to experiment with the value of n (the degree of the polynomial, set in the line n = 2). You will see how the plot changes with the degree of the polynomial regression.
If you look closely, you may notice that the predictions differ dramatically for feature values below 0 or above 1.4. That is the subject of the next chapter.