Linear Regression for ML
Building the Polynomial Regression
Loading the file
For this chapter, we have a file named poly.csv.
Let's load the file and look at the contents.
```python
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
print(df.head(5))
```
So here we have one feature and the target. Let's build a scatter plot.
```python
import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
X = df['Feature']
y = df['Target']
plt.scatter(X, y)
plt.show()
```
It is hard to imagine a straight line fitting this data well. So let's build a Polynomial Regression!
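To see why a straight line struggles here, consider a minimal sketch. It uses made-up quadratic data (a stand-in for poly.csv, so the exact numbers are illustrative), fits a plain `LinearRegression`, and checks its R² score, which stays low because the relationship is curved:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up curved data: the target depends on the feature quadratically
rng = np.random.default_rng(0)
X = rng.uniform(0, 1.4, size=(100, 1))
y = 2 * X[:, 0] ** 2 - 3 * X[:, 0] + 1 + rng.normal(0, 0.05, 100)

# A straight line cannot follow the curve, so its R² score is poor
line = LinearRegression().fit(X, y)
print(round(line.score(X, y), 3))
```

A low R² like this is exactly the signal that a higher-degree model is worth trying.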
Building a Polynomial Regression
As mentioned in the previous example, we can preprocess X with the PolynomialFeatures class to perform Polynomial Regression. The other steps are the same as for Simple or Multiple Linear Regression:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # Import the PolynomialFeatures class

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
n = 2  # The degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame
y = df['Target']  # Assign y
X_poly = PolynomialFeatures(n, include_bias=False).fit_transform(X)  # Preprocess X
regression_model = LinearRegression().fit(X_poly, y)  # Initialize and train the model
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-d array of new feature values
X_new_poly = PolynomialFeatures(n, include_bias=False).fit_transform(X_new)  # Transform X_new for the predict() method
y_pred = regression_model.predict(X_new_poly)
plt.scatter(X, y)  # Build a scatter plot
plt.plot(X_new, y_pred)  # Plot the Polynomial Regression curve
plt.show()
```
Note

New instances must be preprocessed the same way as X before being passed to the .predict() method. So you need to apply PolynomialFeatures to X_new too!
The code above works, but there is a better way to do it using Pipelines (covered in the ML introduction with scikit-learn course).
In short, a pipeline acts as a container for all preprocessing steps, such as PolynomialFeatures in this case, and for the model itself, which is LinearRegression in our scenario.
Pipelines ensure that new instances are preprocessed in the same way as the training set, and they also make the code more concise.
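As a quick sketch with made-up data, you can verify that a pipeline's predictions match the manual transform-then-predict approach, since the pipeline applies the same transformation internally:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(0, 1, 20).reshape(-1, 1)   # made-up training feature
y = 1 + 2 * X[:, 0] - 4 * X[:, 0] ** 2     # made-up quadratic target

# Manual approach: transform first, then fit
poly = PolynomialFeatures(2, include_bias=False)
manual_model = LinearRegression().fit(poly.fit_transform(X), y)

# Pipeline approach: the transformation happens inside fit() and predict()
pipe = make_pipeline(PolynomialFeatures(2, include_bias=False), LinearRegression())
pipe.fit(X, y)

X_new = np.array([[0.5], [1.2]])
print(np.allclose(manual_model.predict(poly.transform(X_new)), pipe.predict(X_new)))
```

Both approaches give the same predictions; the pipeline just removes the risk of forgetting the transform on new data.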
Here is the same code using a pipeline:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # Import the PolynomialFeatures class
from sklearn.pipeline import make_pipeline

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
n = 2  # The degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame
y = df['Target']  # Assign y
polynomial_regression = make_pipeline(  # Build the pipeline: preprocessing + model
    PolynomialFeatures(n, include_bias=False),
    LinearRegression())
polynomial_regression.fit(X, y)  # Train the model using the pipeline
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-d array of new feature values
y_pred = polynomial_regression.predict(X_new)  # Predict using the pipeline
plt.scatter(X, y)  # Build a scatter plot
plt.plot(X_new, y_pred)  # Plot the Polynomial Regression curve
plt.show()
```
Feel free to experiment with the value of n (the line `n = 2` in the code above). You will see how the plot changes with the degree of the polynomial regression.
If you look closely, you may notice that the predictions differ significantly for feature values below 0 or above 1.4. That is the subject of the next chapter.