Linear Regression for ML
Building the Linear Regression with scikit-learn
You already know what simple linear regression is and how to find the line that best fits the data. Let's walk through all the steps of building a linear regression model on a real dataset.
Loading data and looking at it
We have a file, simple_height_data.csv, with the data from our examples. Let's load the file and take a look at it.
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
print(df.head())  # Print the first 5 rows of the dataset
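To try the same steps offline, here is a minimal sketch that reads CSV text from memory instead of the URL. The values below are hypothetical stand-ins that share the two column names. They are not the contents of simple_height_data.csv. Because pd.read_csv accepts any file-like object, io.StringIO can take the place of a path or URL.

```python
import io

import pandas as pd

# Hypothetical stand-in for simple_height_data.csv: same column names,
# made-up values, so the example runs without downloading the file.
csv_text = """Height,Father
66.2,65.0
68.1,67.5
69.5,70.0
64.8,62.0
67.9,68.0
71.0,72.5
"""

df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts any file-like object

print(df.head())   # first 5 rows
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
```

df.shape and df.dtypes are quick checks that the file was parsed into the expected columns with numeric types before you start modeling.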
The dataset has two columns: 'Height', our target, and 'Father', the father's height, which is our feature.
Let's assign the target values to the y variable and the feature values to X, then build a scatterplot.
import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file

X = df['Father']  # Assign the feature
y = df['Height']  # Assign the target

plt.scatter(X, y)  # Build a scatterplot
plt.show()  # Display the plot
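A scatterplot is easier to read with labeled axes. The minimal sketch below uses matplotlib's object-oriented interface. It assumes the non-interactive Agg backend and writes the figure to a hypothetical file name, height_scatter.png, instead of opening a window. The data values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values with the same column names as the lesson's dataset
df = pd.DataFrame({
    "Height": [66.2, 68.1, 69.5, 64.8, 67.9, 71.0],
    "Father": [65.0, 67.5, 70.0, 62.0, 68.0, 72.5],
})

X = df["Father"]  # feature on the x-axis
y = df["Height"]  # target on the y-axis

fig, ax = plt.subplots()
ax.scatter(X, y)
ax.set_xlabel("Father's height")
ax.set_ylabel("Height")
ax.set_title("Height vs. father's height")
fig.savefig("height_scatter.png")  # write the plot to a PNG file instead of showing it
```

In a notebook or an interactive session you can drop the backend line and call plt.show() instead of savefig.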
Now that we are acquainted with our data, let's build a model!
Building a Linear Regression
Building a Linear Regression model with scikit-learn is quite simple!
There is a LinearRegression class for that.
You need to:
1. Initialize the LinearRegression class.
model = LinearRegression()
2. Train the model with a training set.
model.fit(X, y)
3. Predict the target for new instances.
model.predict(X_new)
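The three steps can be tried end to end on a tiny NumPy array. In this example X is already 2-D with shape (4, 1), so the DataFrame subtleties below do not come up. The points are hypothetical and lie exactly on the line y = 2x + 1. The fitted slope and intercept should therefore come out as approximately 2 and 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on y = 2x + 1.
# One feature per row, so X has shape (4, 1).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()  # 1. initialize the model
model.fit(X, y)             # 2. train it on the data

X_new = np.array([[5.0], [6.0]])  # new instances, also 2-D
print(model.predict(X_new))       # 3. predict the target for them
print(model.coef_, model.intercept_)  # learned slope and intercept
```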
Before putting it all together, there is one more thing to figure out.
Both the .fit() and .predict() methods of the LinearRegression class expect X (or X_new) to be a 2-D array (or a pandas DataFrame).
Selecting a single column from a DataFrame (df['col_name']) returns a pandas Series. A Series is 1-D, which is not what .fit() or .predict() expects, so the following error is raised:
ValueError: Expected 2D array, got 1D array instead
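You can reproduce this error by passing a single-bracket column to .fit() and catching the exception. The DataFrame here is a small hypothetical example. The exact wording of the message may differ between scikit-learn versions, so the sketch prints only the first line.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data with the lesson's column names
df = pd.DataFrame({
    "Father": [65.0, 67.5, 70.0, 62.0],
    "Height": [66.2, 68.1, 69.5, 64.8],
})

X_1d = df["Father"]  # single brackets -> pandas Series (1-D)
y = df["Height"]

raised = False
message = ""
try:
    LinearRegression().fit(X_1d, y)
except ValueError as err:
    raised = True
    message = str(err)
    print(message.splitlines()[0])  # first line of scikit-learn's error message
```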
To avoid this error, select the single column with double square brackets, which returns a one-column DataFrame:
X = df[['col_name']]  # with double square brackets
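The difference between the two kinds of selection shows up in the shape of the result. Below is a small sketch on hypothetical data. The last lines show reshape(-1, 1), which is the usual alternative when the feature is a plain 1-D NumPy array rather than a DataFrame column.

```python
import pandas as pd

# Hypothetical data with the lesson's column names
df = pd.DataFrame({
    "Father": [65.0, 67.5, 70.0, 62.0],
    "Height": [66.2, 68.1, 69.5, 64.8],
})

single = df["Father"]    # pandas Series
double = df[["Father"]]  # pandas DataFrame with one column

print(type(single).__name__, single.ndim, single.shape)  # Series 1 (4,)
print(type(double).__name__, double.ndim, double.shape)  # DataFrame 2 (4, 1)

# Alternative for a 1-D NumPy array: reshape to (n_samples, 1)
as_column = single.to_numpy().reshape(-1, 1)
print(as_column.shape)  # (4, 1)
```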
Now let's build a Linear Regression and predict new values!
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression  # Import LinearRegression

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file

X = df[['Father']]  # Assign the feature (with double square brackets)
y = df['Height']  # Assign the target (a 1-D Series is fine for the target)

model = LinearRegression()  # Initialize a model
model.fit(X, y)  # Train the model

X_new = np.array([[61], [64], [67]])  # Create a 2-D array of new instances
print(model.predict(X_new))  # Predict the target for the new instances
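After fitting, LinearRegression stores the learned line in its coef_ and intercept_ attributes. The sketch below runs on hypothetical stand-in values rather than the real file. It checks three things: that predictions equal intercept_ + coef_[0] * x, that passing X_new as a DataFrame with the training column name works, and that the slope and intercept match the closed-form least-squares formulas for one feature. Recent scikit-learn versions warn about missing feature names when a model fitted on a DataFrame receives a plain NumPy array, and the DataFrame form avoids that warning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in values; the lesson itself reads simple_height_data.csv
df = pd.DataFrame({
    "Father": [65.0, 67.5, 70.0, 62.0, 68.0, 72.5],
    "Height": [66.2, 68.1, 69.5, 64.8, 67.9, 71.0],
})
X = df[["Father"]]
y = df["Height"]

model = LinearRegression().fit(X, y)

# The fitted line: Height = intercept_ + coef_[0] * Father
print(model.intercept_, model.coef_)

# New instances as a DataFrame with the same column name used in training
X_new = pd.DataFrame({"Father": [61, 64, 67]})
pred = model.predict(X_new)

# The same predictions computed by hand from the line's parameters
manual = model.intercept_ + model.coef_[0] * X_new["Father"].to_numpy()
print(np.allclose(pred, manual))  # True

# Closed-form least squares for a single feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
# intercept = mean_y - slope * mean_x
x = df["Father"].to_numpy()
y_arr = y.to_numpy()
slope = np.sum((x - x.mean()) * (y_arr - y_arr.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y_arr.mean() - slope * x.mean()
print(slope, intercept)  # matches model.coef_[0] and model.intercept_
```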