Building the Linear Regression with scikit-learn
You already know what Simple Linear Regression is and how to find the line that fits the data best. Let's go through all the steps of building a linear regression for a real dataset.
Loading data and looking at it
We have a file, simple_height_data.csv, with the data from our examples. Let's load the file and take a look at it.
    import pandas as pd

    file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
    df = pd.read_csv(file_link)  # Read the file
    print(df.head())  # Print the first 5 instances of the dataset
So the dataset has two columns: 'Height', which is our target, and 'Father', the father's height, which is our feature.
Let's assign our target values to the y variable and feature values to X and build a scatterplot.
    import pandas as pd
    import matplotlib.pyplot as plt

    file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
    df = pd.read_csv(file_link)  # Read the file
    X = df['Father']  # Assign the feature
    y = df['Height']  # Assign the target
    plt.scatter(X, y)  # Build the scatterplot
    plt.show()  # Display the figure when running as a script
Now that we are acquainted with our data, let's build a model!
Building a Linear Regression
Building a Linear Regression model with scikit-learn is quite simple: the LinearRegression class from sklearn.linear_model does the work.
You need to:
1. Initialize the LinearRegression class.
model = LinearRegression()
2. Train the model with a training set.
model.fit(X, y)
3. Now you can predict new instances.
model.predict(X_new)
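The three steps above can be sketched end to end on a tiny made-up dataset. The numbers below are hypothetical (they follow y = 2x + 1 exactly) and are not taken from simple_height_data.csv; they are only meant to make the fitted result easy to check by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data that follows y = 2x + 1 exactly (not the lesson's dataset)
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # 2-D: 4 instances, 1 feature
y = np.array([3.0, 5.0, 7.0, 9.0])          # 1-D target

model = LinearRegression()           # 1. Initialize the model
model.fit(X, y)                      # 2. Train it on the training set

X_new = np.array([[5.0], [6.0]])     # New instances, also 2-D
predictions = model.predict(X_new)   # 3. Predict targets for the new instances
print(predictions)                   # Values close to 11 and 13
```

Because the toy data lie exactly on a line, the predictions should match 2x + 1 up to floating-point error.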
Before putting it all together, there is one more thing to figure out.
Both the .fit() and .predict() methods of the LinearRegression class expect X (or X_new) to be a 2-D array (or a pandas DataFrame).
Selecting a single column from a DataFrame with df['col_name'] returns a pandas Series, which is 1-D and not what .fit() or .predict() expects, so scikit-learn raises an error such as:
ValueError: Expected 2D array, got 1D array instead
To avoid it, select the single column like this:
X = df[['col_name']]  # with double square brackets
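To make the difference concrete, here is a minimal sketch that compares the shapes of the two selections and shows the failed fit. The small DataFrame is a hypothetical stand-in for the lesson's dataset (the values are made up), and the exact wording of the error message may differ between scikit-learn versions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the lesson's dataset (values are made up)
df = pd.DataFrame({'Father': [65.0, 68.0, 70.0, 72.0],
                   'Height': [66.0, 68.5, 70.0, 71.5]})

series_X = df['Father']    # Single brackets: pandas Series, 1-D
frame_X = df[['Father']]   # Double brackets: pandas DataFrame, 2-D

print(series_X.shape)  # (4,)
print(frame_X.shape)   # (4, 1)

# Fitting on the 1-D Series fails with a ValueError
error_message = None
try:
    LinearRegression().fit(series_X, df['Height'])
except ValueError as error:
    error_message = str(error)
    print('ValueError:', error_message.splitlines()[0])

# Fitting on the 2-D DataFrame works
model = LinearRegression().fit(frame_X, df['Height'])

# Series.to_frame() is another way to get a one-column DataFrame
alt_X = series_X.to_frame()
print(alt_X.shape)  # (4, 1)
```

If the data are already in a 1-D NumPy array, array.reshape(-1, 1) produces the same one-column 2-D layout.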
Now let's build a Linear Regression and predict new values!
    import pandas as pd
    import numpy as np
    from sklearn.linear_model import LinearRegression  # Import LinearRegression

    file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
    df = pd.read_csv(file_link)  # Read the file
    X = df[['Father']]  # Assign the feature (with double square brackets)
    y = df['Height']  # Assign the target (no double square brackets needed for the target)
    model = LinearRegression()  # Initialize a model
    model.fit(X, y)  # Train the model
    X_new = np.array([[61], [64], [67]])  # Create a 2-D array of new instances
    print(model.predict(X_new))  # Predict the target for the new instances
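Once a model is fitted, it can help to look at the line it found. The sketch below reads the slope and intercept from the coef_ and intercept_ attributes and checks that .predict() agrees with intercept + slope * x. The DataFrame is again a hypothetical stand-in with made-up values, so the numbers it produces say nothing about the real dataset. Passing X_new as a DataFrame with the same column name is an optional variation on the NumPy array used above; recent scikit-learn versions may warn about missing feature names when a model trained on a DataFrame receives a plain array.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the lesson's dataset (values are made up)
df = pd.DataFrame({'Father': [64.0, 66.0, 68.0, 70.0, 72.0],
                   'Height': [65.5, 66.0, 68.5, 69.0, 71.0]})
X = df[['Father']]
y = df['Height']

model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # One coefficient per feature; here there is one
intercept = model.intercept_  # The fitted intercept
print('slope:', slope)
print('intercept:', intercept)

# New instances as a one-column DataFrame with the training column name
X_new = pd.DataFrame({'Father': [61, 64, 67]})
predictions = model.predict(X_new)

# The same predictions computed by hand from the line equation
manual = intercept + slope * X_new['Father'].to_numpy()
print(np.allclose(predictions, manual))  # True
```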