Building a Linear Regression with scikit-learn

You already know what Simple Linear Regression is and how to find the line that fits the data best. Let's go through all the steps of building a linear regression for a real dataset.

Loading data and looking at it

We have a file, simple_height_data.csv, with the data from our examples. Let's load the file and take a look at it.


import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file into a DataFrame

print(df.head())  # Print the first 5 rows of the dataset

The dataset has two columns: 'Height', which is our target, and 'Father' (the father's height), which is our feature.
Let's assign the target values to the y variable and the feature values to X, then build a scatterplot.


import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
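df = pd.read_csv(file_link)  # Read the file into a DataFrame

X = df['Father']  # Assign the feature
y = df['Height']  # Assign the target
plt.scatter(X, y)  # Build the scatterplot
plt.xlabel('Father')  # Label the axes with the column names
plt.ylabel('Height')
plt.show()  # Display the plot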

Now that we are acquainted with the data, let's build a model!

Building a Linear Regression

Building a Linear Regression model with scikit-learn takes only a few lines.
The library provides the LinearRegression class (in sklearn.linear_model) for this.

You need to:
1. Initialize the LinearRegression class.

model = LinearRegression()

2. Train the model with a training set.

model.fit(X, y)

3. Now you can predict new instances.

model.predict(X_new)
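
The three steps above can be sketched end to end on a tiny made-up dataset. The numbers below are hypothetical (they are not from simple_height_data.csv) and were chosen to lie exactly on the line y = 2x + 1, so the predictions are easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (not from the course file): points on y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # 2-D: one row per instance, one column per feature
y = np.array([3.0, 5.0, 7.0, 9.0])          # 1-D target

model = LinearRegression()  # 1. Initialize the model
model.fit(X, y)             # 2. Train it on the training set

X_new = np.array([[5.0], [6.0]])    # New instances, also 2-D
predictions = model.predict(X_new)  # 3. Predict targets for the new instances
print(predictions)
```

Because the toy data is perfectly linear, the predictions should come out at approximately 11 and 13, that is, 2 * 5 + 1 and 2 * 6 + 1.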

Before putting it all together, there is one more thing to figure out.
Both the .fit() and .predict() methods of the LinearRegression class expect X (or X_new) to be a 2-D array or a pandas DataFrame, with one row per instance and one column per feature.
Selecting a single column with df['col_name'] returns a pandas Series, which is 1-D. That is not what .fit() or .predict() expects, so an error like the following is raised:
ValueError: Expected 2D array, got 1D array instead
To avoid it, select the column so that the result is still a DataFrame:

X = df[['col_name']]  # with double square brackets
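
To see the difference between the two selections, here is a minimal sketch that compares their types and shapes. The two-row DataFrame is hypothetical and only reuses the column names of the course dataset. The last line shows one alternative, reshaping the values into a single column with NumPy.

```python
import pandas as pd

# Hypothetical two-row frame that reuses the course dataset's column names
df = pd.DataFrame({'Father': [65.0, 70.0], 'Height': [66.0, 71.0]})

series = df['Father']   # Single brackets: pandas Series (1-D)
frame = df[['Father']]  # Double brackets: pandas DataFrame (2-D)

print(type(series).__name__, series.shape)  # Series (2,)
print(type(frame).__name__, frame.shape)    # DataFrame (2, 1)

# Alternative: turn the 1-D values into a 2-D column array
column = series.to_numpy().reshape(-1, 1)
print(column.shape)  # (2, 1)
```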

Now let's build a Linear Regression and predict new values!


import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression # Import LinearRegression

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file into a DataFrame

X = df[['Father']]  # Assign the feature (double square brackets keep it 2-D)
y = df['Height']  # Assign the target (no double square brackets needed for the target)
model = LinearRegression()  # Initialize the model
model.fit(X, y)  # Train the model
X_new = np.array([[61], [64], [67]])  # Create a 2-D array of new instances
print(model.predict(X_new))  # Predict the target for the new instances
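
After fitting, the learned line can be inspected through the model's coef_ and intercept_ attributes. The sketch below uses a small hypothetical DataFrame in place of simple_height_data.csv, constructed so that Height = 33 + 0.5 * Father, and checks that .predict() matches the line computed by hand. It also passes X_new as a DataFrame with the same column name as the training data. In recent scikit-learn versions, predicting with a plain NumPy array on a model fitted on a DataFrame typically emits a feature-names warning, and this is one way to avoid it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for the course file: Height = 33 + 0.5 * Father
df = pd.DataFrame({
    'Father': [64.0, 66.0, 68.0, 70.0, 72.0],
    'Height': [65.0, 66.0, 67.0, 68.0, 69.0],
})

model = LinearRegression()
model.fit(df[['Father']], df['Height'])

slope = model.coef_[0]        # One coefficient per feature; we have a single feature
intercept = model.intercept_  # Value of the line at Father = 0

# New instances as a DataFrame with the same column name used in training
X_new = pd.DataFrame({'Father': [61, 64, 67]})
predicted = model.predict(X_new)

# The same predictions computed by hand from the line equation
manual = intercept + slope * X_new['Father'].to_numpy()

print(slope, intercept)
print(predicted)
print(manual)
```

On this toy data the slope should be close to 0.5 and the intercept close to 33, and the last two printed arrays should agree.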
