Linear Regression for ML | Simple Linear Regression

Building the Linear Regression with scikit-learn

You already know what Simple Linear Regression is and how to find the line that fits the data best. Let's go through all the steps of building a linear regression for a real dataset.

Loading data and looking at it

We have a file, simple_height_data.csv, with the data from our examples. Let's load the file and take a look at it.

import pandas as pd
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
print(df.head())  # Print the first 5 rows of the dataset
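If the remote file is not reachable, the same inspection can be practiced offline. The sketch below is a minimal example that assumes a small, made-up CSV string (the values are hypothetical, not taken from simple_height_data.csv) and reads it through io.StringIO. Besides head(), it checks the shape, the column types, and missing values, which are worth confirming before fitting a model.

```python
import io

import pandas as pd

# Hypothetical sample in the same two-column layout (values are made up)
csv_text = """Father,Height
65.0,66.2
67.5,68.1
70.0,69.4
62.0,64.8
72.5,71.0
"""

# read_csv accepts any file-like object, so StringIO stands in for the URL
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first 5 rows
print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # both columns should be numeric
print(df.isna().sum())  # count of missing values per column
```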

The dataset has two columns: 'Height', our target, and 'Father', the father's height, which is our feature.
Let's assign the target values to the y variable and the feature values to X, then build a scatterplot.

import pandas as pd
import matplotlib.pyplot as plt
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
X = df['Father']  # Assign the feature
y = df['Height']  # Assign the target
plt.scatter(X, y)  # Build a scatterplot
plt.xlabel('Father')  # Label the axes with the column names
plt.ylabel('Height')
plt.show()  # Display the plot

Now that we are acquainted with the data, let's build a model!

Building a Linear Regression

Building a Linear Regression model with scikit-learn is quite simple: the sklearn.linear_model module provides a LinearRegression class for that.

You need to:
1. Initialize the LinearRegression class.

model = LinearRegression()

2. Train the model with a training set.

model.fit(X, y)

3. Now you can predict new instances.

model.predict(X_new)
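To see the three steps in isolation, here is a minimal sketch on a tiny made-up dataset where the target follows y = 2x + 1 exactly (an assumption chosen so the result is easy to check). After fit(), the learned slope and intercept are stored in the coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature per row (2-D X), y = 2x + 1 exactly
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2 * X.ravel() + 1

model = LinearRegression()  # 1. create the model
model.fit(X, y)             # 2. train it on (X, y)

X_new = np.array([[10.0], [20.0]])
y_pred = model.predict(X_new)  # 3. predict targets for new instances

print(model.coef_, model.intercept_)  # learned slope (array) and intercept
print(y_pred)
```

Because the toy data lie exactly on a line, the fitted slope and intercept should match 2 and 1 up to floating-point rounding.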

Before putting it all together, there is one more thing to figure out.
Both the .fit() and .predict() methods of the LinearRegression class expect X (or X_new) to be 2-D, with shape (n_samples, n_features), for example a 2-D NumPy array or a pandas DataFrame.
Selecting a single column from a DataFrame with df['col_name'] returns a pandas Series, which is 1-D, so .fit() or .predict() raises the following error:
ValueError: Expected 2D array, got 1D array instead
To avoid it, select the column with double square brackets, which returns a one-column DataFrame:

X = df[['col_name']]  # with double square brackets
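The difference between the two selections is easy to check with .shape. The sketch below uses a hypothetical three-row DataFrame with the lesson's column names: single brackets give a 1-D Series of shape (n,), double brackets give a 2-D DataFrame of shape (n, 1). For a 1-D NumPy array, reshape(-1, 1) produces the same (n, 1) layout. Passing the Series to fit() triggers the ValueError quoted above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data with the same column names as the lesson
df = pd.DataFrame({"Father": [64.0, 68.0, 72.0], "Height": [66.0, 68.5, 71.0]})

series = df["Father"]   # single brackets -> pandas Series (1-D)
frame = df[["Father"]]  # double brackets -> pandas DataFrame (2-D)
print(series.shape)     # (3,)
print(frame.shape)      # (3, 1)

# A 1-D NumPy array can be turned into a single-column 2-D array instead
as_column = df["Father"].to_numpy().reshape(-1, 1)
print(as_column.shape)  # (3, 1)

# Fitting on the 1-D Series raises ValueError
try:
    LinearRegression().fit(series, df["Height"])
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```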

Now let's build a Linear Regression and predict new values!

import pandas as pd
from sklearn.linear_model import LinearRegression  # Import LinearRegression
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
X = df[['Father']]  # Assign the feature (with double square brackets)
y = df['Height']  # Assign the target (a 1-D Series is fine for the target)
model = LinearRegression()  # Initialize a model
model.fit(X, y)  # Train the model
# New instances as a 2-D DataFrame with the same column name used for training
X_new = pd.DataFrame({'Father': [61, 64, 67]})
print(model.predict(X_new))  # Predict the target for the new instances
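LinearRegression fits the ordinary least-squares line, so for a single feature its result should agree with the closed-form formulas for simple linear regression: slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2) and intercept = y_mean - slope * x_mean. As a sanity check, the sketch below computes both by hand with NumPy on a small made-up sample (hypothetical values, not the course dataset) and compares them with the fitted coef_ and intercept_.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample of fathers' heights (x) and heights (y); values are made up
df = pd.DataFrame({
    "Father": [62.0, 65.0, 67.0, 70.0, 73.0],
    "Height": [64.5, 66.0, 67.8, 69.1, 71.9],
})
x = df["Father"].to_numpy()
y = df["Height"].to_numpy()

# Closed-form least-squares estimates for simple linear regression
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# The same line fitted with scikit-learn
model = LinearRegression()
model.fit(df[["Father"]], df["Height"])

print(slope, model.coef_[0])
print(intercept, model.intercept_)

# Predictions from the model equal intercept + slope * x
X_new = pd.DataFrame({"Father": [61.0, 64.0, 67.0]})  # same column name as in training
manual_pred = intercept + slope * X_new["Father"].to_numpy()
model_pred = model.predict(X_new)
print(np.allclose(manual_pred, model_pred))
```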
Question: What is the correct order of operations with LinearRegression to predict new values?


Section 1. Chapter 3
