Building the Linear Regression with scikit-learn | Simple Linear Regression
Linear Regression for ML

Building the Linear Regression with scikit-learn

You already know what Simple Linear Regression is and how to find the line that best fits the data. Let's go through all the steps of building a linear regression model on a real dataset.

Loading data and looking at it

We have a file, simple_height_data.csv, with the data from our examples. Let's load the file and take a look at it.

    import pandas as pd

    file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
    df = pd.read_csv(file_link)  # Read the file
    print(df.head())  # Print the first 5 instances from a dataset

The dataset has two columns: 'Height', our target, and 'Father', the father's height, which is our feature.
Let's assign the target values to the y variable and the feature values to X, then build a scatterplot.

    import pandas as pd
    import matplotlib.pyplot as plt

    file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
    df = pd.read_csv(file_link)  # Read the file
    X = df['Father']  # Assign the feature
    y = df['Height']  # Assign the target

    plt.scatter(X, y)  # Build the scatterplot
    plt.xlabel('Father')  # Label the axes with the column names
    plt.ylabel('Height')
    plt.show()  # Display the figure when run as a script

Now that we are acquainted with the data, let's build a model!

Building a Linear Regression

Building a Linear Regression model with scikit-learn is quite simple!
There is a LinearRegression class for that.

You need to:
1. Initialize the LinearRegression class.

2. Train the model with a training set.

3. Predict the target for new instances.
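
The three steps can be sketched as follows. This is only a minimal illustration on a tiny made-up dataset (the values are not from simple_height_data.csv), and the names X_train, y_train, X_new and y_pred are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data lying exactly on y = 2x + 1 (illustration only)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # 2-D: one row per instance
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()        # 1. Initialize the model
model.fit(X_train, y_train)       # 2. Train it on the training set

X_new = np.array([[5.0], [6.0]])  # New instances, also 2-D
y_pred = model.predict(X_new)     # 3. Predict targets for the new instances
print(y_pred)
```

Because the made-up points lie exactly on a line, the predictions should land on that same line (about 11 and 13, up to floating-point error).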

Before putting it all together, there is one more thing to figure out.
Both the .fit() and .predict() methods of the LinearRegression class expect X (or X_new) to be a 2-D array (or a pandas DataFrame).
Selecting a single column from a DataFrame with single brackets (df['col_name']) returns a pandas Series, which is 1-D and not what .fit() or .predict() expects, so the following error is raised:
ValueError: Expected 2D array, got 1D array instead
To avoid it, select the column with double square brackets (df[['col_name']]), which returns a one-column DataFrame.
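
The difference is easy to see by comparing shapes. Here is a small sketch that uses a made-up two-row DataFrame (invented values) in place of the real file; the variable names series and frame are only for illustration.

```python
import pandas as pd

# Tiny made-up DataFrame standing in for the real dataset
df = pd.DataFrame({'Father': [65.0, 70.0], 'Height': [66.0, 71.0]})

series = df['Father']    # single brackets: 1-D pandas Series
frame = df[['Father']]   # double brackets: 2-D one-column DataFrame

print(type(series).__name__, series.shape)  # Series (2,)
print(type(frame).__name__, frame.shape)    # DataFrame (2, 1)
```

If you already hold a Series, series.to_frame() or series.to_numpy().reshape(-1, 1) should also give a 2-D object that .fit() and .predict() accept.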

Now let's build a Linear Regression and predict new values!

    import pandas as pd
    import numpy as np
    from sklearn.linear_model import LinearRegression  # Import LinearRegression

    file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
    df = pd.read_csv(file_link)  # Read the file
    X = df[['Father']]  # Assign the feature (with double square brackets)
    y = df['Height']  # Assign the target (no double square brackets needed for the target)

    model = LinearRegression()  # Initialize a model
    model.fit(X, y)  # Train the model

    X_new = np.array([[61], [64], [67]])  # Create a 2-D array of new instances
    print(model.predict(X_new))  # Predict the target for the new instances
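
To connect the predictions back to the fitted line: after .fit(), a LinearRegression model stores the slope in its coef_ attribute and the intercept in intercept_, and .predict() evaluates intercept + slope * x. The sketch below checks this on made-up data (not the course file); the names slope, intercept and manual are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, roughly linear data (illustration only)
X = np.array([[60.0], [64.0], [68.0], [72.0]])
y = np.array([62.0, 65.0, 67.0, 71.0])

model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # one coefficient per feature; here there is one
intercept = model.intercept_  # the constant term of the line

X_new = np.array([[61.0], [64.0], [67.0]])
manual = intercept + slope * X_new[:, 0]  # evaluate the line by hand

print(np.allclose(manual, model.predict(X_new)))  # True
```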
