Building Linear Regression Using NumPy

You already know what simple linear regression is and how to find the line that fits the data best. You will now go through all the steps of building a linear regression for a real dataset.

Loading Data

We have a file, simple_height_data.csv, with the data from our examples. We'll load the file and take a look at it:


import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)	# Read the file

print(df.head())	# Print the first 5 rows of the dataset

The dataset has two columns: 'Father', which is the input feature, and 'Height', which is the target variable.

We'll assign our target values to the y variable and feature values to X and build a scatterplot.


import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)	# Read the file

X = df['Father']	# Assign the feature
y = df['Height']	# Assign the target
plt.scatter(X,y)	# Build scatterplot
plt.show()

Finding Parameters

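NumPy provides the np.polyfit() function, which finds the parameters of linear regression by least squares.

Linear regression is polynomial regression of degree 1 (polynomial regression is covered in a later section), so we pass deg=1 to get the parameters of the linear regression. polyfit() returns the coefficients ordered from the highest degree to the lowest, which is why the slope beta_1 comes before the intercept beta_0.
Here is an example: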


import pandas as pd
import numpy as np

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)	# Read the file
X, y = df['Father'], df['Height']	# Assign the variables

beta_1, beta_0 = np.polyfit(X, y, 1)	# Get the parameters
print('beta_0 is', beta_0)
print('beta_1 is', beta_1)
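
As a sanity check, the values returned by np.polyfit() can be compared with the closed-form least-squares formulas for simple linear regression. The sketch below is only an illustration: it uses a small made-up dataset rather than simple_height_data.csv, and the names beta_0_manual and beta_1_manual are introduced here for the comparison.

```python
import numpy as np

# Made-up toy data (NOT the course dataset), roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form least-squares estimates for simple linear regression
x_mean, y_mean = x.mean(), y.mean()
beta_1_manual = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta_0_manual = y_mean - beta_1_manual * x_mean

# np.polyfit returns coefficients from the highest degree to the lowest,
# so the slope comes first and the intercept second
beta_1, beta_0 = np.polyfit(x, y, 1)

print('manual :', beta_0_manual, beta_1_manual)
print('polyfit:', beta_0, beta_1)
```

Both approaches should agree up to floating-point rounding, which also confirms the order in which the two parameters are unpacked.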

Note

If you are unfamiliar with the syntax beta_1, beta_0 = np.polyfit(X, y, 1), it is called unpacking. If you have an iterable (e.g., a list, a NumPy array, or a pandas Series) that contains two items, writing a, b = items assigns the first item to a and the second to b. For a list or a NumPy array, that is the same as writing a = items[0] followed by b = items[1].

Since polyfit() with deg=1 returns a NumPy array with two values, we can unpack it this way.
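
A minimal sketch of unpacking on a two-item NumPy array follows. The array coeffs and its values are hypothetical stand-ins for the output of polyfit(), not results from the course dataset; the names slope, intercept, and unpack_failed are also introduced only for this illustration.

```python
import numpy as np

# Hypothetical values standing in for the array returned by np.polyfit(X, y, 1)
coeffs = np.array([0.5, 30.0])

# Unpacking: names on the left are matched to items by position
beta_1, beta_0 = coeffs

# Equivalent explicit indexing
slope = coeffs[0]
intercept = coeffs[1]
print(beta_1 == slope, beta_0 == intercept)

# The number of names must match the number of items,
# otherwise Python raises a ValueError
unpack_failed = False
try:
    a, b = np.array([1, 2, 3])
except ValueError as err:
    unpack_failed = True
    print('Unpacking failed:', err)
```

The last part shows why unpacking into two names only works when the degree is 1: a higher degree returns more coefficients.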

Making the Predictions

With the parameters in hand, we can plot the fitted line over the data.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)	# Read the file
X, y = df['Father'], df['Height']	# Assign the variables
beta_1, beta_0 = np.polyfit(X, y, 1)	# Get the parameters

plt.scatter(X,y)	# Build a scatter plot
plt.plot(X, beta_0 + beta_1 * X, color='red')	# Plot the line
plt.show()

Now that we have the parameters, we can use the linear regression equation to predict new values.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)	# Read the file
X, y = df['Father'], df['Height']	# Assign the variables
beta_1, beta_0 = np.polyfit(X, y, 1)	# Get the parameters

X_new = np.array([65, 70, 75])	# Feature values of new instances
y_pred = beta_0 + beta_1 * X_new	# Predict the target
print('Predicted y: ', y_pred)
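
The same prediction can also be written with np.polyval(), which evaluates a polynomial from coefficients given in the order polyfit() returns them (highest degree first). This is an alternative sketch, not the lesson's method; the parameter values below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical parameters for illustration (NOT fitted to the course data)
beta_1, beta_0 = 0.5, 30.0

X_new = np.array([65, 70, 75])	# Feature values of new instances

# Prediction written out with the linear regression equation
y_manual = beta_0 + beta_1 * X_new

# Same prediction with np.polyval; coefficients go from highest degree to lowest
y_polyval = np.polyval([beta_1, beta_0], X_new)

print('Manual :', y_manual)
print('polyval:', y_polyval)
```

In practice, the array returned by np.polyfit(X, y, 1) could be passed to np.polyval() directly, without unpacking it first.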

Getting the parameters of a linear regression with NumPy takes a single function call. Other libraries, such as Statsmodels, can also report additional information about the fitted model.
