Linear Regression with Python
Predict House Prices
Let's build a regression model on a real-world example. We have a file, houses_simple.csv, that holds information about housing prices, with each house's area as a feature.
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/houses_simple.csv')
print(df.head())
Let's assign variables and visualize our dataset!
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/houses_simple.csv')
X = df['square_feet']  # feature: area of the house
y = df['price']  # target: price of the house
plt.scatter(X, y, alpha=0.5)
plt.xlabel('square_feet')
plt.ylabel('price')
plt.show()
In the example with a person's height, it was much easier to imagine a line fitting the data well.
Here the points are much more scattered, because the price also depends heavily on things the dataset does not include, such as the house's age, location, and interior.
Still, the task is to build the line that best fits the data we have; it will show the trend. Use the OLS class for that. Soon we will learn how to add more features, which should make the predictions better!
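To make "the line that best fits the data" concrete, here is a minimal sketch of what an ordinary least squares fit computes for a single feature. It uses plain Python and a tiny made-up dataset (the numbers below are assumptions for illustration, not rows from houses_simple.csv), and the helper name fit_line is hypothetical.

```python
# Made-up toy data (NOT from houses_simple.csv): area and price of four houses.
square_feet = [1000.0, 1500.0, 2000.0, 2500.0]
price = [200000.0, 270000.0, 330000.0, 400000.0]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(square_feet, price)
print(f"price = {intercept:.0f} + {slope:.0f} * square_feet")
```

On this toy data the slope is 132 and the intercept is 69000, so each extra square foot adds about 132 to the predicted price. The OLS class arrives at the same kind of line, just on the real dataset.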
Task
- Assign the 'price' column of df to y.
- Create the X_tilde matrix using the add_constant() function from statsmodels (imported as sm).
- Initialize the OLS object and train it.
- Preprocess the X_new array the same way as X.
- Predict the target for the X_new_tilde matrix.
Once you've completed this task, click the button below the code to check your solution.