Linear Regression for ML
Finding the Parameters
We now know that Linear Regression is just a line that best fits the data. But how can you tell which line is the best one?
To answer that, we can calculate the difference between the predicted value and the actual target value for each data point in the training set.
These differences are called residuals (or errors), and the goal is to make the residuals as small as possible.
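As a minimal sketch of what a residual is, the snippet below computes residuals for one hand-picked candidate line. The data points and the values of `intercept` and `slope` are made up for illustration and are not taken from the course.

```python
import numpy as np

# Hypothetical training data: one feature x and a target y.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 6.0, 9.0])

# A candidate line picked by hand (not necessarily the best one).
intercept, slope = 1.0, 2.0

# Prediction of the line for every data point: [3, 5, 7, 9].
y_pred = intercept + slope * x

# Residual = actual target value minus predicted value: [0, 0, -1, 0].
residuals = y - y_pred
print(residuals)
```

A residual of zero means the line passes exactly through that point; the third point lies one unit below this candidate line.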
Ordinary Least Squares
The default approach is the Ordinary Least Squares (OLS) method, which finds the parameters by minimizing the SSR loss function.
Loss Function
A loss function is a function that measures how bad the predictions are (in regression, a bad prediction means large residuals).
The most commonly used loss function for regression is the SSR (sum of squared residuals).
SSR Calculation
To calculate the SSR, take each residual, square it (mainly to eliminate its sign, so that positive and negative residuals do not cancel out), and sum the results.
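The calculation can be sketched in a few lines of Python. The helper name `ssr`, the data, and the two candidate lines are assumptions chosen for illustration; the point is only that a lower SSR means a better-fitting line.

```python
import numpy as np

def ssr(y_true, y_pred):
    """Sum of squared residuals between targets and predictions."""
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sum(residuals ** 2))

# Hypothetical training data.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 6.0, 9.0])

# Two hand-picked candidate lines of the form intercept + slope * x.
ssr_a = ssr(y, 1.0 + 2.0 * x)  # residuals 0, 0, -1, 0
ssr_b = ssr(y, 0.0 + 2.5 * x)  # residuals 0.5, 0, -1.5, -1

print(ssr_a, ssr_b)  # 1.0 3.5
```

Of these two candidates, the first line fits the data better because its SSR is smaller.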
The following video illustrates this process:
The task of the model is to find the parameters that minimize the SSR.
One solution to this task is the Normal Equation.
Normal Equation
Fortunately, we do not need to try every possible line and calculate the SSR for each one. The task of minimizing the SSR has a mathematical solution that is not very computationally expensive.
This solution is called the Normal Equation.
Note
This course focuses more on practical aspects of Linear Regression, so we will not derive the Normal Equation. Thus it is completely OK if you don't understand what's happening in the image above.
The key point is that we can find the best parameters β mathematically using the Normal Equation, and it is already implemented in many Python libraries!
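For readers who want to see it in action, here is a minimal NumPy sketch of the Normal Equation in its usual form, β = (XᵀX)⁻¹Xᵀy. The data is made up, and the names `X`, `y`, and `beta` are notation assumed for this example. The sketch solves the equivalent linear system with `np.linalg.solve` instead of computing the inverse explicitly, which is a common numerical choice rather than something prescribed by this course.

```python
import numpy as np

# Hypothetical training data: one feature x and a target y.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 6.0, 9.0])

# Design matrix: a column of ones (for the intercept) next to the feature.
X = np.column_stack([np.ones_like(x), x])

# Normal Equation: solve (X^T X) beta = X^T y for beta.
beta = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = beta

# SSR of the fitted line, for comparison with hand-picked lines.
best_ssr = float(np.sum((y - X @ beta) ** 2))

# Cross-check against NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(intercept, slope)  # approximately 1.0 and 1.9
print(best_ssr)          # approximately 0.7
```

For this data, the fitted line has a lower SSR than a hand-picked line such as 1 + 2x (whose SSR is 1.0), which is what minimizing the SSR means in practice.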
This chapter may have felt overwhelming, so here are some key points to review.
In the upcoming chapter, we will delve into coding and explore how to utilize Linear Regression in Python with the scikit-learn library.