Explore Linear Regression Using Python
How Does It Work Mathematically?
We need to draw the line so that it best captures the dependence between our variables. In other words, the points should lie as close as possible to our straight line. The method for finding such a line is as follows:
- Measure how far the points are from some candidate line.
- Among all candidate lines, find the one that minimizes these distances.
Let's take a closer look at this algorithm. First, what do we consider the distance from a point to our line? It is the vertical distance between a data point and the fitted line, measured along the y-axis. These residuals are indicated by the dashed violet vertical lines in the figure below:
In the second step, we minimize these distances and find the line for which the sum of squared residuals (SSR) is minimal. This approach is called the method of ordinary least squares.
Why exactly squares? Squaring prevents residuals from cancelling each other out due to differing signs (for example, residuals of +2 and −2 sum to 0, while their squares sum to 8), and it penalizes large deviations more heavily. Squaring also makes it convenient to find the minimum mathematically.
The general formula for SSR:

SSR = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2

Here y_i is the y-coordinate of each data point, and f(x_i) is the value of the fitted line at x_i (the predicted y-coordinate for that point).
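To make this concrete, here is a minimal sketch (with made-up numbers) of how the SSR could be computed for a candidate line, and how an ordinary-least-squares fit can be obtained with NumPy. The sample data and the ssr helper are only illustrative; they are not part of the course template.

```python
import numpy as np

# Small illustrative sample: x- and y-coordinates of the data points (made-up numbers)
x = np.array([22.0, 23.5, 24.0, 25.5, 26.0])
y = np.array([3.4, 3.9, 4.1, 4.6, 4.8])

def ssr(slope, intercept, x, y):
    """Sum of squared residuals for the candidate line f(x) = slope * x + intercept."""
    predictions = slope * x + intercept      # f(x_i) for every data point
    residuals = y - predictions              # vertical distances to the line
    return np.sum(residuals ** 2)

# A line with a smaller SSR fits the points better
print(ssr(0.3, -3.0, x, y))
print(ssr(0.4, -5.5, x, y))

# np.polyfit with degree 1 returns the slope and intercept that minimize the SSR
best_slope, best_intercept = np.polyfit(x, y, deg=1)
print(best_slope, best_intercept)
```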
Knowing the variable x, we try to predict the dependent variable y. For example, in the previous task we had a dataset of Abyssinian cats' heights and weights. Using linear regression, we can take a cat's height and predict its weight. If you want to do the opposite, that is, predict a cat's height from its weight, you should work with the distances between the data points and the fitted line measured along the x-axis. In most cases, this second line will be quite different from the first one.
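To see the difference between the two directions, the following sketch fits both lines with np.polyfit on made-up data; the arrays and variable names are assumptions for illustration only.

```python
import numpy as np

# Made-up example data: heights and weights of several cats
height = np.array([22.0, 23.5, 24.0, 25.5, 26.0])
weight = np.array([3.4, 3.9, 4.1, 4.6, 4.8])

# Regress weight on height: residuals measured along the weight (y) axis
slope_w, intercept_w = np.polyfit(height, weight, deg=1)

# Regress height on weight: residuals measured along the height (x) axis
slope_h, intercept_h = np.polyfit(weight, height, deg=1)

print(f"weight ~ {slope_w:.3f} * height + {intercept_w:.3f}")
print(f"height ~ {slope_h:.3f} * weight + {intercept_h:.3f}")

# If the two fits described the same line, slope_h would equal 1 / slope_w;
# in general it does not, because each fit minimizes a different set of residuals.
```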
Task
Continuing the previous task, try to also show the dependence of the weight of Abyssinian cats on their height, given that the slope and the intercept are 0.99 and 4.7, respectively.
- [Line #2] Import the `matplotlib.pyplot` library.
- [Line #14] Build the line that shows this dependence by setting the parameters in the `on_height` function, and analyze if it differs from the first line.
- [Lines #27-28] Visualize your data using the results of the two functions (`on_weight` and `on_height`).
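The task template itself is not shown on this page, so the following is only a sketch of what a solution might look like. The synthetic dataset and the exact signatures of `on_height` and `on_weight` are assumptions; only the slope 0.99 and the intercept 4.7 come from the task description.

```python
import numpy as np
import matplotlib.pyplot as plt              # [Line #2] import the plotting library

# Synthetic data scattered around the given line; the real task supplies the Abyssinian dataset
rng = np.random.default_rng(0)
height = np.linspace(20, 26, 30)
weight = 0.99 * height + 4.7 + rng.normal(0, 1.0, height.size)

def on_height(h, slope=0.99, intercept=4.7):
    """[Line #14] Weight predicted from height, with the given slope and intercept."""
    return slope * h + intercept

# Assumed shape of the previous task's line: height predicted from weight
w_slope, w_intercept = np.polyfit(weight, height, deg=1)

def on_weight(w, slope=w_slope, intercept=w_intercept):
    """Height predicted from weight (parameters fitted here only for illustration)."""
    return slope * w + intercept

# [Lines #27-28] visualize the data together with both lines
plt.scatter(height, weight, label="data")
plt.plot(height, on_height(height), label="on_height")      # residuals along the weight axis
plt.plot(on_weight(weight), weight, label="on_weight")       # residuals along the height axis
plt.xlabel("height")
plt.ylabel("weight")
plt.legend()
plt.show()
```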