How Mathematically Does It Work? | What is the Linear Regression?
Explore the Linear Regression Using Python
Course Content
1. What is the Linear Regression?
2. Correlation
3. Building and Training Model
4. Metrics to Evaluate the Model
5. Multivariate Linear Regression

How Mathematically Does It Work?

We need to draw the line so that it best captures the relationship between our variables. In other words, the data points should lie as close as possible to the straight line. The method for finding such a line is as follows:

  1. Measure how far the data points are from some candidate line.
  2. Among all candidate lines, choose the one that minimizes these distances.

Let's take a closer look at this algorithm. First, what do we consider the distance from a point to our line? It is the vertical distance between each data point and the fitted line, measured along the y-axis. These residuals are shown as dashed violet vertical lines in the figure below:

In the second step, we minimize these distances by finding the line for which the sum of squared residuals (SSR) is smallest. This approach is called the method of ordinary least squares (OLS).
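A minimal sketch of this in Python, using NumPy's `polyfit` as one common least-squares implementation (the height and weight values here are invented for illustration, not the course's dataset):

```python
import numpy as np

# Hypothetical sample data: cat heights (cm) and weights (kg)
heights = np.array([22.0, 23.5, 24.0, 25.5, 26.0, 27.5])
weights = np.array([3.1, 3.4, 3.6, 4.0, 4.1, 4.5])

# np.polyfit with degree 1 performs an ordinary least squares fit,
# returning the slope and intercept of the best-fitting line.
slope, intercept = np.polyfit(heights, weights, 1)

# The residuals are the vertical distances to the fitted line;
# SSR is the quantity the fit has minimized.
predicted = slope * heights + intercept
ssr = np.sum((weights - predicted) ** 2)
print(slope, intercept, ssr)
```

Any other line, for example a horizontal line through the mean of the weights, would produce an SSR at least as large.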

Why squares, exactly? Squaring prevents positive and negative residuals from canceling each other out, and it penalizes large deviations more heavily. It also makes finding the minimum mathematically convenient, since the sum of squares is smooth and easy to differentiate.
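A quick numeric illustration of the cancellation problem (the residual values are invented): residuals of +2 and −2 sum to zero, wrongly suggesting a perfect fit, while their squares do not cancel:

```python
residuals = [2.0, -2.0, 1.0, -1.0]

# Plain sum: positive and negative residuals cancel out.
plain_sum = sum(residuals)            # 0.0

# Sum of squares: every deviation contributes positively.
ssr = sum(r ** 2 for r in residuals)  # 10.0

print(plain_sum, ssr)
```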

The general formula for SSR:

SSR = Σᵢ (yᵢ − f(xᵢ))²

where yᵢ is the y-coordinate of the i-th data point, and f(xᵢ) is the value of the fitted line at xᵢ (the y-coordinate of the line at that x).
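The formula translates directly into code. The slope, intercept, and points below are arbitrary illustrative values:

```python
def ssr(xs, ys, slope, intercept):
    """Sum of squared residuals between points and the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
# Residuals against the line y = 2x are 0.1, -0.1, and 0.2,
# so SSR = 0.01 + 0.01 + 0.04 ≈ 0.06
print(ssr(xs, ys, 2.0, 0.0))
```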

Knowing the variable x, we try to predict the response y. For example, in the previous task we had a dataset of Abyssinian cats' heights and weights. Using linear regression, we can take a cat's height and predict its weight. If you want to do the opposite, that is, predict a cat's height from its weight, you should instead work with the distances between the data points and the fitted line measured along the x-axis.

In most cases, this line will be completely different from the first one.
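A sketch of this with made-up data: fit weight on height, then height on weight, and re-express the second line as weight = f(height) to compare. The two lines coincide only when the points lie exactly on a line:

```python
import numpy as np

# Illustrative data (not the course's dataset)
heights = np.array([22.0, 23.5, 24.0, 25.5, 26.0, 27.5])
weights = np.array([3.1, 3.5, 3.4, 4.0, 4.3, 4.4])

# Regress weight on height: residuals measured along the y-axis.
m1, b1 = np.polyfit(heights, weights, 1)

# Regress height on weight: residuals measured along the x-axis.
m2, b2 = np.polyfit(weights, heights, 1)
# Invert height = m2*weight + b2 to get weight as a function of height.
m2_inv, b2_inv = 1.0 / m2, -b2 / m2

print(m1, b1)
print(m2_inv, b2_inv)  # generally a different line
```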

Task

Continuing the previous task, also show the dependence of the weight of Abyssinian cats on their height, given that the slope and the intercept are 0.99 and 4.7, respectively.

  1. [Line #2] Import the matplotlib.pyplot library.
  2. [Line #14] Build the line that shows this dependence by setting the parameters in the on_height function, and check whether it differs from the first line.
  3. [Lines #27-28] Visualize your data using the results of the two functions (on_weight and on_height).
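The task's starter file is not shown here, but the core idea can be sketched as follows. The function body and the height range are illustrative assumptions, not the course's actual scaffold; only the slope 0.99 and intercept 4.7 come from the task:

```python
import matplotlib.pyplot as plt
import numpy as np

def on_height(height, slope=0.99, intercept=4.7):
    # Predict weight from height using the given line parameters.
    return slope * height + intercept

heights = np.linspace(20, 30, 50)  # assumed plotting range
plt.plot(heights, on_height(heights), label="weight vs. height")
plt.xlabel("height")
plt.ylabel("weight")
plt.legend()
plt.show()
```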

Section 1. Chapter 2
