Not Always the Linear Regression? | Metrics to Evaluate the Model
Explore the Linear Regression Using Python
1. What is the Linear Regression?
2. Correlation
3. Building and Training Model
4. Metrics to Evaluate the Model
5. Multivariate Linear Regression

Not Always the Linear Regression?

In fact, our model is quite poor: when we visualize the data, no linear relationship is visible. Why is it so important to look at the data first rather than rely blindly on calculations and metrics?

In 1973, the English statistician Francis Anscombe composed four datasets, now known as <strong>Anscombe's quartet</strong>, to illustrate the importance of graphing data in statistical analysis and the effect of outliers on the properties of a whole dataset. Each set consists of 11 pairs of numbers. The simple summary statistics of the four sets are nearly identical: the mean and variance of each variable, the correlation between x and y, the fitted linear regression line, and even R-squared all agree to about two decimal places. Their graphs, however, differ significantly.
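
To make the claim about matching statistics concrete, here is a minimal sketch that computes the mean and sample variance of each variable using only Python's standard `statistics` module. The arrays are the values from Anscombe's published quartet; that they match the arrays used in this course's task is an assumption, and the names `quartet` and `summarize` are illustrative only.

```python
from statistics import mean, variance

# Anscombe's quartet as published in 1973.
# Assumption: these match the arrays shown in the course's graphs.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # sets I-III share the same x values
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return (mean x, sample variance x, mean y, sample variance y), rounded to 2 places."""
    return tuple(round(v, 2) for v in (mean(x), variance(x), mean(y), variance(y)))


# Hypothetical helper dict: one summary tuple per dataset.
summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, summary in summaries.items():
    print(name, summary)
```

The x statistics come out the same for every set, and the y statistics agree closely; the sample variance of y differs only slightly between sets, which is why the statistics are better described as nearly identical than exactly identical.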

The first dataset shows a familiar situation in which the linear model is appropriate. In the other graphs, either an outlier pulls the fitted line away from the position the remaining points suggest, or linear regression is not applicable at all because the points follow a parabola.

Task

The arrays contain the data displayed on the graphs above. In this task, we will find the linear regression parameters for each dataset and check whether the statistical properties of the sets are identical:

  1. [Line #2] Import the scipy library.
  2. [Line #19] Find the slope, the intercept, and the Pearson r using linear regression.
  3. [Line #20] Inside the function, print the slope, the intercept, and the Pearson r, rounded to two decimal places.
  4. [Lines #23-26] Apply the function to each pair of x and y arrays.
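
The steps above can be sketched as follows. This is not the course's reference solution: the function name `describe_fit` and the layout are hypothetical, the arrays are the values from Anscombe's published quartet (assumed to match the task's arrays), and the line numbers in the task refer to the course editor rather than to this sketch. The fit uses `scipy.stats.linregress`, which returns a result with `slope`, `intercept`, and `rvalue` attributes.

```python
from scipy import stats

# Anscombe's quartet as published in 1973.
# Assumption: these match the arrays provided in the course task.
x1 = x2 = x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def describe_fit(x, y):
    """Fit y = slope * x + intercept and print the rounded fit parameters."""
    result = stats.linregress(x, y)
    # Round slope, intercept, and Pearson r to two decimal places.
    slope, intercept, r = (
        round(float(v), 2) for v in (result.slope, result.intercept, result.rvalue)
    )
    print(f"slope={slope}, intercept={intercept}, Pearson r={r}")
    return slope, intercept, r


# Apply the function to each pair of x and y arrays.
fits = [describe_fit(x, y) for x, y in [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]]
```

With these arrays, all four fits should round to the same slope, intercept, and Pearson r, even though only the first dataset is well described by a straight line.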

Section 4. Chapter 5