In fact, our model is quite poor: plotting the data shows no linear relationship at all. Why is it so important to look at the data first instead of relying blindly on calculations and metrics?

In 1973, the English statistician *Francis Anscombe* constructed four datasets, now known as ***Anscombe's quartet***, to illustrate the importance of graphing data in statistical analysis and the effect that outliers can have on the properties of an entire dataset. The simple summary statistics of these sets are nearly identical: the same mean and variance for each variable, the same correlation between x and y, the same linear regression line, and even the same R-squared. However, their graphs differ significantly. Each set consists of 11 pairs of numbers.

In the first case, we see a familiar situation in which a linear model is appropriate. In the other graphs, either outliers pull the fitted line away from the position that the remaining points clearly suggest, or linear regression is not applicable at all because the points follow a parabola.

Regression is a machine learning technique. In this course, you will learn how to work with data, build a model, and make predictions about future values.

The basics of linear regression using Python.

What correlation is and how it works in data science.

Going deeper into programming: learn how to build and train your own linear regression model.

To understand how good a model is, it needs to be evaluated. In this section, we will consider metrics for evaluating the models we have built and figure out how to work with them in Python.

Here we will learn how to use several variables to predict a future outcome.

Not Always the Linear Regression?

Solution