Residuals
If we look at the plot of flavanoids against total phenols, it is clear that linear regression was not an entirely appropriate choice in this case. But how do we judge how good our predictions are?
Some points lie on the fitted line, and others lie away from it. We can measure the distance between a point and the line along the y-axis. This distance is called the residual, or error: the difference between the observed value of the target and the predicted value. The closer the residuals are to 0, the better the model fits those points. Let's calculate the residuals and present them as a chart.
import matplotlib.pyplot as plt

# X_test, Y_test and y_test_predicted come from the previous challenge
residuals = Y_test - y_test_predicted

# Visualize the data
ax = plt.gca()
ax.set_xlabel('total_phenols')
ax.set_ylabel('residuals')
plt.scatter(X_test, residuals)
plt.show()
Output:
Our residuals formed three almost straight lines. Such a pattern is a sign that the linear model does not capture the relationship in the data. Ideally, the residuals should be scattered symmetrically and randomly around the horizontal axis. If the residual plot shows a pattern (linear or non-linear), the model is not the best choice for the data.
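To see what a pattern in the residuals looks like in numbers, here is a minimal sketch in pure Python. It uses small synthetic data sets rather than the wine data from this chapter, and the helpers `fit_line` and `residuals_of` are hypothetical names introduced only for this illustration.

```python
# Illustrative only: synthetic data, not the wine dataset used in this chapter.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals_of(xs, ys):
    """Residual = observed value minus the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

xs = [0, 1, 2, 3, 4, 5, 6]
linear_ys = [2 * x + 1 for x in xs]  # a truly linear relationship
curved_ys = [x ** 2 for x in xs]     # a quadratic relationship

linear_res = residuals_of(xs, linear_ys)
curved_res = residuals_of(xs, curved_ys)

print(linear_res)
print(curved_res)
```

For the linear data every residual is zero. For the quadratic data the residuals are positive at both ends and negative in the middle. That U-shape is the kind of pattern that suggests a straight line is the wrong model.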
Try to find the residuals for our previous challenge:
- [Line #29] Define the variable y_test_predicted as the predicted data for X_test.
- [Line #30] Assign the difference between the variables Y_test and y_test_predicted to residuals.
- [Line #31] Print the variable residuals.
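The three steps above can be sketched on toy data. This is not the challenge solution: the values of `X_test` and `Y_test` are made up, and `predict` is a hypothetical stand-in for the model fitted in the previous challenge, with an assumed slope and intercept.

```python
# Toy stand-ins; in the challenge these come from the earlier train/test split.
X_test = [1.0, 2.0, 3.0, 4.0]
Y_test = [1.2, 1.9, 3.2, 3.7]

def predict(x, slope=0.9, intercept=0.2):
    """Hypothetical fitted line; the course's model object would be used instead."""
    return slope * x + intercept

# Step 1: predicted values for the test inputs
y_test_predicted = [predict(x) for x in X_test]

# Step 2: residual = observed value minus predicted value
residuals = [observed - predicted
             for observed, predicted in zip(Y_test, y_test_predicted)]

# Step 3: print the residuals
print(residuals)
```

With NumPy arrays, as in the chart code earlier in this chapter, step 2 reduces to a single subtraction, `Y_test - y_test_predicted`.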