Explore the Linear Regression Using Python
R-squared
Looking back at the previous metrics, it becomes obvious that even after we calculate them, it is not entirely clear whether their values are good or bad. Here I want to introduce a new metric that is universal for any task. This metric is R-squared, also called the coefficient of determination. It is used to test how well the observed results are reproduced by the model. The metric is calculated using the following formula:

R² = 1 - RSS / TSS
Where:
- RSS = the sum of squares of residuals
- TSS = the total sum of squares
Let’s see R-squared for our data:
RSS = (residuals ** 2).sum()                 # residual sum of squares
TSS = ((Y_test - Y_test.mean()) ** 2).sum()  # total sum of squares around the mean
r2 = 1 - RSS / TSS
print(r2)
We can say that 53% of the variability of the dependent output feature is explained by our model, while the remaining 47% is left unexplained.
The best possible value of the coefficient of determination is 1, obtained when the predicted values coincide with the actual values, that is, when the residuals, and therefore RSS, are zero. If R-squared is 0, the model explains none of the variation in the response variable around its mean. R-squared can even turn out negative, and you should not be afraid of that: it happens when the model has learned poorly, so its predictions deviate from the true values so strongly that RSS exceeds TSS, and subtracting RSS/TSS from 1 leaves a negative number.
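To make these cases concrete, here is a minimal sketch with made-up numbers (not our dataset) showing a perfect fit, a mean-only prediction, and a poor fit:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Perfect predictions: residuals are zero, so RSS = 0 and R-squared = 1
print(r2_score(y_true, y_true))  # 1.0

# Predicting the mean everywhere: RSS equals TSS, so R-squared = 0
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))  # 0.0

# Predictions far from the truth: RSS exceeds TSS, so R-squared < 0
print(r2_score(y_true, np.array([5.0, 4.0, 3.0, 2.0, 1.0])))  # -3.0
```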
We can get R-squared using the .score() method:
print(model.score(X_test, Y_test))
The higher the R-squared, the better the model fits our data.
You can also use the approach from previous chapters:
from sklearn.metrics import r2_score

print(r2_score(Y_test, y_test_predicted))
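As a quick sanity check, here is a sketch on synthetic data (the names and numbers below are made up for illustration, not from our course dataset) showing that the manual formula, the model's .score() method, and r2_score() all return the same value:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic linear data with noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 2, size=50)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# 1) Manual formula: R-squared = 1 - RSS/TSS
RSS = ((y - y_pred) ** 2).sum()
TSS = ((y - y.mean()) ** 2).sum()
print(1 - RSS / TSS)

# 2) The estimator's .score() method
print(model.score(X, y))

# 3) The r2_score() function from sklearn.metrics
print(r2_score(y, y_pred))
```

All three prints show the same number, since for regressors both .score() and r2_score() implement the same 1 - RSS/TSS formula.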
Task
Let's calculate R-squared for our data. You should find it both with the model's method and with the built-in metrics function.
- [Line #31] Find the result of applying the model's method to your data and assign the value to the variable r_squared_model.
- [Line #32] Print r_squared_model.
- [Line #34] Import r2_score for calculating the metric.
- [Line #35] Use the r2_score() function to find R-squared and assign it to the variable r_squared_model.
- [Line #36] Print r_squared_model.