R-squared | Metrics to Evaluate the Model
Course Content

Explore the Linear Regression Using Python

1. What is the Linear Regression?
2. Correlation
3. Building and Training Model
4. Metrics to Evaluate the Model
5. Multivariate Linear Regression

R-squared

The metrics from the previous chapters share a drawback: even after computing them, it is not obvious whether their values are good or bad. This chapter introduces a metric whose scale is the same for every task: R-squared, also called the coefficient of determination. It measures how well the observed results are reproduced by the model and is calculated with the following formula:

R² = 1 − RSS / TSS

Where:

  • RSS = the residual sum of squares
  • TSS = the total sum of squares
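To make the formula concrete, here is a small hand-worked example; the actual and predicted values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical actual and predicted values, just to illustrate the formula
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.5, 8.0])

rss = ((y_true - y_pred) ** 2).sum()         # residual sum of squares: 1.75
tss = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares: 20.0
r2 = 1 - rss / tss
print(r2)  # 0.9125
```

Here the residuals are small relative to the spread of the data around its mean, so RSS/TSS is small and R-squared is close to 1.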

Let's see R-squared for our data:

residuals = Y_test - y_test_predicted  # actual minus predicted values
RSS = (residuals ** 2).sum()
TSS = ((Y_test - Y_test.mean()) ** 2).sum()
r2 = 1 - RSS / TSS
print(r2)

With an R-squared of about 0.53, we can say that 53% of the variability of the dependent output feature is explained by our model, while the remaining 47% is left unaccounted for.

The best possible value of the coefficient of determination is 1, reached when the predicted values coincide with the actual values: the residuals, and therefore RSS, are zero. An R-squared of 0 means the model explains none of the variation of the response variable around its mean. R-squared can even become negative, and there is no need to be alarmed when it does: it simply means the model learned poorly and predicts worse than a constant equal to the mean. With such poor predictions, the deviations from the truth are large, RSS exceeds TSS, and subtracting RSS/TSS from 1 yields a negative result.
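A quick sketch of the negative case, again on made-up numbers: the predictions below deliberately move in the opposite direction from the truth, so the model does worse than always guessing the mean.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_bad = np.array([10.0, 9.0, 8.0, 7.0])  # hypothetical very poor predictions

rss = ((y_true - y_bad) ** 2).sum()          # 164.0, huge residuals
tss = ((y_true - y_true.mean()) ** 2).sum()  # 5.0
r2 = 1 - rss / tss
print(r2)  # -31.8, far below zero
```

Because RSS is many times larger than TSS here, 1 − RSS/TSS drops well below zero.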

We can get R-squared using the model's .score() method:

print(model.score(X_test, Y_test))

The higher the R-squared, the better the model fits our data.

You can also use the approach from the previous chapters:

from sklearn.metrics import r2_score
print(r2_score(Y_test, y_test_predicted))
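All three approaches compute the same quantity. The sketch below checks this on synthetic data (the data-generating line `y = 2x + noise` and the seed are arbitrary choices for the demonstration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data, just to show the three approaches agree
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + rng.normal(0, 1, size=50)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# Manual formula: R² = 1 - RSS / TSS
rss = ((y - y_pred) ** 2).sum()
tss = ((y - y.mean()) ** 2).sum()
manual = 1 - rss / tss

print(np.isclose(manual, model.score(X, y)))      # True
print(np.isclose(manual, r2_score(y, y_pred)))    # True
```

In practice, .score() is the most convenient when you have the fitted model at hand, while r2_score() is useful when you only have the arrays of actual and predicted values.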

Task

Let's calculate R-squared for our data. Find it using the model's method and also using the built-in metrics function.

  1. [Line #31] Apply the model's method to your data and assign the result to the variable r_squared_model.
  2. [Line #32] Print r_squared_model.
  3. [Line #34] Import r2_score for calculating the metric.
  4. [Line #35] Use the function r2_score() to find R-squared and assign it to the variable r_squared_model.
  5. [Line #36] Print r_squared_model.

Section 4. Chapter 4