Evaluation Metrics in Machine Learning

ROC Curve and AUC

To assess how well a binary classifier distinguishes between two classes across all possible thresholds, you use the Receiver Operating Characteristic (ROC) curve. The ROC curve visualizes the trade-off between the true positive rate (TPR, also called sensitivity or recall) and the false positive rate (FPR) as you vary the classification threshold.

  • True Positive Rate (TPR) is the proportion of actual positives correctly identified by the classifier. It is calculated as:

    \text{TPR} = \frac{TP}{TP + FN}
  • False Positive Rate (FPR) is the proportion of actual negatives that are incorrectly classified as positive. It is calculated as follows (see the short sketch after this list for a worked example of both rates):

    \text{FPR} = \frac{FP}{FP + TN}
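
A minimal sketch of both formulas, using made-up toy labels and predictions (not the lesson's dataset) and scikit-learn's confusion_matrix:

import numpy as np
from sklearn.metrics import confusion_matrix

# Toy ground-truth labels and hard predictions (illustrative values only)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # true positive rate (sensitivity / recall)
fpr = fp / (fp + tn)  # false positive rate

print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")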

By plotting TPR against FPR for every threshold, the ROC curve provides a comprehensive picture of a model’s performance, rather than focusing on a single decision point. The Area Under the Curve (AUC) summarizes this performance: a higher AUC means the model is better at distinguishing between the positive and negative classes across all thresholds.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Generate synthetic binary classification data
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=2,
    n_redundant=10,
    n_classes=2,
    random_state=42
)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit a logistic regression classifier
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Get predicted probabilities for the positive class
y_scores = clf.predict_proba(X_test)[:, 1]

# Compute ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

# Compute AUC
auc_score = roc_auc_score(y_test, y_scores)

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f"ROC curve (AUC = {auc_score:.2f})")
plt.plot([0, 1], [0, 1], "k--", label="Random Classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.title("ROC Curve")
plt.legend(loc="lower right")
plt.show()
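
To make the threshold trade-off concrete, a small follow-up sketch (assuming it runs immediately after the code above, so fpr, tpr, and thresholds are still in scope) prints a few of the operating points that roc_curve returns:

# Inspect a handful of operating points along the ROC curve
# (assumes fpr, tpr, thresholds from roc_curve above are in scope)
# Note: the first threshold is an artificial value above every score,
# so at that point nothing is predicted positive (TPR = FPR = 0).
step = max(1, len(thresholds) // 5)
for i in range(0, len(thresholds), step):
    print(f"threshold={thresholds[i]:.3f}  FPR={fpr[i]:.3f}  TPR={tpr[i]:.3f}")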

When you interpret the ROC curve, a curve that bows toward the top left corner indicates a strong classifier, as it achieves high true positive rates with low false positive rates. The AUC quantifies this: an AUC of 0.5 means the classifier performs no better than random guessing, while an AUC of 1.0 indicates perfect discrimination between classes. Generally, an AUC above 0.8 is considered good, while values closer to 1.0 are excellent. However, the context of your problem and the class distribution should always guide your interpretation of ROC and AUC results.
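
As a quick sanity check of these reference points, the sketch below uses made-up toy labels (not the lesson's dataset): uninformative random scores should land near an AUC of 0.5, while scores that rank every positive above every negative give exactly 1.0.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)       # toy binary labels

random_scores = rng.random(1000)             # uninformative scores
perfect_scores = y_true.astype(float)        # positives always scored above negatives

print(roc_auc_score(y_true, random_scores))  # approximately 0.5
print(roc_auc_score(y_true, perfect_scores)) # exactly 1.0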
