Evaluating Model Performance | Evaluation, Optimization, and Deployment


When you evaluate a transformer model on a classification task in NLP, you need clear metrics to measure how well your model is performing. The most common metrics are accuracy, precision, recall, and F1 score.

Accuracy measures the proportion of correct predictions out of all predictions. It is calculated as the number of correct predictions divided by the total number of predictions. This metric gives a straightforward view of performance but can be misleading if your dataset is imbalanced.
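As a minimal sketch, accuracy can be computed directly from its definition. The function name `manual_accuracy` and the example labels are hypothetical; scikit-learn's `accuracy_score` computes the same value.

```python
def manual_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for a binary text classification task
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

print(manual_accuracy(y_true, y_pred))  # 6 of 8 correct -> 0.75
```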

Precision is the proportion of true positive predictions out of all positive predictions made by the model. It tells you how many of the items labeled as positive are actually positive.
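A minimal sketch of precision computed from raw counts, assuming binary labels with `1` as the positive class. The function name `manual_precision` and the example labels are hypothetical; scikit-learn's `precision_score` computes the same quantity.

```python
def manual_precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP): how many predicted positives are correct."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    predicted_positives = tp + fp
    # Assumed convention: return 0.0 when the model predicts no positives at all
    return tp / predicted_positives if predicted_positives else 0.0


# Hypothetical labels: the model flags 4 items as positive, 2 of them correctly
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(manual_precision(y_true, y_pred))  # 2 / (2 + 2) = 0.5
```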

Recall measures the proportion of true positive predictions out of all actual positive cases in the dataset. It tells you how many of the actual positive cases your model is able to identify.
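Recall uses the same true positives but divides by the actual positives instead of the predicted ones. This sketch assumes binary labels with `1` as the positive class; `manual_recall` and the labels are hypothetical, and scikit-learn's `recall_score` computes the same quantity.

```python
def manual_recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): how many actual positives the model finds."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    actual_positives = tp + fn
    # Assumed convention: return 0.0 when the data contains no positives
    return tp / actual_positives if actual_positives else 0.0


# Hypothetical labels: 3 actual positives, of which the model finds 2
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(round(manual_recall(y_true, y_pred), 3))  # 2 / (2 + 1) -> 0.667
```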

F1 score is the harmonic mean of precision and recall. It balances the two metrics, providing a single score that accounts for both false positives and false negatives. This is especially useful when your dataset has imbalanced classes, as it avoids misleading conclusions that can arise from accuracy alone.
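To see why the harmonic mean is used rather than a simple average, consider a hypothetical model that is perfectly precise but finds only 10% of the positives. The helper `f1_from_pr` is illustrative, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 * precision * recall / (precision + recall)


# Hypothetical lopsided model: precision 1.0, recall 0.1
p, r = 1.0, 0.1
print("Arithmetic mean:", round((p + r) / 2, 3))      # 0.55
print("F1 (harmonic mean):", round(f1_from_pr(p, r), 3))  # 0.182
```

The arithmetic mean still reports a middling 0.55, while F1 stays close to the weaker of the two metrics, which is the behavior you want when one of them has collapsed.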

Definition

The F1 Score is the harmonic mean of precision and recall. It is particularly important for imbalanced datasets because it balances the trade-off between precision and recall, ensuring that neither is ignored when classes are not equally represented.
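In terms of true positives (TP), false positives (FP), and false negatives (FN), the definitions can be written as follows. Substituting precision and recall into the harmonic mean gives an equivalent count-based form:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}

F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```

True negatives do not appear in any of these formulas, so a large pool of easy negatives cannot inflate F1 the way it inflates accuracy. For instance, TP = 3, FP = 1, FN = 1 gives F1 = 6 / 8 = 0.75.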


from sklearn.metrics import accuracy_score, f1_score

# Example: true labels and predicted labels for a text classification task
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

# Compute accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

# Compute F1 score (binary classification)
f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)
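The example above reports accuracy and F1 only. A sketch extending it to precision, recall, and the confusion-matrix counts they are built from could look like this; it reuses the same labels, and the class names passed to `target_names` are illustrative assumptions.

```python
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Same true and predicted labels as in the example above
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
print("Precision:", precision)
print("Recall:", recall)

# For binary labels [0, 1], ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")

# Per-class precision, recall, F1, and support in one table
print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))
```

With these labels the model makes one false positive and one false negative, so precision and recall are both 3 / 4 = 0.75.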

Note

When your data has imbalanced classes, always report the F1 score in addition to accuracy. The F1 score gives a more realistic measure of model performance in these cases.
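A small sketch of why this matters, using a hypothetical dataset with 95 negative and 5 positive examples and a degenerate "model" that always predicts the majority class. Passing `zero_division=0` tells scikit-learn to return 0.0 instead of warning when a ratio would be undefined.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced dataset: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# Degenerate classifier that always predicts the majority class
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)

print("Accuracy:", acc)  # 0.95
print("F1 Score:", f1)   # 0.0
```

Accuracy looks excellent even though the model never identifies a single positive example; F1 exposes this immediately.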


Section 4. Chapter 1
