Evaluating Model Performance | Evaluation, Optimization, and Deployment
Fine-Tuning Transformers

When you evaluate a transformer model on a classification task in NLP, you need clear metrics to measure how well your model is performing. The most common metrics are accuracy, precision, recall, and F1 score.

Accuracy measures the proportion of correct predictions out of all predictions. It is calculated as the number of correct predictions divided by the total number of predictions. This metric gives a straightforward view of performance but can be misleading if your dataset is imbalanced.
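As a sketch of how accuracy can mislead on an imbalanced dataset (the labels below are made up for illustration): a model that always predicts the majority class still scores 90% accuracy while never finding a single positive case.

```python
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 9 negatives, 1 positive
y_true = [0] * 9 + [1]
# A "model" that always predicts the majority class
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)
print("Accuracy:", acc)  # 0.9, despite zero positives detected
```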

Precision is the proportion of true positive predictions out of all positive predictions made by the model. It tells you how many of the items labeled as positive are actually positive.

Recall measures the proportion of true positive predictions out of all actual positive cases in the dataset. It tells you how many of the actual positive cases your model is able to identify.
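To make the two definitions concrete, here is a small sketch using scikit-learn's `precision_score` and `recall_score` with made-up labels: the model predicts 4 positives, of which 3 are correct (precision 3/4), and it finds 3 of the 4 actual positives (recall 3/4).

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels for illustration
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

# Precision: true positives / predicted positives = 3 / 4
precision = precision_score(y_true, y_pred)
# Recall: true positives / actual positives = 3 / 4
recall = recall_score(y_true, y_pred)

print("Precision:", precision)  # 0.75
print("Recall:", recall)        # 0.75
```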

F1 score is the harmonic mean of precision and recall. It balances the two metrics, providing a single score that accounts for both false positives and false negatives. This is especially useful when your dataset has imbalanced classes, as it avoids misleading conclusions that can arise from accuracy alone.
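One way to check the "harmonic mean" claim is to compute F1 by hand from precision and recall and compare it with `f1_score` (labels below are made up for illustration):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels for illustration
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall
f1_manual = 2 * p * r / (p + r)

print("Manual F1:", f1_manual)
print("sklearn F1:", f1_score(y_true, y_pred))
```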

Definition

The F1 Score is the harmonic mean of precision and recall. It is particularly important for imbalanced datasets because it balances the trade-off between precision and recall, ensuring that neither is ignored when classes are not equally represented.

from sklearn.metrics import accuracy_score, f1_score

# Example: true labels and predicted labels for a text classification task
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

# Compute accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

# Compute F1 score (binary classification)
f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)
Note

When your data has imbalanced classes, always report the F1 score in addition to accuracy. The F1 score gives a more realistic measure of model performance in these cases.
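A sketch (with made-up labels) of why this matters: on a 90/10 class split, a model that finds only half of the positives can still report 95% accuracy, while the F1 score exposes the weakness.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical 90/10 imbalanced labels
y_true = [0] * 90 + [1] * 10
# The model gets all negatives right but misses half the positives
y_pred = [0] * 95 + [1] * 5

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print("Accuracy:", acc)           # 0.95 - looks strong
print("F1 Score:", round(f1, 3))  # 0.667 - reveals the missed positives
```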

Question

When should you prefer using the F1 score over accuracy as your main evaluation metric?

Section 4, Chapter 1
