Evaluating Model Performance | Evaluation, Optimization, and Deployment
Fine-Tuning Transformers

When you evaluate a transformer model on a classification task in NLP, you need clear metrics to measure how well your model is performing. The most common metrics are accuracy, precision, recall, and F1 score.

Accuracy measures the proportion of correct predictions out of all predictions. It is calculated as the number of correct predictions divided by the total number of predictions. This metric gives a straightforward view of performance but can be misleading if your dataset is imbalanced.
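As a sketch of how accuracy can mislead on an imbalanced dataset (the labels below are made up for illustration): a model that always predicts the majority class still scores 90% accuracy while never finding a single positive case.

```python
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 9 negatives, 1 positive
y_true = [0] * 9 + [1]
# A "model" that always predicts the majority class
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)
print("Accuracy:", acc)  # 0.9, despite zero positives detected
```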

Precision is the proportion of true positive predictions out of all positive predictions made by the model. It tells you how many of the items labeled as positive are actually positive.

Recall measures the proportion of true positive predictions out of all actual positive cases in the dataset. It tells you how many of the actual positive cases your model is able to identify.
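To make the two definitions concrete, here is a small sketch using scikit-learn's `precision_score` and `recall_score` with made-up labels: the model predicts 4 positives, of which 3 are correct (precision 3/4), and it finds 3 of the 4 actual positives (recall 3/4).

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels for illustration
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

# Precision: true positives / predicted positives = 3 / 4
precision = precision_score(y_true, y_pred)
# Recall: true positives / actual positives = 3 / 4
recall = recall_score(y_true, y_pred)

print("Precision:", precision)  # 0.75
print("Recall:", recall)        # 0.75
```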

F1 score is the harmonic mean of precision and recall. It balances the two metrics, providing a single score that accounts for both false positives and false negatives. This is especially useful when your dataset has imbalanced classes, as it avoids misleading conclusions that can arise from accuracy alone.
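One way to check the "harmonic mean" claim is to compute F1 by hand from precision and recall and compare it with `f1_score` (labels below are made up for illustration):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels for illustration
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall
f1_manual = 2 * p * r / (p + r)

print("Manual F1:", f1_manual)
print("sklearn F1:", f1_score(y_true, y_pred))
```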

Definition

The F1 Score is the harmonic mean of precision and recall. It is particularly important for imbalanced datasets because it balances the trade-off between precision and recall, ensuring that neither is ignored when classes are not equally represented.

from sklearn.metrics import accuracy_score, f1_score

# Example: true labels and predicted labels for a text classification task
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

# Compute accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

# Compute F1 score (binary classification)
f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)
Note

When your data has imbalanced classes, always report the F1 score in addition to accuracy. The F1 score gives a more realistic measure of model performance in these cases.
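A sketch (with made-up labels) of why this matters: on a 90/10 class split, a model that finds only half of the positives can still report 95% accuracy, while the F1 score exposes the weakness.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical 90/10 imbalanced labels
y_true = [0] * 90 + [1] * 10
# The model gets all negatives right but misses half the positives
y_pred = [0] * 95 + [1] * 5

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print("Accuracy:", acc)           # 0.95 - looks strong
print("F1 Score:", round(f1, 3))  # 0.667 - reveals the missed positives
```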

Question

When should you prefer using the F1 score over accuracy as your main evaluation metric?

Section 4, Chapter 1
