Classification with Python
Metrics
Up to this point, we've used accuracy as the main metric to evaluate the model's performance. However, accuracy has some limitations. Let's now discuss its drawbacks and introduce several additional metrics — based on TP, TN, FP, and FN — that help address these issues.
Accuracy
Accuracy represents the proportion of correct predictions:
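$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$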
However, accuracy has its drawbacks. For example, imagine you're trying to predict whether a patient has a rare disease. The dataset contains 99.9% healthy patients and only 0.1% with the disease. In this case, always predicting that the patient is healthy would result in an accuracy of 0.999, even though such a model is completely useless.
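To see this in action, here is a minimal sketch with made-up labels (999 healthy patients and 1 diseased) using sklearn's accuracy_score:

```python
from sklearn.metrics import accuracy_score

# Hypothetical screening data: 999 healthy patients (0) and 1 diseased (1)
y_true = [0] * 999 + [1]
# A useless model that always predicts 'healthy'
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.999
```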
Datasets like this are called imbalanced, and in such cases, balanced accuracy is a better metric to use.
Balanced accuracy
Balanced accuracy calculates the proportion of correct positive predictions and the proportion of correct negative predictions separately, then averages them. This approach gives equal importance to each class, regardless of how imbalanced the dataset is.
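In terms of the confusion matrix counts, for the binary case:

$$\text{Balanced Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$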
In the rare disease example, a model that always predicts "healthy" gets every negative case right (100%) but misses every positive one (0%), so its balanced accuracy is (1 + 0) / 2 = 0.5, which clearly exposes the problem that regular accuracy hides.
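Continuing the sketch above, balanced_accuracy_score makes the failure obvious:

```python
from sklearn.metrics import balanced_accuracy_score

y_true = [0] * 999 + [1]  # same imbalanced labels as before
y_pred = [0] * 1000       # always predicts 'healthy'

print(balanced_accuracy_score(y_true, y_pred))  # 0.5
```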
However, balanced accuracy still doesn't distinguish between type 1 and type 2 errors — just like regular accuracy. That's where precision and recall come in.
Precision
The precision metric indicates how many of the values the model predicted as positive were actually positive. It is the proportion of true positive predictions out of all positive predictions made by the model:
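$$\text{Precision} = \frac{TP}{TP + FP}$$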
By using the precision metric, we can understand how often a type 1 error occurs. High precision means that type 1 errors are rare, while low precision indicates that type 1 errors happen frequently.
Recall
The recall metric shows the proportion of actual positive cases that the model correctly predicted:
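$$\text{Recall} = \frac{TP}{TP + FN}$$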
The recall metric helps us understand how often a type 2 error occurs. High recall means type 2 errors are rare, while low recall means they happen frequently.
However, both precision and recall have limitations. For example, a model that predicts only the positive class (1) will achieve perfect recall, but its precision will be poor. On the other hand, a model that correctly predicts just one positive instance and labels everything else as negative will have perfect precision, but terrible recall.
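For instance, a quick sketch with made-up labels shows the all-positive case:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical ground truth: 3 positives among 10 samples
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# A degenerate model that always predicts the positive class
y_pred = [1] * 10

print(recall_score(y_true, y_pred))     # 1.0 - every positive is found
print(precision_score(y_true, y_pred))  # 0.3 - most positive predictions are wrong
```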
This shows that while it's easy to build a model with perfect precision or perfect recall, it's much harder to build one that performs well on both. That's why it's important to consider both metrics — and fortunately, there’s a metric that combines them.
F1 Score
The F1 score is the harmonic mean of precision and recall. The harmonic mean is preferred over the regular (arithmetic) mean because it penalizes situations where one of the values (either precision or recall) is low, making it a more balanced measure of a model's performance.
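$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$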
The F1 score combines both precision and recall into a single metric. It will only be high if both precision and recall are relatively high, making it a useful measure when you need to balance both types of errors.
Choosing the right metric depends on the specific task. Accuracy (or balanced accuracy for imbalanced datasets) is intuitive and provides a general sense of the model's overall performance. If you need more detailed insight into the types of errors the model makes, precision helps identify type 1 errors, while recall highlights type 2 errors. The F1 score shows how well-balanced the model is in terms of both type 1 and type 2 errors.
Metrics in Python
Scikit-learn provides implementations for all of these metrics in the sklearn.metrics module.
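Here is a minimal sketch with made-up labels; the metric functions themselves are the real sklearn.metrics APIs:

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    precision_score,
    recall_score,
    f1_score,
)

# Hypothetical true labels and model predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print('Accuracy:         ', accuracy_score(y_true, y_pred))
print('Balanced accuracy:', balanced_accuracy_score(y_true, y_pred))
print('Precision:        ', precision_score(y_true, y_pred))
print('Recall:           ', recall_score(y_true, y_pred))
print('F1 score:         ', f1_score(y_true, y_pred))
```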
If you want to get metrics like precision, recall, and F1 score all at once, sklearn provides the classification_report() function:
```python
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the preprocessed Titanic dataset
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/titanic.csv')
X, y = df.drop('Survived', axis=1), df['Survived']

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Train a random forest and predict on the test set
random_forest = RandomForestClassifier().fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

# Display a classification report
print(classification_report(y_test, y_pred))
```
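Alongside precision, recall, and F1 score for each class, the report also includes support (the number of true instances of each class), overall accuracy, and macro and weighted averages of the per-class metrics.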