Introduction to AutoML | Applications and Evaluation

Evaluating AutoML Results

When you use AutoML tools to build and select machine learning models, you must be able to interpret and compare their performance using evaluation metrics. In classification tasks, some of the most common metrics are accuracy, precision, recall, and F1 score. Each of these metrics highlights a different aspect of model performance, and understanding them helps you make informed decisions about which model to deploy.

Accuracy measures the proportion of correct predictions out of all predictions made. It is calculated as the number of correct predictions divided by the total number of predictions. While accuracy is straightforward, it can be misleading if your dataset is imbalanced, meaning the classes are not represented equally.
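As a minimal sketch, accuracy can be computed directly from raw counts (the labels below are hypothetical, chosen only to illustrate the formula):

```python
# Hypothetical true labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = correct predictions / total predictions
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 6 correct out of 8 -> 0.75
```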

Precision focuses on the quality of positive predictions. It is the ratio of true positives (correctly predicted positive samples) to all predicted positives (both correct and incorrect). High precision means that when the model predicts a positive class, it is usually correct.
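In formula form this is TP / (TP + FP). A quick sketch with hypothetical confusion-matrix counts:

```python
tp = 40  # true positives: predicted positive, actually positive
fp = 10  # false positives: predicted positive, actually negative

# Precision = TP / (TP + FP)
precision = tp / (tp + fp)
print(precision)  # 40 / 50 -> 0.8
```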

Recall (also known as sensitivity or true positive rate) measures the ability of the model to find all the relevant cases within a dataset. It is the ratio of true positives to all actual positives. High recall means the model successfully identifies most of the positive samples.
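In formula form this is TP / (TP + FN). Continuing the hypothetical counts from above:

```python
tp = 40  # true positives: positives the model found
fn = 20  # false negatives: positives the model missed

# Recall = TP / (TP + FN)
recall = tp / (tp + fn)
print(recall)  # 40 / 60 -> ~0.667
```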

F1 score is the harmonic mean of precision and recall. It provides a single metric that balances both concerns, making it particularly useful when you care equally about precision and recall or when dealing with imbalanced datasets.
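A short sketch of the harmonic mean, using hypothetical precision and recall values; note how F1 is pulled toward the lower of the two, unlike a simple average:

```python
precision = 0.8
recall = 0.4

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ~0.533, well below the arithmetic mean of 0.6
```

This is why F1 penalizes models that do well on one metric but poorly on the other.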

```python
from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Generate an imbalanced dataset and split it
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10,
    n_redundant=5, weights=[0.7, 0.3], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# AutoML with TPOT
automl = TPOTClassifier(
    generations=3, population_size=10, cv=3,
    verbosity=0, random_state=42
)
automl.fit(X_train, y_train)
y_pred = automl.predict(X_test)

# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred):.3f}")
print(f"Recall: {recall_score(y_test, y_pred):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred):.3f}")
```
Note

Always use the same metric when comparing different models or AutoML runs to ensure a fair and meaningful comparison.
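For instance, two candidate models can be ranked consistently by computing the same metric from each model's confusion-matrix counts. The counts below are hypothetical, purely to illustrate the comparison:

```python
def f1(tp, fp, fn):
    """F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for two candidate models on the same test set
f1_a = f1(tp=30, fp=5, fn=10)   # model A: fewer positives found, but cleaner
f1_b = f1(tp=35, fp=20, fn=5)   # model B: more positives found, more noise
print(f1_a, f1_b)  # 0.8 vs ~0.737 -> model A wins on F1
```

Comparing model A's F1 against model B's accuracy, by contrast, would tell you nothing meaningful.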

Question: Which metric is generally most informative when you have a highly imbalanced classification dataset?


Section 4. Chapter 1
