Introduction to AutoML | Applications and Evaluation

Evaluating AutoML Results

When you use AutoML tools to build and select machine learning models, you must be able to interpret and compare their performance using evaluation metrics. In classification tasks, some of the most common metrics are accuracy, precision, recall, and F1 score. Each of these metrics highlights a different aspect of model performance, and understanding them helps you make informed decisions about which model to deploy.

Accuracy measures the proportion of correct predictions out of all predictions made. It is calculated as the number of correct predictions divided by the total number of predictions. While accuracy is straightforward, it can be misleading if your dataset is imbalanced, meaning the classes are not represented equally.
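As a minimal sketch, accuracy can be computed directly from raw counts (the labels below are hypothetical, chosen only to illustrate the formula):

```python
# Hypothetical true labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = correct predictions / total predictions
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 6 correct out of 8 -> 0.75
```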

Precision focuses on the quality of positive predictions. It is the ratio of true positives (correctly predicted positive samples) to all predicted positives (both correct and incorrect). High precision means that when the model predicts a positive class, it is usually correct.
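In formula form this is TP / (TP + FP). A quick sketch with hypothetical confusion-matrix counts:

```python
tp = 40  # true positives: predicted positive, actually positive
fp = 10  # false positives: predicted positive, actually negative

# Precision = TP / (TP + FP)
precision = tp / (tp + fp)
print(precision)  # 40 / 50 -> 0.8
```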

Recall (also known as sensitivity or true positive rate) measures the ability of the model to find all the relevant cases within a dataset. It is the ratio of true positives to all actual positives. High recall means the model successfully identifies most of the positive samples.
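In formula form this is TP / (TP + FN). Continuing the hypothetical counts from above:

```python
tp = 40  # true positives: positives the model found
fn = 20  # false negatives: positives the model missed

# Recall = TP / (TP + FN)
recall = tp / (tp + fn)
print(recall)  # 40 / 60 -> ~0.667
```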

F1 score is the harmonic mean of precision and recall. It provides a single metric that balances both concerns, making it particularly useful when you care equally about precision and recall or when dealing with imbalanced datasets.
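A short sketch of the harmonic mean, using hypothetical precision and recall values; note how F1 is pulled toward the lower of the two, unlike a simple average:

```python
precision = 0.8
recall = 0.4

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ~0.533, well below the arithmetic mean of 0.6
```

This is why F1 penalizes models that do well on one metric but poorly on the other.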

```python
from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Generate an imbalanced dataset and split it
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10,
    n_redundant=5, weights=[0.7, 0.3], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# AutoML with TPOT
automl = TPOTClassifier(
    generations=3, population_size=10, cv=3,
    verbosity=0, random_state=42
)
automl.fit(X_train, y_train)
y_pred = automl.predict(X_test)

# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred):.3f}")
print(f"Recall: {recall_score(y_test, y_pred):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred):.3f}")
```
Note

Always use the same metric when comparing different models or AutoML runs to ensure a fair and meaningful comparison.
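For instance, two candidate models can be ranked consistently by computing the same metric from each model's confusion-matrix counts. The counts below are hypothetical, purely to illustrate the comparison:

```python
def f1(tp, fp, fn):
    """F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for two candidate models on the same test set
f1_a = f1(tp=30, fp=5, fn=10)   # model A: fewer positives found, but cleaner
f1_b = f1(tp=35, fp=20, fn=5)   # model B: more positives found, more noise
print(f1_a, f1_b)  # 0.8 vs ~0.737 -> model A wins on F1
```

Comparing model A's F1 against model B's accuracy, by contrast, would tell you nothing meaningful.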

Question: Which metric is generally most informative when you have a highly imbalanced classification dataset?


Section 4. Chapter 1
