ML Introduction with scikit-learn

Evaluating the Model

When building a predictive model, it is crucial to determine how well it performs before using it for real predictions.

Model evaluation measures how accurately the model makes predictions. The .score() method provides this assessment; for classifiers, it returns the accuracy — the fraction of correct predictions on the given data.

Evaluating performance on the training set is misleading, as the model is likely to perform better on data it has already seen. To obtain a realistic measure, evaluation must be done on unseen data.

In formal terms, the goal is to create a model that generalizes well.

Note
Definition

Generalization is the model's ability to perform effectively on new, unseen data, beyond just the data on which it was trained. It measures how accurately a model's predictions can be applied to real-world scenarios outside the training dataset.

This can be achieved by randomly splitting the dataset into two parts: a training set for fitting the model and a test set for evaluation.

Train the model using the training set and then evaluate it on the test set:

model.fit(X_train, y_train)
print(model.score(X_test, y_test))

To create a random split of the data, use the train_test_split() function from the sklearn.model_selection module.

Typically, the test set size depends on the dataset: 25–40% for small datasets, 10–30% for medium ones, and less than 10% for large datasets.

In this example, with only 342 instances (a small dataset), allocate 33% for the test set:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

Here, X_train and y_train represent the training set, while X_test and y_test represent the test set.
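To confirm the proportions, you can inspect the sizes of the resulting sets. A minimal sketch (the exact counts assume the 342-instance dataset mentioned above; scikit-learn rounds the test size up, so expect about 229 training and 113 test rows):

# Roughly two thirds of the rows end up in the training set
print(len(X_train), len(X_test))  # with 342 rows and test_size=0.33: 229 113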

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_pipelined.csv')

# Assign X, y variables (X is already preprocessed and y is already encoded)
X, y = df.drop('species', axis=1), df['species']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# Initialize and train the models
knn5 = KNeighborsClassifier().fit(X_train, y_train)  # 5 neighbors (default)
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)  # 1 neighbor

# Print the scores of both models
print('5 Neighbors score:', knn5.score(X_test, y_test))
print('1 Neighbor score:', knn1.score(X_test, y_test))

The model is now trained with the training set using .fit(X_train, y_train) and evaluated with the test set using .score(X_test, y_test).
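To see in practice why training-set evaluation is misleading, you could also score the same models on the data they were trained on (a sketch continuing the example above):

# Training-set scores are overly optimistic; 1-NN in particular scores
# (near) 1.0 because each training point is its own nearest neighbor
print('5 Neighbors train score:', knn5.score(X_train, y_train))
print('1 Neighbor train score:', knn1.score(X_train, y_train))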

Because train_test_split() splits the data randomly, the train and test sets differ with each run, which leads to varying scores. With a larger dataset, these scores would become more stable.
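If you need the same split (and hence the same score) on every run, train_test_split() accepts a random_state parameter. A minimal sketch:

# Fixing random_state makes the split reproducible across runs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42  # any fixed integer works
)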

Question

To achieve a 67%/33% train-test split, we take the first third of the rows as the test set and the remaining rows as the training set. Is this statement correct?

Select the correct answer
