Models
The fundamentals of data preprocessing and pipeline construction are now covered. The next step is modeling.
A model in Scikit-learn is an estimator that provides the .predict() and .score() methods, along with the .fit() method that all estimators share.
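As a quick illustration of this definition, the sketch below checks which of the three methods two objects expose. The choice of LogisticRegression (a model) and StandardScaler (a transformer, which is an estimator but not a model) is only an assumption made for the example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

model = LogisticRegression()   # a model: can be trained, predict, and be scored
scaler = StandardScaler()      # a transformer: an estimator, but not a model

# Both are estimators, so both have .fit(); only the model has the other two.
for method in ("fit", "predict", "score"):
    print(method, hasattr(model, method), hasattr(scaler, method))
```

Both objects can be trained with .fit(), but only the model offers .predict() and .score().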
.fit()
Once the data is preprocessed and ready for the model, the first step in building a model is training it. This is done with the .fit(X, y) method.
To train a model performing a supervised learning task (e.g., regression, classification), you need to pass both X and y to the .fit() method.
An unsupervised learning task (e.g., clustering) does not require labeled data, so you can pass only the X variable: .fit(X). However, using .fit(X, y) will not raise an error. The model will just ignore the y variable.
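The two cases above can be sketched as follows. The tiny dataset, the choice of KNeighborsClassifier and KMeans, and the parameter values are all assumptions made for illustration, not part of the lesson.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Toy data (hypothetical): two well-separated groups of points
X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])
y = np.array([0, 0, 1, 1])

# Supervised task: both X and y are required
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)

# Unsupervised task: X alone is enough
km_a = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Passing y as well raises no error; the clustering model ignores it
km_b = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X, y)

same = np.array_equal(km_a.labels_, km_b.labels_)
print(same)
```

With the same random_state, both KMeans runs should produce identical cluster labels, which shows that y had no effect on training.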
During training, a model learns everything it needs to make predictions. What the model learns and the duration of training depend on the chosen algorithm. For each task, numerous models are available, based on different algorithms. Some train slower, while others train faster.
Either way, training is generally the most time-consuming stage of machine learning. With a large training set, a model could take minutes, hours, or even days to train.
.predict()
Once the model is trained using the .fit() method, it can make predictions. Predicting is as easy as calling the .predict() method:
model.fit(X, y) # Train a model
y_pred = model.predict(X_new) # Get a prediction
Usually, you want to predict a target for new instances, X_new.
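A minimal runnable version of this syntax might look like the sketch below. The toy data, the X_new values, and the choice of a 1-nearest-neighbor classifier are assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (hypothetical): class 0 near (1, 1), class 1 near (8, 8)
X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])
y = np.array([0, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)                      # Train the model

# New instances the model has not seen during training
X_new = np.array([[0.9, 1.1], [7.5, 8.2]])
y_pred = model.predict(X_new)        # One prediction per row of X_new
print(y_pred)  # [0 1]
```

Note that X_new must have the same number of feature columns as the X used for training.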
.score()
The .score() method is used to measure a trained model's performance. Usually, it is calculated on the test set (the following chapters will explain what it is). Here is the syntax:
model.fit(X, y) # Training the model
model.score(X_test, y_test)
The .score() method requires actual target values (y_test in the example). It calculates the predictions for the X_test instances and compares them with the true targets (y_test) using a metric. By default, this metric is accuracy for classification models and the R² coefficient of determination for regression models.
X_test refers to the subset of the dataset, known as the test set, used to evaluate a model's performance after training. It contains the features (input data). y_test is the corresponding subset of true labels for X_test. Together, they assess how well the model predicts new, unseen data.
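To make the scoring step concrete, here is a hedged sketch that holds out a test set and checks that .score() matches accuracy computed by hand. The iris dataset, the train_test_split call with its parameters, and the KNeighborsClassifier model are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows as the test set (split sizes are an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = KNeighborsClassifier()
model.fit(X_train, y_train)                  # Train on the training set only

score = model.score(X_test, y_test)          # Default metric for a classifier: accuracy
manual = accuracy_score(y_test, model.predict(X_test))
print(score, manual)
```

The two printed values should be equal, since for a classifier .score() predicts on X_test and compares the result with y_test using accuracy.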