ML Introduction with scikit-learn
Scikit-learn Concepts
The scikit-learn library (imported as sklearn) offers various functions and classes for preprocessing data and modeling.
The main sklearn objects are the estimator, transformer, predictor, and model.
Let's examine each of those.
Estimator
Any sklearn class with a .fit() method is considered an estimator.
The .fit() method lets an object learn from the data. In other words, .fit() trains the object.
It takes the X and y parameters (y is optional for unsupervised learning tasks). The syntax (with estimator standing for any estimator object) is:
estimator.fit(X, y)
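As a minimal sketch of .fit() with and without y, the example below trains one unsupervised and one supervised estimator. The toy data and the choice of KMeans and LinearRegression are assumptions made for illustration; they are not part of the lesson.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Toy data made up for illustration: 4 samples, 1 feature
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

# Unsupervised estimator: .fit() needs only X
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)
print(kmeans.cluster_centers_)  # what the object learned from X

# Supervised estimator: .fit() needs both X and y
reg = LinearRegression()
reg.fit(X, y)
print(reg.coef_)  # learned slope, close to 2 for this data
```

In both cases .fit() stores what it learned in attributes whose names end with an underscore, such as cluster_centers_ and coef_.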
An object that only learns from data and does nothing with it is not very useful on its own. However, two kinds of objects that build on the estimator, the transformer and the predictor, are much more useful. Let's start with the transformer.
Transformer
A transformer has the .fit() method and the .transform() method, which transforms the data in some way.
Usually, a transformer needs to learn something from the data before transforming it, so you apply .fit() and then .transform(). To avoid the two separate calls, transformers also have the .fit_transform() method. .fit_transform() gives the same result as applying .fit() and .transform() sequentially but is sometimes faster, so it is preferable to .fit().transform().
The syntax (with transformer standing for any transformer object) is:
X_transformed = transformer.fit_transform(X)
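To illustrate that the two approaches give the same result, here is a small sketch using StandardScaler. The scaler and the toy array are assumptions chosen for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix made up for illustration: 3 samples, 2 features
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Two steps: learn the mean and standard deviation, then apply them
scaler_a = StandardScaler()
scaler_a.fit(X)
X_two_step = scaler_a.transform(X)

# One step: learn and apply in a single call
scaler_b = StandardScaler()
X_one_step = scaler_b.fit_transform(X)

print(np.allclose(X_two_step, X_one_step))  # True
```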
Note
Transformers are usually used to transform the X array. However, as we will see in the LabelEncoder example, some transformers are made for the y array.
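As a brief preview of a transformer that works on y, the sketch below applies LabelEncoder to a list of string labels. The labels are made up for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Toy target labels made up for illustration
y = ["cat", "dog", "cat", "bird"]

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)  # note: y is passed, not X

print(encoder.classes_)  # ['bird' 'cat' 'dog'] (sorted unique labels)
print(y_encoded)         # [1 2 1 0]
```

Each label is replaced by its index in the sorted list of unique labels stored in classes_.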
Predictor
A predictor is an estimator (it has the .fit() method) that also has the .predict() method.
The .predict() method is used for making predictions. The syntax (with predictor standing for any predictor object) is:
y_pred = predictor.predict(X)
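A minimal sketch of training a predictor and then calling .predict() on new data might look like this. LinearRegression and the toy arrays are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data made up for illustration (y = 2 * x)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])

predictor = LinearRegression()
predictor.fit(X_train, y_train)  # learn from the training data

# New, unseen inputs must have the same number of features as X_train
X_new = np.array([[5.0], [6.0]])
y_pred = predictor.predict(X_new)
print(y_pred)  # approximately 10 and 12
```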
Model
A model is a predictor that also has the .score() method.
The .score() method calculates a score (metric) to measure the predictor's performance. The syntax (with model standing for any model object) is:
model.score(X, y)
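The following sketch shows .score() on a small classifier. KNeighborsClassifier and the toy data are assumptions chosen for the example. For sklearn classifiers, .score() returns accuracy by default; for regressors it returns the R² coefficient.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data made up for illustration: two well-separated groups
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1.5], [10.5]])
y_test = np.array([0, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Fraction of test samples whose predicted label matches y_test
accuracy = model.score(X_test, y_test)
print(accuracy)  # 1.0
```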
Here is a table to sum up:
Type of class | Required methods |
Estimator | .fit() |
Transformer | .fit(), .transform(), .fit_transform() |
Predictor | .fit(), .predict() |
Model | .fit(), .predict(), .score() |
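One way to check the table against real classes is to ask an object which of these methods it has. The helper method_names below is hypothetical, written only for this sketch, and the two classes are arbitrary examples.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def method_names(obj):
    """Return which of the five methods from the table obj provides."""
    names = ["fit", "transform", "fit_transform", "predict", "score"]
    return [name for name in names if hasattr(obj, name)]

# A transformer: .fit(), .transform(), .fit_transform()
print(method_names(StandardScaler()))
# ['fit', 'transform', 'fit_transform']

# A model: .fit(), .predict(), .score()
print(method_names(KNeighborsClassifier()))
# ['fit', 'predict', 'score']
```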
The preprocessing stage involves working with transformers, while the modeling stage involves working with predictors (more specifically, models).
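Putting the two stages together, a rough end-to-end sketch could look like the following. The data, StandardScaler, and KNeighborsClassifier are assumptions chosen for illustration, not a prescribed workflow.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Toy data made up for illustration: 2 features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 110.0], [3.0, 120.0],
                    [8.0, 300.0], [9.0, 310.0], [10.0, 320.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[2.5, 115.0], [9.5, 315.0]])
y_test = np.array([0, 1])

# Preprocessing stage: a transformer learns from the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics

# Modeling stage: a model is trained, then used to predict and score
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train_scaled, y_train)
print(model.predict(X_test_scaled))        # [0 1]
print(model.score(X_test_scaled, y_test))  # 1.0
```

Note that the test data is only passed to .transform(), never to .fit(), so the scaler's statistics come from the training set alone.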