Scikit-learn Concepts | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

Scikit-learn Concepts

The Scikit-learn (imported as sklearn) library offers various functions and classes for preprocessing data and modeling.
The main sklearn objects are estimator, transformer, predictor, and model.
Let's examine each of those.

Estimator

Any sklearn class with a .fit() method is considered an estimator.
The .fit() method lets an object learn from the data; in other words, it trains the object.
It takes X and y parameters (y is optional for unsupervised learning tasks). The syntax is estimator.fit(X, y).
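
As a minimal sketch of calling .fit(), the example below uses StandardScaler as the estimator. The toy array X and the variable name scaler are assumptions made for illustration; they do not come from the course material.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: 3 samples, 2 features (illustrative values)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()  # an estimator: it has a .fit() method
scaler.fit(X)              # no y is needed for this estimator

# What was learned is stored in attributes ending with an underscore
print(scaler.mean_)        # per-feature means learned from X
```

After fitting, the object holds what it learned (here, the column means), but it has not changed X in any way.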

An object that only learns from data and does nothing with what it learned is not very useful on its own. Two kinds of objects that build on the estimator, the transformer and the predictor, are much more useful. Let's start with the transformer.

Transformer

A transformer has the .fit() method and a .transform() method that transforms the data in some way.
Usually, transformers need to learn something from the data before transforming it, so you call .fit() and then .transform(). To do both in one call, transformers also have the .fit_transform() method.
.fit_transform() gives the same result as calling .fit() and .transform() sequentially, but is sometimes faster, so it is preferable to .fit().transform().
The syntax is transformer.fit_transform(X).
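
To make the equivalence concrete, here is a small sketch comparing the two-step and one-step forms. It assumes StandardScaler as the transformer and a made-up array X; the names two_step and one_step are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (illustrative values)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Two steps: learn the statistics, then apply them
two_step = StandardScaler().fit(X).transform(X)

# One step: learn and apply in a single call
one_step = StandardScaler().fit_transform(X)

# Both forms produce the same transformed array
print(np.allclose(two_step, one_step))
```

Note that .fit() returns the fitted object itself, which is why .fit(X).transform(X) can be chained.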

Note

Transformers are usually used to transform the X array. However, as we will see in the example of LabelEncoder, some transformers are made for the y array.

Predictor

A predictor is an estimator (it has the .fit() method) that also has the .predict() method.
The .predict() method is used for making predictions. The syntax is predictor.predict(X_new).
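
A minimal sketch of the fit-then-predict pattern follows. It assumes KNeighborsClassifier as the predictor; the training data and the names X_train, y_train, and X_new are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: one feature, two well-separated classes
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])

predictor = KNeighborsClassifier(n_neighbors=1)
predictor.fit(X_train, y_train)    # supervised learning: y is required

# New, unseen samples to predict labels for
X_new = np.array([[2.0], [9.0]])
predictions = predictor.predict(X_new)
print(predictions)                 # one predicted label per row of X_new
```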

Model

A model is a predictor that also has the .score() method.
The .score() method calculates a score (metric) that measures the predictor's performance. The syntax is model.score(X, y).
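
The sketch below shows one way .score() might be used, again assuming KNeighborsClassifier and made-up data. For sklearn classifiers, .score() returns the mean accuracy; for regressors it returns the R² coefficient. The held-out arrays X_test and y_test are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: one feature, two well-separated classes
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])

# Hypothetical held-out samples with known labels
X_test = np.array([[2.0], [9.0]])
y_test = np.array([0, 1])

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# For a classifier, .score() is the fraction of correct predictions
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Scoring on data the model was not trained on gives a more honest estimate of performance than scoring on the training set.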

Here is a table to sum up:

Type of class    Required methods
Estimator        .fit()
Transformer      .fit(), .transform(), .fit_transform()
Predictor        .fit(), .predict()
Model            .fit(), .predict(), .score()

The preprocessing stage involves working with transformers, while the modeling stage involves working with predictors (more specifically, with models).

Section 2. Chapter 1