ML Introduction with scikit-learn
What Is a Pipeline
In the previous section, we completed three preprocessing steps: imputing, encoding, and scaling.
We did it step by step, transforming the relevant columns and then assembling them back into the X array.
This is a tedious process, especially with a OneHotEncoder, which changes the number of columns.
Another problem is that new instances must go through the same preprocessing steps before the model can make a prediction, so we would have to repeat all of those transformations by hand.
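To make both problems concrete, here is a minimal sketch of the manual approach on a tiny made-up dataset (the arrays color, size, color_new and size_new are hypothetical, not the data from the previous section). Every fitted transformer has to be kept around and reapplied, in the same order, to each new instance:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one categorical and one numeric column, each with a gap
color = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
size = np.array([[1.0], [np.nan], [3.0], [5.0]])

# Manual preprocessing of the training data: one transformer object per step
color_imputer = SimpleImputer(strategy="most_frequent")
encoder = OneHotEncoder()
size_imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()

# OneHotEncoder returns a sparse matrix, so convert it to a dense array
color_encoded = encoder.fit_transform(color_imputer.fit_transform(color)).toarray()  # 1 column -> 2
size_scaled = scaler.fit_transform(size_imputer.fit_transform(size))
X = np.hstack([color_encoded, size_scaled])  # collect the pieces back into X

# A new instance must be pushed through exactly the same fitted steps by hand
color_new = np.array([["blue"]], dtype=object)
size_new = np.array([[2.0]])
X_new = np.hstack([
    encoder.transform(color_imputer.transform(color_new)).toarray(),
    scaler.transform(size_imputer.transform(size_new)),
])
print(X.shape, X_new.shape)  # (4, 3) (1, 3)
```

Note how the single color column becomes two columns after encoding, so the np.hstack bookkeeping has to be written (and kept in sync) twice: once for training and once for new data.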
Luckily, scikit-learn provides a Pipeline class: a simple way to chain all those transformations together, which makes it easy to transform both the training data and new instances.
Pipeline is a container for all the transformers (and the final estimator, as you will see later).
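As a minimal sketch, assuming purely numeric data so that only imputing and scaling are needed, a Pipeline is built from a list of (name, transformer) pairs. The step names "imputer" and "scaler" are arbitrary labels chosen for illustration; the make_pipeline helper builds the same kind of object and generates the names automatically:

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Explicit construction: a list of (name, transformer) pairs, applied in order
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# Shorthand: step names are generated from the lowercased class names
auto_pipe = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())

print(list(pipe.named_steps))                 # ['imputer', 'scaler']
print([name for name, _ in auto_pipe.steps])  # ['simpleimputer', 'standardscaler']
```

The names are how you refer to an individual step later, for example pipe.named_steps["scaler"].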
Calling the .fit_transform() method of a Pipeline object calls .fit_transform() on each transformer in sequence, feeding each step's output into the next one.
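The sketch below uses a hypothetical two-step pipeline (mean imputing, then scaling) and a small made-up array to check that one call to the pipeline's .fit_transform() gives the same result as chaining each step's .fit_transform() by hand:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric training data with missing values
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan],
              [5.0, 30.0]])

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])
X_pipe = pipe.fit_transform(X)  # one call fits and applies every step in order

# The same thing by hand: the imputer's output is fed into the scaler
X_manual = StandardScaler().fit_transform(SimpleImputer(strategy="mean").fit_transform(X))

print(np.allclose(X_pipe, X_manual))  # True
```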
This way, you only need to call .fit_transform() once to transform the training set, and then call .transform() to transform new instances with the already-fitted steps.
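Here is a minimal sketch of that workflow, assuming a hypothetical two-step numeric pipeline and made-up arrays X_train and X_new. Because .transform() reuses the statistics learned during fitting (the training means for imputing, the training mean and standard deviation for scaling), the missing value in the new instance is filled with the training mean and therefore scales to 0:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data with missing values
X_train = np.array([[1.0, 10.0],
                    [np.nan, 20.0],
                    [3.0, np.nan],
                    [5.0, 30.0]])

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])
X_train_prepared = pipe.fit_transform(X_train)  # fit every step once, on training data

# Hypothetical new instance: first feature missing, second feature present
X_new = np.array([[np.nan, 40.0]])
X_new_prepared = pipe.transform(X_new)  # no refitting: training statistics are reused

print(pipe.named_steps["imputer"].statistics_)  # [ 3. 20.]
# First feature -> 0.0; second -> (40 - 20) / sqrt(50), about 2.83
print(X_new_prepared)
```

Calling pipe.fit_transform(X_new) instead would refit the imputer and scaler on a single row, which is exactly the mistake .transform() avoids.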