ML Introduction with scikit-learn
What Is a Pipeline
In the previous section, we completed three preprocessing steps: imputing, encoding, and scaling. We did it step by step, transforming the needed columns and collecting them back into the X array. This is a tedious process, especially when a OneHotEncoder changes the number of columns.
Another problem is that, to make a prediction, new instances must go through the same preprocessing steps, so we would need to repeat all those transformations.
Luckily, scikit-learn provides the Pipeline class, a simple way to collect all those transformations together so that it is easier to transform both the training data and new instances.
A Pipeline serves as a container for a sequence of transformers, optionally followed by a final estimator. When you invoke the .fit_transform() method on a Pipeline, it sequentially applies the .fit_transform() method of each transformer, passing the output of one step as the input to the next.
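The chaining described above can be sketched as follows. This is a minimal illustration rather than the course's own code: the array X_train, the step names "imputer" and "scaler", and the choice of SimpleImputer and StandardScaler are assumptions made for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric training data with one missing value
X_train = np.array([[1.0, 10.0],
                    [2.0, np.nan],
                    [3.0, 30.0]])

# Each step is a (name, transformer) pair; the names are arbitrary labels
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# One call: each step is fitted, and its output is passed to the next step
X_pipe = pipe.fit_transform(X_train)

# The same result obtained by chaining the two steps manually
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_train)
X_manual = StandardScaler().fit_transform(X_imputed)

print(np.allclose(X_pipe, X_manual))  # True
```

Both routes give the same array; the Pipeline simply keeps the fitted steps together in one object.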
This streamlined approach means you only need to call .fit_transform() once on the training set and can then use the .transform() method to process new instances.
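As a rough sketch of that workflow, the example below fits a pipeline on a small training array and then reuses it on a new instance. The data (X_train, X_new) and the two chosen transformers are hypothetical and only serve to illustrate the idea.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training set and a new instance with a missing value
X_train = np.array([[1.0, 10.0],
                    [2.0, np.nan],
                    [3.0, 30.0]])
X_new = np.array([[np.nan, 20.0]])

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# Fit once on the training set: the column means and scales are learned here
X_train_prepared = pipe.fit_transform(X_train)

# Reuse the fitted steps on new data; nothing is re-fitted
X_new_prepared = pipe.transform(X_new)

# The missing value is filled with the training mean (2.0), and 20.0 equals
# the second column's training mean after imputation, so both scale to 0
print(X_new_prepared)  # [[0. 0.]]
```

Calling .transform() rather than .fit_transform() on new instances matters: the imputation values and scaling statistics should come from the training set, not from the data being predicted on.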