What is a Pipeline
In the previous section, we completed three preprocessing steps: imputing, encoding, and scaling.
We did it step by step, transforming the needed columns and collecting them back into the X array. It is a tedious process, especially when a OneHotEncoder changes the number of columns.
Another problem is that new instances must go through the same preprocessing steps before a prediction can be made, so we would need to perform all of those transformations again.
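For illustration, a manual version of those steps might look roughly like the sketch below. It assumes, purely for the example, that X is a pandas DataFrame with a single categorical 'city' column and numeric columns for the rest; the column names and split are made up, not the exact code from the previous section.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative manual preprocessing; 'city' and the numeric split are hypothetical
city_imputed = SimpleImputer(strategy='most_frequent').fit_transform(X[['city']])  # impute the categorical column
city_encoded = OneHotEncoder(sparse_output=False).fit_transform(city_imputed)      # encode it (column count changes here)
nums_scaled = StandardScaler().fit_transform(X.drop(columns=['city']))             # scale the numeric columns
X_prepared = np.hstack([city_encoded, nums_scaled])                                # glue everything back together by hand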
Luckily, Scikit-learn provides a Pipeline class: a simple way to collect all those transformations together, so it is easier to transform both the training data and new instances.
A Pipeline serves as a container for a sequence of transformers and, optionally, a final estimator as the last step. When you invoke the .fit_transform() method on a Pipeline, it sequentially applies the .fit_transform() method of each transformer to the data.
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Create a pipeline with three steps: imputation, one-hot encoding, and scaling
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),   # Step 1: Impute missing values
    ('encoder', OneHotEncoder(sparse_output=False)),        # Step 2: Convert categorical data (dense output so the scaler can center it)
    ('scaler', StandardScaler())                             # Step 3: Scale the data
])

# Fit and transform the data using the pipeline
X_transformed = pipeline.fit_transform(X)
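As a quick illustration, here is the same pipeline applied to a small made-up DataFrame of two categorical columns; the data is invented for this example, while in the lesson X is the actual dataset.
import numpy as np
import pandas as pd

# Hypothetical toy data: two categorical columns, one missing value
X_demo = pd.DataFrame({
    'city': ['Paris', 'London', np.nan, 'Paris'],
    'color': ['red', 'blue', 'blue', 'red']
})
print(pipeline.fit_transform(X_demo).shape)  # (4, 4): one-hot encoding expanded 2 columns into 4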
This streamlined approach means you only need to call .fit_transform() once on the training set and subsequently use the .transform() method to process new instances.
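For example, assuming X_new holds new instances with the same columns as X (the name is just for illustration):
# .transform() reuses the statistics and categories learned during fitting,
# so new rows end up in the same feature space as the training data
X_new_transformed = pipeline.transform(X_new)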