ColumnTransformer | Pipelines
ML Introduction with scikit-learn

ColumnTransformer

Looking ahead, when we call the .fit_transform(X) method on a Pipeline object, it applies each transformer to the whole X.
That is not always the behavior we want.
We do not want to encode values that are already numerical, and we may want to apply different transformers to different columns (e.g., OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones).

ColumnTransformer addresses this problem. It lets us apply a different transformer to each column or group of columns.
To create a ColumnTransformer, you can use the make_column_transformer function from the sklearn.compose module.

The function takes tuples as arguments, each pairing a transformer with the list of columns to which that transformer should be applied.
Here is an example:
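
As a minimal sketch of such a call, assume a small hypothetical DataFrame with 'gender', 'education', and 'age' columns. The data and the category order for 'education' are made up for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: two categorical columns plus a numerical one
df = pd.DataFrame({
    'gender': ['male', 'female', 'female'],
    'education': ['school', 'college', 'school'],
    'age': [25, 32, 41],
})

# Each tuple pairs a transformer with the list of columns it should handle
ct = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    (OrdinalEncoder(categories=[['school', 'college']]), ['education']),
    remainder='passthrough'  # keep 'age' unchanged
)

result = ct.fit_transform(df)
print(result)
```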

Notice the remainder argument at the end. It specifies what to do with columns not mentioned in make_column_transformer (here only 'gender' and 'education' are mentioned).
By default, it is set to 'drop', which means those columns are dropped from the output.
Set remainder='passthrough' to pass the other columns through untouched.
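
The difference between the two settings can be checked by comparing output shapes. This is a sketch on hypothetical data; the column names 'city' and 'salary' are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: 'salary' is not mentioned in either transformer
df = pd.DataFrame({
    'city': ['Oslo', 'Rome', 'Oslo'],
    'salary': [40000, 52000, 43000],
})

# remainder is left at its default value, 'drop'
dropped = make_column_transformer(
    (OneHotEncoder(), ['city']),
)

# remainder='passthrough' keeps the unmentioned columns
kept = make_column_transformer(
    (OneHotEncoder(), ['city']),
    remainder='passthrough',
)

dropped_shape = dropped.fit_transform(df).shape
kept_shape = kept.fit_transform(df).shape
print(dropped_shape)  # only the two encoded 'city' columns
print(kept_shape)     # encoded 'city' columns plus 'salary'
```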

For example, we will use an exams.csv file containing nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course').
It also contains an ordinal column, 'parental level of education'.

    import pandas as pd

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
    print(df.head())

With the help of ColumnTransformer, we can transform the nominal data using OneHotEncoder and the ordinal data using OrdinalEncoder in one step.

    import pandas as pd
    from sklearn.compose import make_column_transformer
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
    # Categories of parental level of education, ordered from lowest to highest, for OrdinalEncoder
    edu_categories = ['some high school', 'high school', 'some college', "associate's degree", "bachelor's degree", "master's degree"]
    # Making a column transformer
    ct = make_column_transformer(
        (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
        (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
        remainder='passthrough'  # keep all other columns unchanged
    )

    print(ct.fit_transform(df))

As you may have guessed, ColumnTransformer is itself a transformer, so it has all the methods a transformer needs: .fit(), .fit_transform(), and .transform().
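
These methods can also be called separately, for instance to fit on one dataset and transform another. The sketch below assumes two small hypothetical DataFrames; the names 'train' and 'new' and their contents are made up for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data and new data with the same columns
train = pd.DataFrame({'gender': ['male', 'female'], 'score': [70, 85]})
new = pd.DataFrame({'gender': ['female'], 'score': [90]})

ct = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',
)

ct.fit(train)                    # learn the categories from the training data
new_encoded = ct.transform(new)  # reuse the learned categories on new data
print(new_encoded)
```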

Suppose you have a dataset with the features 'education', 'income', and 'job'. What will happen to the 'income' column after running the following code? (Notice that the remainder argument is not specified.)
