ColumnTransformer

Looking ahead: when you call the .fit_transform(X) method on a Pipeline object, each transformer is applied to every feature in X. This is not always what you want.

For instance, you might not want to encode numerical values, or you may need to apply different transformers to specific columns, such as OrdinalEncoder for ordinal features and OneHotEncoder for nominal features.

The ColumnTransformer solves this problem by letting you apply a different transformer to each group of columns. To create a ColumnTransformer, use the make_column_transformer function from the sklearn.compose module.

The function takes tuples as arguments. Each tuple pairs a transformer with the list of columns that transformer should be applied to.

For example, we can create a ColumnTransformer that applies the OrdinalEncoder only to the 'education' column and the OneHotEncoder only to the 'gender' column.
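A minimal sketch of such a transformer might look like this. The toy DataFrame, its values, and the assumed category order 'school' < 'college' < 'university' are hypothetical and chosen only to keep the example self-contained.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data; the column names follow the text above
toy = pd.DataFrame({
    'education': ['school', 'college', 'university', 'college'],
    'gender': ['female', 'male', 'male', 'female'],
    'score': [61, 74, 88, 70],
})

ct = make_column_transformer(
    # Assumed order of the ordinal categories, lowest to highest
    (OrdinalEncoder(categories=[['school', 'college', 'university']]), ['education']),
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',  # keep 'score' unchanged
)

result = ct.fit_transform(toy)
print(result)
```

Each tuple names one transformer and the columns it handles. Columns not listed in any tuple are dropped by default, so remainder='passthrough' is used here to keep 'score'.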



To demonstrate this on real data, we will use an exams.csv file. It contains four nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course') and one ordinal column, 'parental level of education'.


import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')

print(df.head())

With the help of ColumnTransformer, we can simultaneously transform nominal data using OneHotEncoder and ordinal data using OrdinalEncoder in a single step.


import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
# Ordered categories of parental level of education for OrdinalEncoder (lowest to highest)
edu_categories = [
    'some high school', 'high school', 'some college',
    "associate's degree", "bachelor's degree", "master's degree"
]
# Build a column transformer
ct = make_column_transformer(
    (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
    (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
    remainder='passthrough'  # leave all other columns unchanged
)

print(ct.fit_transform(df))

"As you might expect, the ColumnTransformer is a transformer, so it includes all the necessary methods for a transformer, such as .fit(), .fit_transform(), and .transform().
