ML Introduction with scikit-learn

Efficient Data Preprocessing with Pipelines

With the ability to transform columns separately using the make_column_transformer function, the next step is to build pipelines. A pipeline is a container that organizes preprocessing steps and applies them sequentially.

A pipeline in scikit-learn can be created using either the Pipeline class constructor or the make_pipeline function, both from the sklearn.pipeline module. This course focuses on make_pipeline, as it is simpler to use.
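To make the difference between the two options concrete, here is a minimal sketch that builds the same two-step pipeline both ways. The choice of SimpleImputer followed by StandardScaler is an assumption made only for illustration, as are the step names 'imputer' and 'scaler'; the lesson itself does not use StandardScaler.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline constructor: every step is an explicit (name, transformer) pair.
# The names 'imputer' and 'scaler' are arbitrary labels chosen for this sketch.
explicit_pipe = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler()),
])

# make_pipeline: pass only the transformers; step names are generated
# automatically from the lowercased class names.
auto_pipe = make_pipeline(SimpleImputer(strategy='mean'), StandardScaler())

explicit_names = [name for name, _ in explicit_pipe.steps]
auto_names = [name for name, _ in auto_pipe.steps]
print(explicit_names)  # ['imputer', 'scaler']
print(auto_names)      # ['simpleimputer', 'standardscaler']
```

Both objects are Pipeline instances and behave the same way; the only practical difference is whether you name the steps yourself.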

Just pass the transformers to make_pipeline as arguments, in the order they should run. Creating a pipeline is that simple.

However, when you call the .fit_transform(X) method on the Pipeline object, it calls .fit_transform() on each transformer in turn, passing the output of one step as the input to the next, so every step operates on the whole dataset. If you want to treat some columns differently, wrap the column-specific transformers in a ColumnTransformer and pass it to make_pipeline().
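The sequential behaviour described above can be checked on a small example. The sketch below uses a made-up numeric array and, as an assumption for illustration, a SimpleImputer followed by a StandardScaler; it compares the pipeline's result with chaining the two transformers by hand.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy numeric data with one missing value (made up for illustration).
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])

# The pipeline runs its steps in order on the whole array.
pipe = make_pipeline(SimpleImputer(strategy='mean'), StandardScaler())
piped = pipe.fit_transform(X)

# The same thing by hand: the second step receives the first step's output.
imputed = SimpleImputer(strategy='mean').fit_transform(X)
manual = StandardScaler().fit_transform(imputed)

same_result = np.allclose(piped, manual)
print(same_result)  # True
```

Note that both columns pass through both steps. There is no way to tell a plain pipeline step to skip a column, which is why column-specific processing belongs inside a ColumnTransformer.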

Build a pipeline using the same file as in the previous chapter. The pipeline should include encoders for categorical features along with SimpleImputer. Since the dataset contains both nominal and ordinal features, use a ColumnTransformer to process them separately.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
# Making a column transformer
edu_categories = ['high school', 'some high school', 'some college', "associate's degree", "bachelor's degree", "master's degree"]
ct = make_column_transformer(
   (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
   (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
   remainder='passthrough'
)
# Making a Pipeline
pipe = make_pipeline(ct, SimpleImputer(strategy='most_frequent'))
print(pipe.fit_transform(df))
```
Section 3. Chapter 3
