Preprocessing Summary | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn
1. Machine Learning Concepts
2. Preprocessing Data with Scikit-learn
3. Pipelines
4. Modeling

Preprocessing Summary

That's it for preprocessing. We addressed three problems: missing values, categorical values, and unscaled data.
These are among the most frequent problems in real datasets, so imputation, encoding, and scaling appear in almost every pipeline.
Soon you will learn how to build pipelines in sklearn, which make it easy to put everything together.
Now let's review the transformers we learned:

Imputers (Dealing with missing values)

| Imputer | Purpose |
| --- | --- |
| SimpleImputer(strategy='most_frequent') | Impute categorical data |
| SimpleImputer(strategy='mean') or SimpleImputer(strategy='median') | Impute numerical data |
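
As a quick refresher, here is a minimal sketch of the imputers listed above. The toy arrays `ages` and `colors` are hypothetical and exist only for illustration; real data would usually come from a DataFrame.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one numerical column and one categorical column,
# each with a single missing value (np.nan).
ages = np.array([[20.0], [30.0], [np.nan], [100.0]])
colors = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)

# Numerical data: fill the gap with the column mean or median.
ages_mean = SimpleImputer(strategy="mean").fit_transform(ages)
ages_median = SimpleImputer(strategy="median").fit_transform(ages)

# Categorical data: fill the gap with the most frequent category.
colors_filled = SimpleImputer(strategy="most_frequent").fit_transform(colors)

print(ages_mean.ravel())
print(ages_median.ravel())
print(colors_filled.ravel())
```

The mean and median differ here because of the large value 100, which is one reason the median is often preferred when a column contains outliers.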

Encoders (Dealing with categorical values)

| Encoder | Purpose |
| --- | --- |
| OrdinalEncoder | Encode ordinal features |
| OneHotEncoder | Encode nominal features |
| LabelEncoder | Encode the target |
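
The three encoders above can be sketched as follows. The `sizes`, `cities`, and `target` arrays are made-up examples, and the category order passed to `OrdinalEncoder` is an assumption about what the natural order of this toy feature would be.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Ordinal feature: the categories have a natural order, so we pass it explicitly.
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ordinal.fit_transform(sizes)

# Nominal feature: no order, so each category gets its own 0/1 column.
cities = np.array([["Paris"], ["Rome"], ["Paris"]])
onehot = OneHotEncoder()
# The default output is a sparse matrix; convert it to a dense array for display.
cities_encoded = onehot.fit_transform(cities).toarray()

# Target: LabelEncoder expects a 1D array of labels.
target = np.array(["no", "yes", "yes", "no"])
label = LabelEncoder()
target_encoded = label.fit_transform(target)

print(sizes_encoded.ravel())
print(cities_encoded)
print(target_encoded)
```

Note that `OrdinalEncoder` and `OneHotEncoder` work on 2D feature arrays, while `LabelEncoder` works on a 1D target.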

Scalers (Dealing with different scales)

| Scaler | Purpose |
| --- | --- |
| MinMaxScaler | Scale the features to a [0, 1] range |
| MaxAbsScaler | Scale the features to a [-1, 1] range |
| StandardScaler | Scale the features so that the mean is 0 and the variance is 1 |
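
To compare the scalers side by side, here is a small sketch that applies each of them to the same single-column array. The values in `X` are arbitrary and chosen only to make the results easy to check by hand.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# Hypothetical single feature with both negative and positive values.
X = np.array([[-4.0], [0.0], [2.0], [6.0]])

# MinMaxScaler: (x - min) / (max - min), so the result spans [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# MaxAbsScaler: x / max(|x|), so the result lies within [-1, 1].
X_maxabs = MaxAbsScaler().fit_transform(X)

# StandardScaler: (x - mean) / std, so the result has mean 0 and variance 1.
X_standard = StandardScaler().fit_transform(X)

print(X_minmax.ravel())
print(X_maxabs.ravel())
print(X_standard.ravel())
```

`MaxAbsScaler` only divides and never shifts the data, so zeros stay zeros, whereas the other two scalers also shift the values.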

Section 2. Chapter 12