ML Introduction with scikit-learn

1. Machine Learning Concepts

Learn the Machine Learning concepts and the ML project workflow.

Types of Machine Learning

Machine Learning Workflow

2. Preprocessing Data with Scikit-learn

Preprocessing is probably the most important stage of an ML project. This chapter covers the preprocessing steps needed for almost any dataset.

Scikit-learn Concepts

Getting Familiar with Dataset

Dealing with Missing Values

Challenge: Imputing Missing Values

One-Hot Encoder

Challenge: Encoding Categorical Variables

Why Scale the Data?

StandardScaler, MinMaxScaler, MaxAbsScaler

Challenge: Scaling the Features

Preprocessing Summary

3. Pipelines

A pipeline is a neat way to combine all the preprocessing steps as well as a model. Pipelines make it much easier to train and use a model.

What is Pipeline

ColumnTransformer

Efficient Data Preprocessing with Pipelines

Challenge: Creating a Pipeline

Final Estimator

Challenge: Creating a Complete ML Pipeline

4. Modeling

Modeling is the most fun stage of an ML project. Let's learn to build, fine-tune and evaluate the model!

KNeighborsClassifier

Evaluating a Model. Train-Test split.

Cross-Validation

Evaluate the Model with Cross-Validation

The Flaw of GridSearchCV

Tune Hyperparameters with RandomizedSearchCV

Modeling Summary

Putting It All Together

Preprocessing Summary

That's it for preprocessing. The three problems we addressed were missing values, categorical values, and unscaled data.
These are the most common problems in real datasets, so imputation, encoding, and scaling are included in almost every pipeline.
Soon you will learn how to build pipelines in sklearn, which makes it easy to put everything together.
Now let's review the transformers we learned:

Imputers (Dealing with missing values)

Imputer	What for
`SimpleImputer(strategy='most_frequent')`	Impute categorical data
`SimpleImputer(strategy='mean')` or `SimpleImputer(strategy='median')`	Impute numerical data
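
As a rough illustration of the two imputers in the table, the sketch below fills a categorical column with its most frequent value and a numerical column with its mean. The toy arrays and variable names are made up for this example and are not part of the course dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy columns (assumed for illustration), each with one missing value
colors = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
weights = np.array([[1.0], [np.nan], [3.0], [5.0]])

# Categorical data: replace missing values with the most frequent category
cat_imputer = SimpleImputer(strategy="most_frequent")
colors_filled = cat_imputer.fit_transform(colors).ravel()

# Numerical data: replace missing values with the column mean
num_imputer = SimpleImputer(strategy="mean")
weights_filled = num_imputer.fit_transform(weights).ravel()

print(colors_filled)
print(weights_filled)
```

Swapping `strategy="mean"` for `strategy="median"` is usually the safer choice when the column contains outliers.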

Encoders (Dealing with categorical values)

Encoder	What for
`OrdinalEncoder`	Encode ordinal features
`OneHotEncoder`	Encode nominal features
`LabelEncoder`	Encode target
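
The three encoders can be sketched side by side as follows. The feature values (`sizes`, `colors`, `target`) are invented for this example, and the explicit category order passed to `OrdinalEncoder` is an assumption about what the natural ordering would be.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Ordinal feature: categories have a natural order, so we state it explicitly
sizes = np.array([["small"], ["large"], ["medium"]], dtype=object)
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ordinal.fit_transform(sizes)

# Nominal feature: no order, so each category gets its own 0/1 column
colors = np.array([["red"], ["green"], ["red"]], dtype=object)
onehot = OneHotEncoder()
colors_encoded = onehot.fit_transform(colors).toarray()  # sparse -> dense

# Target: LabelEncoder works on a 1-D array of class labels
target = ["yes", "no", "yes"]
label = LabelEncoder()
target_encoded = label.fit_transform(target)

print(sizes_encoded.ravel())
print(onehot.categories_, colors_encoded)
print(label.classes_, target_encoded)
```

Note that `OrdinalEncoder` and `OneHotEncoder` expect 2-D input (features), while `LabelEncoder` expects a 1-D target.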

Scalers (Dealing with different scales)

Scaler	What for
`MinMaxScaler`	Scale each feature to the [0, 1] range (the default `feature_range`)
`MaxAbsScaler`	Scale each feature to the [-1, 1] range by dividing by its maximum absolute value
`StandardScaler`	Scale each feature so that its mean is 0 and its variance is 1
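
To see how the three scalers differ, the following minimal sketch applies each of them to the same single-feature array. The array `X` is a made-up example chosen so the results are easy to check by hand.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# Hypothetical single feature (assumed for illustration)
X = np.array([[-4.0], [0.0], [2.0], [2.0]])

# (x - min) / (max - min): smallest value maps to 0, largest to 1
minmax_scaled = MinMaxScaler().fit_transform(X)

# x / max(|x|): values stay in [-1, 1] and zeros stay zeros
maxabs_scaled = MaxAbsScaler().fit_transform(X)

# (x - mean) / std: result has mean 0 and variance 1
standard_scaled = StandardScaler().fit_transform(X)

print(minmax_scaled.ravel())
print(maxabs_scaled.ravel())
print(standard_scaled.ravel())
```

Because `MaxAbsScaler` does not shift the data, it keeps zeros as zeros, which is why it is often preferred for sparse features.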
