ML Introduction with scikit-learn

Course Contents

1. Machine Learning Concepts

What is ML; Types of Machine Learning; Training Set; Types of Data; Machine Learning Workflow

2. Preprocessing Data with Scikit-learn

Scikit-learn Concepts; Getting Familiar with Dataset; Dealing with Missing Values; Challenge: Imputing Missing Values; OrdinalEncoder; One-Hot Encoder; LabelEncoder; Challenge: Encoding Categorical Variables; Why Scale the Data?; StandardScaler, MinMaxScaler, MaxAbsScaler; Challenge: Scaling the Features

3. Pipelines

What is Pipeline; ColumnTransformer; Efficient Data Preprocessing with Pipelines; Challenge: Creating a Pipeline; Final Estimator; Challenge: Creating a Complete ML Pipeline

4. Modeling

Models; KNeighborsClassifier; Evaluating the Model; Cross-Validation; Challenge: Evaluating the Model with Cross-Validation; GridSearchCV; The Flaw of GridSearchCV; Challenge: Tuning Hyperparameters with RandomizedSearchCV; Modeling Summary; Challenge: Putting It All Together

LabelEncoder

The OrdinalEncoder and OneHotEncoder are typically used to encode features (the X variable). However, the target variable (y) can also be categorical.


import pandas as pd

# Load the data and assign X, y variables
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/adult_edu.csv')
y = df['income']  # 'income' is the target column in this dataset
X = df.drop('income', axis=1)

print(y)
print('All values: ', y.unique())

The LabelEncoder is used to encode the target, regardless of whether it is nominal or ordinal.

Classification models treat the target classes as unordered labels, so the classes can be encoded as any set of distinct numbers. LabelEncoder assigns the codes 0, 1, ..., n_classes - 1 to the class labels in sorted order.
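To see the mapping LabelEncoder builds without downloading the dataset, the sketch below fits it on a small hand-made list of labels. The values '<=50K' and '>50K' are an assumption modeled on the usual income labels of the Adult data, not values read from the course file. The classes_ attribute stores the sorted unique labels, and each label's code is its index in that array.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target values (assumed to resemble the Adult dataset's income labels)
y_sample = ['>50K', '<=50K', '<=50K', '>50K', '<=50K']

label_enc = LabelEncoder()
y_encoded = label_enc.fit_transform(y_sample)

# classes_ holds the unique labels in sorted order; a label's code is its index here
print(label_enc.classes_)  # ['<=50K' '>50K']
print(y_encoded)           # [1 0 0 1 0]

# Build an explicit label -> code mapping for reference
mapping = {str(label): code for code, label in enumerate(label_enc.classes_)}
print(mapping)             # {'<=50K': 0, '>50K': 1}
```

Because '<' sorts before '>', the label '<=50K' receives code 0. Note also that LabelEncoder expects a 1-D array such as y, whereas OrdinalEncoder and OneHotEncoder expect a 2-D array of features.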


import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the data and assign X, y variables
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/adult_edu.csv')
y = df['income']  # 'income' is the target column in this dataset
X = df.drop('income', axis=1)
# Initialize a LabelEncoder object and encode the y variable
label_enc = LabelEncoder()
y = label_enc.fit_transform(y)
print(y)
# Decode the y variable back
y_decoded = label_enc.inverse_transform(y)
print(y_decoded)

The code above encodes the target using LabelEncoder and then uses the .inverse_transform() method to convert it back to the original representation.
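In practice, .inverse_transform() is most useful for turning a model's numeric predictions back into readable labels. The sketch below is illustrative only: it assumes a tiny made-up dataset (X_train, y_train are hypothetical) and uses KNeighborsClassifier, which appears later in the course, just to produce predictions. It also shows that .transform() rejects labels that were not seen during fitting.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data: one numeric feature and string class labels
X_train = [[1.0], [2.0], [8.0], [9.0]]
y_train = ['<=50K', '<=50K', '>50K', '>50K']

# Encode the target before fitting
label_enc = LabelEncoder()
y_train_encoded = label_enc.fit_transform(y_train)

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train_encoded)

# The model predicts numeric codes; inverse_transform maps them back to labels
pred_codes = model.predict([[1.2], [8.8]])
pred_labels = label_enc.inverse_transform(pred_codes)
print(pred_codes)   # [0 1]
print(pred_labels)  # ['<=50K' '>50K']

# Labels unseen during fit() cannot be encoded and raise a ValueError
try:
    label_enc.transform(['unknown'])
except ValueError as err:
    print('ValueError:', err)
```

Keeping the fitted encoder around is what makes decoding possible: the same label_enc object that encoded y_train is the one used to decode the predictions.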


Section 2. Chapter 7
