One Hot Encoding | Data Encoding
One Hot Encoding

One-hot encoding is another preprocessing technique applied before training. You already know label encoding, which transforms a column like this:

| Embarked | Label |
|----------|-------|
| Q        | 3     |
| S        | 2     |
| S        | 2     |
| S        | 2     |
| C        | 1     |
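As a minimal sketch, the mapping in the table can be reproduced with a plain dictionary. The names `embarked` and `label_map` are hypothetical, and the label values are copied from the table rather than produced by any particular encoder.

```python
import pandas as pd

# Toy column matching the table above (illustrative data only)
embarked = pd.Series(["Q", "S", "S", "S", "C"], name="Embarked")

# Label values taken from the table; a real encoder may assign different integers
label_map = {"C": 1, "S": 2, "Q": 3}

# Replace each category with its integer label
labels = embarked.map(label_map)

print(pd.DataFrame({"Embarked": embarked, "Label": labels}))
```

Each category collapses into a single integer, which is the representation that one-hot encoding replaces with separate 0/1 columns.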

So that the model only has to process the values 0 and 1, a one-hot encoder transforms the column into a matrix:

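| Embarked | C | S | Q |
|----------|---|---|---|
| Q        | 0 | 0 | 1 |
| S        | 0 | 1 | 0 |
| S        | 0 | 1 | 0 |
| S        | 0 | 1 | 0 |
| C        | 1 | 0 | 0 |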

A 1 means that the row's Embarked value matches the column name (for example, Queenstown matches Q), and a 0 means that it does not. Instead of storing one of n label values in a single column, we create n columns filled with 0s and 1s.
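The following sketch shows one way to build such a matrix with scikit-learn's `OneHotEncoder` on a toy column. The DataFrame `df` and the variable names are assumptions for illustration. The encoder orders categories alphabetically, so the columns come out as C, Q, S rather than in the C, S, Q order of the table.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data matching the Embarked column from the table (illustrative only)
df = pd.DataFrame({"Embarked": ["Q", "S", "S", "S", "C"]})

encoder = OneHotEncoder()

# fit_transform returns a sparse matrix by default; convert it to a dense array
encoded = encoder.fit_transform(df[["Embarked"]]).toarray()

# categories_[0] holds the learned categories for the first (only) input column
one_hot = pd.DataFrame(
    encoded, columns=encoder.categories_[0], index=df.index
).astype(int)

print(one_hot)
```

Every row contains exactly one 1, in the column that matches its original category.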

One-hot encoding is also useful when a cell contains multiple values. For example, suppose your dataset contains sentences, each labeled with a list of emotions. This format is not convenient to work with, so we transform it:

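| emotion      | anger | joy | love | neutral | sad |
|--------------|-------|-----|------|---------|-----|
| sad, neutral | 0     | 0   | 0    | 1       | 1   |
| love         | 0     | 0   | 1    | 0       | 0   |
| love, joy    | 0     | 1   | 1    | 0       | 0   |
| anger        | 1     | 0   | 0    | 0       | 0   |
| neutral      | 0     | 0   | 0    | 1       | 0   |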
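For multi-valued cells like these, one possible approach (not necessarily the one used in this course) is pandas' `Series.str.get_dummies`, which splits each string on a separator and creates one 0/1 column per distinct value. The series `emotions` below is toy data copied from the table.

```python
import pandas as pd

# Toy labels copied from the table; each cell may hold several emotions
emotions = pd.Series(
    ["sad, neutral", "love", "love, joy", "anger", "neutral"], name="emotion"
)

# Split every cell on ", " and create one indicator column per distinct emotion
multi_hot = emotions.str.get_dummies(sep=", ")

print(multi_hot)
```

Unlike the single-category case, a row here can contain more than one 1.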

We will use OneHotEncoder to create new features for the categorical columns of our dataset.

OneHotEncoder cannot process NaNs, so you have to preprocess them first.
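A minimal sketch of such preprocessing, assuming a toy DataFrame with hypothetical `Age` and `Embarked` columns: rows with a missing category are dropped, and missing numeric values are filled with the column mean.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in each column (illustrative only)
df = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 40.0],
    "Embarked": ["S", "C", np.nan, "Q"],
})

# Drop rows where the categorical column is missing
df = df.dropna(subset=["Embarked"])

# Fill missing numeric values with the mean of the remaining rows
df["Age"] = df["Age"].fillna(df["Age"].mean())

print(df)
```

Note that dropping rows leaves gaps in the index, which matters later when the encoded columns are joined back to the dataset.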

The typical usage looks like this:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# data is already loaded; num_cols and cat_cols are already defined
encoder = OneHotEncoder()
encoded = encoder.fit_transform(data[cat_cols]).toarray()

# Reuse the index of data so the join aligns rows even after NaN rows were dropped
new_data = pd.DataFrame(encoded, index=data.index)

# Keep the numeric features and join the new one-hot features in place of the categorical ones
data = data[num_cols].join(new_data)
```

Task

Apply one-hot encoding to the dataset.

  1. Load the dataset.
  2. Process the NaNs: drop the rows where Embarked is missing, and replace missing Age values with the mean.
  3. Transform the Cabin data as in the previous chapter (apply label encoding).
  4. Create the variable cat_cols to store the categorical features: Sex, Cabin, and Embarked.
  5. Create a OneHotEncoder and store the transformed data in new_data.
  6. Remove the cat_cols columns from the dataframe and add new_data.
  7. Check a sample.
