Learn Ordinal Encoding | The Very First Steps
Introduction to Scikit Learn

Ordinal Encoding

Features can be divided into categorical and numerical.

A categorical feature is a feature whose value assigns a sample to one of a set of groups (categories), where the order of those groups is unimportant. No 'greater than' (>) or 'less than' (<) relationship can be established between the values of such a feature.

The value of a numerical feature is a scalar. Between the values of a numerical feature it is possible to establish the relationship 'greater' or 'less'.
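To make the distinction concrete, here is a minimal sketch using pandas. The column names and values are invented for illustration and are not taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "language": ["Python", "SQL", "Java", "Python"],  # categorical feature
    "years_experience": [3, 5, 2, 7],                 # numerical feature
})

# Numerical values can be meaningfully compared with > and <
more_than_two_years = df["years_experience"] > 2

# Separate the columns by kind using their dtypes
numerical_columns = df.select_dtypes(include="number").columns.tolist()
categorical_columns = df.select_dtypes(exclude="number").columns.tolist()

print(numerical_columns)
print(categorical_columns)
```

Treating every non-numeric column as categorical is a simplification that happens to fit this toy table; real datasets may also contain dates or free text.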


Most scikit-learn estimators expect numerical input and cannot work with raw categorical values directly, so we should first convert categorical features to a numerical representation.

The two most common techniques for moving to a numerical representation are Ordinal Encoding and One-Hot Encoding. Let's get acquainted with the first of them, Ordinal Encoding. The point of this encoding is that each unique value of the category is encoded with an integer. For example: Python is 1, SQL is 2, Java is 3.

Now, let's look at an example of how to implement this encoding.
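The idea can be sketched by hand in plain Python before turning to a library. This is only an illustration of the mapping described above, with codes starting at 1 to match the example in the text. scikit-learn's `OrdinalEncoder` uses a different convention: it numbers categories from 0 and, by default, sorts them first.

```python
# Hypothetical list of category values, invented for illustration
categories = ["Python", "SQL", "Java", "SQL", "Python"]

# Assign integer codes in order of first appearance, starting from 1
mapping = {}
for value in categories:
    if value not in mapping:
        mapping[value] = len(mapping) + 1

# Replace every category with its integer code
encoded = [mapping[value] for value in categories]

print(mapping)  # {'Python': 1, 'SQL': 2, 'Java': 3}
print(encoded)  # [1, 2, 3, 2, 1]
```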

    # example of an ordinal encoding
    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder

    # define data
    data = pd.read_csv('C:/Users/User1/Desktop/РОБОТА/Data.csv')
    print(data)

    # define ordinal encoding
    encoder = OrdinalEncoder()

    # fit the encoder on the data, then transform the data
    encoder.fit(data)
    result = encoder.transform(data)
    print(result)
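Since the CSV file above lives on the author's machine, here is a self-contained variant that can be run anywhere. The small in-memory DataFrame is an assumption made for illustration and stands in for the contents of `Data.csv`, which are not shown in the lesson.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for Data.csv: a single categorical column
data = pd.DataFrame({"language": ["Python", "SQL", "Java", "SQL"]})

# fit_transform learns the categories and encodes the data in one step
encoder = OrdinalEncoder()
result = encoder.fit_transform(data)

# categories_ holds the learned categories for each column, in code order
print(encoder.categories_)
print(result)
```

With the default settings the categories are sorted before codes are assigned, so here Java becomes 0, Python becomes 1, and SQL becomes 2. The result is a 2D array of floats with one column per input column.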

It is time for an example.


Analysis

We see that here the missing values are represented by zeros (missing_values=0). We replace each of them with the mean value (strategy='mean') of the column in which the missing value is located.
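A minimal sketch of the setup this analysis describes, assuming scikit-learn's `SimpleImputer` (the class named in the task that follows) and a small array invented for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data in which 0 marks a missing value
X = np.array([[1.0, 4.0],
              [0.0, 6.0],
              [3.0, 0.0]])

# Treat zeros as missing and replace them with the column mean,
# computed from the non-missing values of that column
imputer = SimpleImputer(missing_values=0, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

The first column's zero is replaced by 2.0 (the mean of 1 and 3), and the second column's zero by 5.0 (the mean of 4 and 6).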

Task

Let's try to fill the empty spaces in your small dataset. To use SimpleImputer, follow these steps:

  1. Import the class.
  2. Create an instance of the class (the imputer object).
  3. Specify the parameters you need. In particular, the missing values here are represented by NaN, so replace them with the constant value 15.
  4. Fit the imputer on your data using the fit() function.
  5. Impute all missing values in your data using the transform() function.
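The steps above can be sketched as follows. The task's actual dataset is not shown on the page, so the array below is a hypothetical placeholder. This is one possible way to follow the steps, not the course's reference solution.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # step 1: import the class

# Hypothetical small dataset with NaN marking the missing values
data = np.array([[1.0, np.nan],
                 [np.nan, 4.0],
                 [5.0, 6.0]])

# steps 2-3: create the imputer and tell it to fill NaN with the constant 15
imputer = SimpleImputer(missing_values=np.nan, strategy="constant", fill_value=15)

# step 4: fit the imputer on the data
imputer.fit(data)

# step 5: replace all missing values
filled = imputer.transform(data)

print(filled)
```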
