Ordinal Encoding
Features can be divided into categorical and numerical.
A categorical feature is one whose value assigns a sample to one of a set of groups, and the order of the values within that set does not matter. It is impossible to establish a 'greater than' (>) or 'less than' (<) relationship between the values of a categorical feature.
The value of a numerical feature is a scalar. Between the values of a numerical feature it is possible to establish the relationship 'greater' or 'less'.
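As a rough illustration of the two kinds of features, the sketch below builds a small table and separates its columns by data type. The column names (`language`, `years_experience`) and the values are made up for this example.

```python
import pandas as pd

# Hypothetical toy data: one categorical and one numerical feature
df = pd.DataFrame({
    "language": ["python", "SQL", "Java", "python"],
    "years_experience": [3, 5, 2, 7],
})

# Numerical columns hold scalars that can be compared with < and >
numerical_cols = df.select_dtypes(include="number").columns.tolist()
# Every non-numeric column is treated as categorical in this sketch
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```

Splitting by data type is only a heuristic: a column of integer codes can still be categorical in meaning.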
Most scikit-learn estimators cannot process categorical features given as text; they expect numerical input. So we need to convert categorical features to a numerical representation.
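To see why a numerical representation is needed, the sketch below tries to fit a model directly on string values. The data and the choice of `LogisticRegression` are assumptions made for this illustration; other estimators are expected to behave similarly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: a single categorical feature stored as strings
X = np.array([["python"], ["SQL"], ["Java"], ["python"]], dtype=object)
y = [1, 0, 0, 1]

try:
    LogisticRegression().fit(X, y)
    fit_failed = False
except ValueError as error:
    # The strings cannot be converted to floating-point numbers
    fit_failed = True
    print("Fitting failed:", error)
```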
The two most common techniques for moving to a numerical representation are Ordinal Encoding and One-Hot Encoding. Let's get acquainted with the first of them. The point of Ordinal Encoding is that each unique value of the category is encoded with an integer. For example: python is 1, SQL is 2, Java is 3.
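The idea can be sketched by hand before turning to scikit-learn. The snippet below applies the mapping from the example above with a plain dictionary; the sample values in `languages` are made up for illustration.

```python
import pandas as pd

# Hypothetical column of categorical values
languages = pd.Series(["python", "SQL", "Java", "SQL", "python"])

# Hand-written mapping: each unique category gets its own integer
mapping = {"python": 1, "SQL": 2, "Java": 3}

# Replace every category with its integer code
encoded = languages.map(mapping)
print(encoded.tolist())  # [1, 2, 3, 2, 1]
```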
Now, let's look at an example of how to implement this encoding.
# example of an ordinal encoding
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
# define data
data = pd.read_csv('C:/Users/User1/Desktop/Π ΠΠΠΠ’Π/Data.csv')
print(data)
# define ordinal encoding
encoder = OrdinalEncoder()
# transform data
result = encoder.fit(data)
result = result.transform(data)
print(result)
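Because the CSV file is not available here, the following is a minimal self-contained sketch of the same steps on a small in-memory table. The `language` column and its values are assumptions. Note that `OrdinalEncoder` numbers categories starting from 0 rather than 1, and the explicit `categories` argument is used here only to fix the order of the codes.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the contents of the CSV file
data = pd.DataFrame({"language": ["python", "SQL", "Java", "SQL"]})

# Passing the categories explicitly fixes the order of the codes;
# without it, OrdinalEncoder sorts the unique values itself
encoder = OrdinalEncoder(categories=[["python", "SQL", "Java"]])

# fit_transform learns the mapping and applies it in one step
result = encoder.fit_transform(data)
print(result)

# inverse_transform maps the codes back to the original labels
restored = encoder.inverse_transform(result)
print(restored.ravel().tolist())
```

With this category order, python is encoded as 0, SQL as 1 and Java as 2.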
It is time for an example.
Input:
import numpy as np
# Importing the class
from sklearn.impute import SimpleImputer
# Creating an imputer object
imputer = SimpleImputer(missing_values=0, strategy='mean')
# Your data
df = [[10, 0, 20], [0, 25, 30], [30, 35, 0]]
# Displaying data
print('Data with missing values:', df)
# Fitting the imputer on your data
imputer = imputer.fit(df)
# Impute all missing values in your data
df = imputer.transform(df)
# Displaying the result
print('Data without missing values:', df)
Output:
Data with missing values: [[10, 0, 20], [0, 25, 30], [30, 35, 0]]
Data without missing values: [[10. 30. 20.]
 [20. 25. 30.]
 [30. 35. 25.]]
Analysis
Here the missing values are represented by zeros (missing_values=0), and each one is replaced with the mean (strategy='mean') of the non-missing values in its column. For example, the first column holds 10 and 30 besides the zero, so the zero becomes 20.
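The column means can be reproduced by hand as a sanity check. The sketch below uses only NumPy on the same data and is an illustration of the calculation, not the imputer's actual implementation.

```python
import numpy as np

data = np.array([[10, 0, 20], [0, 25, 30], [30, 35, 0]], dtype=float)

# Treat zeros as missing by turning them into NaN
data_nan = np.where(data == 0, np.nan, data)

# Column means computed over the non-missing values only
column_means = np.nanmean(data_nan, axis=0)
print(column_means)  # [20. 30. 25.]

# Put each column's mean into that column's missing cells
filled = np.where(data == 0, column_means, data)
print(filled)
```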
Let's fill in the missing values in a small dataset. To use SimpleImputer, follow these steps:
- Import the class.
- Create an instance of the class (the imputer object).
- Specify the parameters you need: here the missing values are represented by NaN, so replace them with the constant value 15.
- Fit the imputer on your data using the fit() method.
- Impute all missing values in your data using the transform() method.
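One possible way to carry out these steps is sketched below. The dataset is an assumption: the same 3x3 values as in the earlier example, with NaN marking the missing cells.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Assumed dataset with NaN marking the missing cells
data = [[10, np.nan, 20], [np.nan, 25, 30], [30, 35, np.nan]]

# strategy='constant' fills every missing cell with fill_value
imputer = SimpleImputer(missing_values=np.nan, strategy="constant", fill_value=15)

# Fit the imputer, then impute the missing values
imputer = imputer.fit(data)
filled = imputer.transform(data)
print(filled)
```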