Preparing Data Set 2/2
Let's do the final preparation of the Amsterdam house prices dataset. If you take one more look at this dataset...
You will see that, for example, the values in the price and room columns differ by several orders of magnitude. It is usually better to work with data brought to a common range of values; otherwise a column with large values, such as price, can dominate distance-based or gradient-based models. Let's fix this with standardization. We will do it in two ways, starting without built-in functions and using just the formula.
- Let's find the mean and variance values of each column.
```python
# Assumes `dataset` is a pandas DataFrame that holds only numeric columns
# Calculating mean values
print('The mean value of each column in the dataset:', dataset.mean())
# Calculating variance values
print('The variance of each column in the dataset:', dataset.var())
```
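To see what these calls return, here is a minimal sketch on a small hypothetical DataFrame. The column names Price and Room and all values are invented for illustration and are not taken from the Amsterdam dataset. Note that pandas computes the sample variance (ddof=1) by default.

```python
import pandas as pd

# Hypothetical toy data (NOT the real dataset): two columns on very different scales
toy = pd.DataFrame({
    "Price": [300000.0, 450000.0, 600000.0],
    "Room": [2.0, 3.0, 4.0],
})

means = toy.mean()     # column-wise mean
variances = toy.var()  # column-wise sample variance (ddof=1)
stds = toy.std()       # column-wise standard deviation, the square root of the variance

print(means)
print(variances)
print(stds)
```

The price variance is many orders of magnitude larger than the room variance, which is exactly the scale mismatch standardization removes.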
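- Then we calculate standardized values using the following formula:

$$z = \frac{x - \mu}{\sigma}$$

where $x$ is a value in a column, $\mu$ is the column mean and $\sigma$ is the column standard deviation.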
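```python
# Standardizing each column: (x - mean) / std
# Note: pandas .std() returns the sample standard deviation (ddof=1)
standardized = dataset.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
print(standardized)
```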
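A minimal sketch of the same formula on hypothetical toy values (not the real dataset) shows that after standardization both columns end up on the same scale, with mean 0 and standard deviation 1:

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Price": [300000.0, 450000.0, 600000.0],
    "Room": [2.0, 3.0, 4.0],
})

# z = (x - mean) / std, applied column by column (axis=0 passes each column as a Series)
standardized = toy.apply(lambda col: (col - col.mean()) / col.std(), axis=0)
print(standardized)

# Every standardized column now has mean 0 and sample standard deviation 1
print(standardized.mean())
print(standardized.std())
```

Here Price and Room both become -1, 0, 1, even though their raw values differed by five orders of magnitude.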
- Or we can use the StandardScaler class from scikit-learn in the following way:
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(dataset)
# Mean of each column, learned during fit()
print(scaler.mean_)
# Variance of each column, learned during fit()
print(scaler.var_)
scaled_data = scaler.transform(dataset)
print(scaled_data)
```
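The two approaches differ in one detail: StandardScaler divides by the population standard deviation (ddof=0), while pandas .std() uses ddof=1 by default, so their outputs differ slightly. The sketch below, on hypothetical toy data, checks that the manual formula matches StandardScaler once ddof=0 is used:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Price": [300000.0, 450000.0, 600000.0],
    "Room": [2.0, 3.0, 4.0],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(toy)  # fit() and transform() in one call; returns a NumPy array

# Manual formula with the population standard deviation, as StandardScaler uses
manual = toy.apply(lambda col: (col - col.mean()) / col.std(ddof=0), axis=0)

print(scaler.mean_)  # per-column means
print(scaler.var_)   # per-column population variances (ddof=0)
print(np.allclose(scaled, manual.to_numpy()))  # the two results agree
```

In practice the scaler is usually fit on training data only and then reused with transform() on new data, so that all data is scaled with the same mean and variance.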
Now it is time to apply all these steps to the dataset in the task. Let's start!
- Import the libraries and load the dataset.
- Find and drop duplicated rows.
- Find null values and replace them with the column mean.
- Delete categorical columns, leaving only numeric ones.
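The task steps above can be sketched with pandas as follows. This is only an illustration under assumptions: the real task loads the dataset from a file whose path is not given here, so the sketch builds a small hypothetical DataFrame (the Address column and all values are invented) with one duplicated row, one missing value and one categorical column. The mean is computed with numeric_only=True because pandas 2.0 and later raise a TypeError when averaging text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one duplicated row, one missing value, one categorical column
raw = pd.DataFrame({
    "Address": ["A", "B", "B", "C"],
    "Price": [300000.0, 450000.0, 450000.0, np.nan],
    "Room": [2.0, 3.0, 3.0, 4.0],
})

# 1. Find and drop duplicated rows
n_duplicates = raw.duplicated().sum()
print("Duplicated rows:", n_duplicates)
data = raw.drop_duplicates()

# 2. Find null values and replace them with each numeric column's mean
print(data.isnull().sum())
data = data.fillna(data.mean(numeric_only=True))

# 3. Keep only numeric columns (drops categorical ones such as "Address")
data = data.select_dtypes(include="number")

print(data)
```

After these steps the data contains only numeric, non-missing, unique rows and is ready for standardization.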