Introduction to Scikit Learn
1. The Very First Steps
2. Scaling Numerical Data
3. Models in Scikit Learn
Preparing Data Set 2/2
Let's make the final preparations to the Amsterdam house prices dataset. If you take one more look at this dataset...
You will see that the values in the price and room columns, for example, are of different orders of magnitude. It is better to work with data brought to a common range of values, so let's standardize it. We will do this in two ways: first without built-in functions, using only the formula, and then with scikit-learn.
- Let's find the mean and variance of each column.
```python
# Calculating the mean of each column
print('The mean value of each column in the dataset:', dataset.mean())
# Calculating the variance of each column
print('The variance of each column in the dataset:', dataset.var())
```
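The sketch below uses a made-up three-row DataFrame (invented values, not the Amsterdam data) to show what these two calls return. It also confirms that `std()` is the square root of `var()`, so the two should not be mixed up when labelling output. In pandas, both use the sample formula (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: prices and room counts on very different scales
toy = pd.DataFrame({
    "price": [300000.0, 450000.0, 600000.0],
    "room": [2.0, 3.0, 4.0],
})

means = toy.mean()      # column-wise means
variances = toy.var()   # column-wise sample variances (ddof=1 by default)
stds = toy.std()        # column-wise sample standard deviations

print(means)
print(variances)
# The standard deviation is the square root of the variance
print(np.allclose(stds, np.sqrt(variances)))  # True
```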
- Then we calculate the standardized values by subtracting each column's mean and dividing by its standard deviation:
```python
# Standardizing each column: subtract its mean and divide by its standard deviation
dataset.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
```
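Written out, the lambda above applies the z-score formula to every value $x_i$ of a column with $n$ values. Here $\bar{x}$ is the column mean and $s$ is the column standard deviation. Because pandas `std()` defaults to `ddof=1`, $s$ is the sample standard deviation with $n - 1$ in the denominator. For example, for the values 2, 3, 4 the mean is 3 and $s = 1$, so the standardized values are $-1, 0, 1$.

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2}
```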
- Or we can simply use the `StandardScaler` class from `sklearn.preprocessing` as follows:
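```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(dataset)
# Mean of each column learned by the scaler
print(scaler.mean_)
# Variance of each column learned by the scaler
print(scaler.var_)
# Standardizing the data with the learned mean and variance
scaled_data = scaler.transform(dataset)
print(scaled_data)
```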
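`StandardScaler` divides by the population standard deviation (`ddof=0`), so its output differs slightly from the pandas `apply` version, which calls `std()` with its default `ddof=1`. The sketch below uses made-up data to show that the two agree once `ddof=0` is passed to pandas. It also uses `fit_transform`, which performs `fit` and `transform` in a single call.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (invented values, not the real dataset)
toy = pd.DataFrame({
    "price": [300000.0, 450000.0, 600000.0, 750000.0],
    "room": [2.0, 3.0, 4.0, 6.0],
})

# Built-in standardization: fit and transform in one step
scaler = StandardScaler()
scaled = scaler.fit_transform(toy)

# Manual standardization with the population std (ddof=0), matching StandardScaler
manual = toy.apply(lambda x: (x - x.mean()) / x.std(ddof=0), axis=0)

# Manual standardization with the pandas default (sample std, ddof=1)
manual_sample = toy.apply(lambda x: (x - x.mean()) / x.std(), axis=0)

print(np.allclose(scaled, manual.to_numpy()))         # True
print(np.allclose(scaled, manual_sample.to_numpy()))  # False
print(scaler.var_)  # population variance of each column
```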
Now it is time to apply all these steps to the dataset in the task. Let's start!
Task
- Import the libraries and load the dataset.
- Find and drop duplicated rows.
- Find null values and replace them with the column mean.
- Delete categorical columns, leaving only numerical ones.
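The four task steps can be sketched roughly as follows. This sketch assumes a small hypothetical DataFrame in place of the real file, because the file name and loading call are not given here, so `pd.read_csv` is left out. The column names and values are invented. Nulls are filled using `mean(numeric_only=True)`, so the text column does not cause an error when the means are computed.

```python
import numpy as np
import pandas as pd

# Step 1: libraries are imported above; this hypothetical DataFrame stands in
# for the loaded dataset (one duplicated row, missing values, a text column)
dataset = pd.DataFrame({
    "address": ["A", "B", "B", "C", "D"],
    "price": [300000.0, 450000.0, 450000.0, np.nan, 600000.0],
    "room": [2.0, 3.0, 3.0, 4.0, np.nan],
})

# Step 2: find and drop duplicated rows
print("Duplicated rows:", dataset.duplicated().sum())
dataset = dataset.drop_duplicates()

# Step 3: find null values and replace them with the mean of their column
print("Null values per column:\n", dataset.isnull().sum())
dataset = dataset.fillna(dataset.mean(numeric_only=True))

# Step 4: keep only numerical columns (drop categorical ones)
dataset = dataset.select_dtypes(include="number")

print(dataset)
```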
Section 3. Chapter 2