Preparing Data Set 2/2 | Models in Scikit Learn
Introduction to Scikit Learn
Course Content

1. The Very First Steps
2. Scaling Numerical Data
3. Models in Scikit Learn

Preparing Data Set 2/2

Let's finish preparing the Amsterdam house prices dataset. If you take one more look at this dataset...

You will see that, for example, the values in the price and room columns differ by orders of magnitude. Many models work better when the features are brought to a comparable scale, so let's standardize the data. We will do it in two ways: first without built-in functions, using the formula directly.

  1. Let's find the mean and variance of each column.
      # Calculating mean values
      print('The mean value of each column in the dataset:', dataset.mean())
      # Calculating variance values
      print('The variance of each column in the dataset:', dataset.var())
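  2. Then we calculate the standardized values using the following formula: z = (x - mean) / std, where std is the standard deviation (the square root of the variance).
      # Standardizing each column: subtract its mean, then divide by its standard deviation
      dataset.apply(lambda x: (x - x.mean()) / x.std(), axis=0)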
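  3. Or we can do the same using StandardScaler() in the following way. Note that StandardScaler divides by the population standard deviation (ddof=0), while the pandas std() method defaults to the sample standard deviation (ddof=1), so the two results differ slightly.
      from sklearn.preprocessing import StandardScaler

      scaler = StandardScaler()
      scaler.fit(dataset)
      # Mean of each column
      print(scaler.mean_)
      # Variance of each column
      print(scaler.var_)
      scaled_data = scaler.transform(dataset)
      print(scaled_data)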
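To check that the two approaches agree, here is a minimal sketch on a tiny made-up table. The column names follow the lesson, but the values are hypothetical and are not taken from the real Amsterdam dataset. It assumes pandas, NumPy, and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the housing data (values are made up).
dataset = pd.DataFrame({
    "price": [300000.0, 450000.0, 520000.0, 610000.0],
    "room": [2.0, 3.0, 4.0, 5.0],
})

# Manual standardization with the population standard deviation (ddof=0).
manual_population = (dataset - dataset.mean()) / dataset.std(ddof=0)

# Manual standardization with the pandas default, the sample standard deviation (ddof=1).
manual_sample = (dataset - dataset.mean()) / dataset.std()

# The same operation with scikit-learn.
scaled = StandardScaler().fit_transform(dataset)

matches_population = np.allclose(manual_population.to_numpy(), scaled)
matches_sample = np.allclose(manual_sample.to_numpy(), scaled)

print("Matches ddof=0 formula:", matches_population)
print("Matches ddof=1 formula:", matches_sample)
print("Column means after scaling:", scaled.mean(axis=0))
print("Column std after scaling:", scaled.std(axis=0))
```

On a large dataset the gap between the ddof=0 and ddof=1 versions becomes negligible, but on a handful of rows it is clearly visible.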

It is time to apply all these steps to the dataset in the task. Let's start!

Task

  1. Import the libraries and load the dataset.
  2. Find and drop duplicated rows.
  3. Find null values and replace them with the column mean.
  4. Delete the categorical columns, leaving only numerical ones.
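The task steps could be sketched roughly as follows. This is only an illustration, not the course's reference solution: the data is built in memory instead of being loaded from a file, and the column names and values (including the district column) are hypothetical.

```python
import numpy as np
import pandas as pd

# Step 1: import libraries and "load" the data.
# Hypothetical in-memory data standing in for the real file (values are made up).
raw = pd.DataFrame({
    "price": [300000.0, 300000.0, 450000.0, np.nan, 610000.0],
    "room": [2.0, 2.0, 3.0, 4.0, np.nan],
    "district": ["Centrum", "Centrum", "Noord", "Zuid", "West"],
})

# Step 2: find and drop duplicated rows.
n_duplicates = int(raw.duplicated().sum())
no_duplicates = raw.drop_duplicates()

# Step 3: find null values and replace them with the mean of their column.
n_nulls = int(no_duplicates.isnull().sum().sum())
filled = no_duplicates.fillna(no_duplicates.mean(numeric_only=True))

# Step 4: keep only the numerical columns, dropping categorical ones.
numeric_only = filled.select_dtypes(include="number")

print("Duplicated rows:", n_duplicates)
print("Null values:", n_nulls)
print(numeric_only)
```

Filling nulls before dropping the categorical column follows the order of the task list; selecting the numerical columns first would give the same result here.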

Section 3. Chapter 2