In this section we will work with a dataset of house prices in Amsterdam. Let's get to know it better, starting with a look at its first rows.

The dataset has the following columns:
- **Address** - the full address of the house,
- **Zip** - the postal code,
- **Price** - the price of the house,
- **Area** - the total area of the house,
- **Room** - the number of rooms in the house,
- **Lon** and **Lat** - the longitude and latitude of the house, i.e. its geographic coordinates.
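A minimal sketch of how the first rows can be inspected with pandas. The real data file is not included here, so the code builds a small hypothetical DataFrame with the same columns. All values are made up for illustration. With the real data you would load the file first, for example with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical sample rows: the values are made up and only mimic
# the column layout of the Amsterdam house price dataset.
dataset = pd.DataFrame({
    "Address": ["Street A 1, Amsterdam", "Street B 2, Amsterdam", "Street C 3, Amsterdam"],
    "Zip": ["1000 AA", "1000 AB", "1000 AC"],
    "Price": [450000.0, 600000.0, 375000.0],
    "Area": [70, 95, 55],
    "Room": [3, 4, 2],
    "Lon": [4.90, 4.88, 4.92],
    "Lat": [52.37, 52.36, 52.38],
})

# head() returns the first n rows (n=5 by default)
print(dataset.head())
# dtypes shows which columns are numeric and which hold text
print(dataset.dtypes)
```

Looking at `dtypes` early is useful here, because the text columns (`Address`, `Zip`) are the ones that need special handling later on.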

Before we can work with this dataset in the next chapters, we need to prepare it.

1. Checking and dropping duplicates.
First, let's find duplicate rows in the dataset using the `duplicated()` method. Then remove them with the `drop_duplicates()` method, as in the example below.

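```python
# Checking duplicates: show the rows that repeat an earlier row
dataset.loc[dataset.duplicated()]
# Dropping duplicates: drop_duplicates() returns a new DataFrame, so assign it back
dataset = dataset.drop_duplicates()
```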
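To see what these two methods actually do, here is a small worked example on hypothetical toy data in which two rows are identical. The final `reset_index(drop=True)` call is an optional extra, not part of the steps above; it just renumbers the remaining rows.

```python
import pandas as pd

# Hypothetical toy data: the second and third rows are identical
dataset = pd.DataFrame({
    "Zip": ["1000 AA", "1000 AB", "1000 AB"],
    "Price": [450000.0, 600000.0, 600000.0],
})

# duplicated() marks a row True if it repeats an earlier row (keep='first' by default)
mask = dataset.duplicated()
print(mask.tolist())  # [False, False, True]

# drop_duplicates() returns a new DataFrame; the original stays unchanged
deduplicated = dataset.drop_duplicates()
print(len(dataset), len(deduplicated))  # 3 2

# Optional: renumber the rows 0..n-1 after the removal
dataset = deduplicated.reset_index(drop=True)
```

Because the first occurrence is kept, exactly one copy of every repeated row survives.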

2. Checking and replacing *null* values.
To find *null* values, use the `isnull()` method. To replace them, you can use scikit-learn's `SimpleImputer` class, which here fills each missing value with the mean of its column.

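```python
import numpy as np
from sklearn.impute import SimpleImputer

# Checking null values: count the missing entries in each column
dataset.isnull().sum()
# Replacing null values: treat NaN as missing and fill it with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# The 'mean' strategy only works on numeric columns, so select those
numeric_columns = dataset.select_dtypes(include='number').columns
# Fit the imputer on the numeric data and impute all missing values in it
dataset[numeric_columns] = imputer.fit_transform(dataset[numeric_columns])
```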
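A worked example of mean imputation on hypothetical toy data with one missing price. `missing_values` tells the imputer which placeholder counts as missing; `np.nan` (its default) matches what `isnull()` detects. The fitted means are stored in the `statistics_` attribute.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one missing price (NaN)
dataset = pd.DataFrame({
    "Price": [400000.0, np.nan, 600000.0],
    "Room": [3, 4, 5],
})

# With strategy='mean', each NaN is replaced by the mean of the observed
# values in its column: (400000 + 600000) / 2 = 500000
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(dataset)

# statistics_ holds the per-column means learned during fit
print(imputer.statistics_.tolist())  # [500000.0, 4.0]

# fit_transform returns a NumPy array, so wrap it to keep the column names
dataset = pd.DataFrame(imputed, columns=dataset.columns, index=dataset.index)
print(dataset["Price"].tolist())  # [400000.0, 500000.0, 600000.0]
```

Note that the mean is computed only from the values that are present, so the missing entry does not pull the average down.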

3. Removing categorical values.
Not all values in our dataset are numeric: there are also categorical (text) values, which are harder to work with. Usually these are converted to numbers with categorical encoding, for instance with scikit-learn's `OrdinalEncoder`. In our case, however, every house has a different address, so encoding the addresses makes little sense. Instead, let's take a sub-dataset that contains only the numeric columns.

```python
# Collect the names of the categorical (text) columns
categorical_features = list(dataset.select_dtypes(include=['object']).columns)
# Drop the categorical columns, keeping only the numeric ones
dataset_without_categorical_features = dataset.drop(columns=categorical_features)
```
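The claim that encoding unique addresses makes little sense can be checked directly. A sketch on hypothetical toy data: `OrdinalEncoder` turns three unique addresses into the codes 0, 1, 2, which are just arbitrary row IDs. The sketch then drops the text columns using `select_dtypes(exclude="number")`, a variant that also catches text columns stored with pandas' dedicated string dtype rather than `object`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: every address is unique
dataset = pd.DataFrame({
    "Address": ["Street A 1", "Street B 2", "Street C 3"],
    "Price": [450000.0, 600000.0, 375000.0],
    "Room": [3, 4, 2],
})

# OrdinalEncoder gives each distinct category an integer code (in sorted order).
# With all-unique addresses the codes are simply 0..n-1, an arbitrary ID per row.
encoder = OrdinalEncoder()
codes = encoder.fit_transform(dataset[["Address"]])
print(codes.ravel().tolist())  # [0.0, 1.0, 2.0]

# So instead we drop every non-numeric column and keep the numeric ones
categorical_features = list(dataset.select_dtypes(exclude="number").columns)
dataset_without_categorical_features = dataset.drop(columns=categorical_features)
print(list(dataset_without_categorical_features.columns))  # ['Price', 'Room']
```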

Now it's time to apply all these steps to the dataset in the task. Let's start!
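For reference, the three steps can be combined into one function. This is only a sketch: `prepare_dataset` is a hypothetical helper name, and the toy input is made up. Unlike the order of the steps above, the sketch selects the numeric columns before imputing, because the `'mean'` strategy cannot be applied to text columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def prepare_dataset(dataset: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates, keep numeric columns, impute NaN."""
    # Step 1: drop exact duplicate rows
    dataset = dataset.drop_duplicates()
    # Step 3: keep only the numeric columns (done before imputing)
    numeric = dataset.select_dtypes(include="number")
    # Step 2: replace NaN values with the mean of their column
    imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
    imputed = imputer.fit_transform(numeric)
    return pd.DataFrame(imputed, columns=numeric.columns, index=numeric.index)


# Hypothetical toy input: one duplicate row and one missing price
raw = pd.DataFrame({
    "Address": ["Street A 1", "Street A 1", "Street B 2", "Street C 3"],
    "Price": [450000.0, 450000.0, np.nan, 350000.0],
    "Room": [3, 3, 4, 2],
})
prepared = prepare_dataset(raw)
print(prepared["Price"].tolist())  # [450000.0, 400000.0, 350000.0]
```

Dropping duplicates first matters: otherwise the repeated row would be counted twice when the mean price is computed.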

