Preparing Data Set 1/2 | Models in Scikit Learn
Preparing Data Set 1/2

In this section we will work with a dataset of house prices in Amsterdam. Let's get to know it better by looking at its first rows.

[Table preview: first rows of the dataset]

The dataset has the following columns:

  • Address - the detailed address of the house,
  • Zip - the postal code,
  • Price - the price of the house,
  • Area - the total area of the house,
  • Room - the number of rooms in the house,
  • Lon and Lat - the longitude and latitude, geographic coordinates that locate the house on the map.
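As a quick way to take that first look, the sketch below builds a tiny DataFrame with the same column layout and prints its first rows, column types, and shape. The three rows are made-up values for illustration only, not records from the Amsterdam dataset; in practice `dataset` would be loaded from the course file (for example with `pd.read_csv`, whose path is not given here).

```python
import pandas as pd

# Made-up rows that only mimic the column layout described above;
# these are NOT real records from the Amsterdam dataset.
dataset = pd.DataFrame({
    "Address": ["Examplestraat 1", "Voorbeeldweg 22", "Samplekade 3"],
    "Zip": ["1000 AA", "1000 AB", "1000 AC"],
    "Price": [450000.0, 620000.0, None],
    "Area": [60, 85, 72],
    "Room": [3, 4, 3],
    "Lon": [4.90, 4.85, 4.95],
    "Lat": [52.37, 52.35, 52.38],
})

# First rows of the table
print(dataset.head())

# Column types: text columns appear with an object or string dtype,
# depending on the pandas version
print(dataset.dtypes)

# Number of rows and columns
print(dataset.shape)
```

Checking the dtypes early shows which columns are text and which are numerical, which matters for the preparation steps that follow.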

To work with this dataset in the next chapters, you first have to prepare it.

  1. Checking and dropping duplicates. First, find duplicated rows in the dataset using the duplicated() method. Then delete these duplicates with the help of the drop_duplicates() method, as in the example below.
     # Checking duplicates
     dataset.loc[dataset.duplicated()]
     # Dropping duplicates (drop_duplicates() returns a new DataFrame, so reassign it)
     dataset = dataset.drop_duplicates()
  2. Checking and replacing null values. To find null values, use the isnull() method. To replace these values, use the SimpleImputer class from sklearn.impute.
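     import numpy as np
     from sklearn.impute import SimpleImputer
     # Checking null values
     dataset.isnull().sum()
     # Replacing null values (NaN) with the column mean
     imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
     # The mean strategy works only on numerical columns
     numeric_columns = dataset.select_dtypes(include='number').columns
     # Fitting the imputer on your data
     imputer = imputer.fit(dataset[numeric_columns])
     # Imputing all missing values in your data
     dataset[numeric_columns] = imputer.transform(dataset[numeric_columns])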
  3. Handling categorical values. Not all values in our dataset are numerical; there are also categorical values, which are harder to work with. Usually such values have to be encoded, for instance with OrdinalEncoder. But in our case the address of each house is different, so encoding makes little sense here. Instead, let's take a sub-dataset that leaves out the categorical columns and keeps only the numerical ones.
     # Creating a list of the categorical columns
     categorical_features = list(dataset.select_dtypes(include=['object']).columns)
     # Deleting the categorical columns
     dataset_without_categorical_features = dataset.drop(columns=categorical_features)
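To illustrate why encoding adds little for unique addresses, here is a minimal sketch of what OrdinalEncoder does. The `District` column is hypothetical (it is not part of the dataset), and the address strings are made up.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical column with a few repeated categories (not part of the dataset)
toy = pd.DataFrame({"District": ["Centrum", "Noord", "Centrum", "Zuid"]})

encoder = OrdinalEncoder()
# fit_transform expects 2D input, hence the double brackets
codes = encoder.fit_transform(toy[["District"]])

# Categories are sorted and mapped to 0.0, 1.0, 2.0, ...
print(encoder.categories_[0])
print(codes.ravel())

# With unique values, such as full addresses, every row gets its own code,
# so the encoded column carries no more information than a row number
addresses = pd.DataFrame(
    {"Address": ["Examplestraat 1", "Voorbeeldweg 22", "Samplekade 3"]}
)
address_codes = OrdinalEncoder().fit_transform(addresses)
print(address_codes.ravel())
```

Repeated categories collapse into a small set of codes a model can use, while a column of unique strings just becomes an arbitrary numbering, which is the reason for dropping such columns here.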

It is time to apply all these steps to the dataset in the task. Let's start!

Task

  1. Import the libraries and load the dataset.
  2. Find and drop duplicated values.
  3. Find null values and replace them with the mean value.
  4. Delete the categorical values, leaving only the numerical ones.
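The four steps could be sketched end to end as follows. This is only one possible arrangement, run on a made-up stand-in for the dataset (the rows are invented, and the real data would be loaded from the course file instead); it selects the numerical columns before imputing, because the mean strategy cannot be computed for text columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# 1. Made-up stand-in for the loaded dataset (NOT real Amsterdam records)
dataset = pd.DataFrame({
    "Address": ["Examplestraat 1", "Voorbeeldweg 22", "Voorbeeldweg 22", "Samplekade 3"],
    "Zip": ["1000 AA", "1000 AB", "1000 AB", "1000 AC"],
    "Price": [400000.0, 600000.0, 600000.0, np.nan],
    "Area": [60.0, 80.0, 80.0, 70.0],
    "Room": [3, 4, 4, 3],
})

# 2. Find and drop duplicated rows
print(dataset.duplicated().sum())
dataset = dataset.drop_duplicates().reset_index(drop=True)

# 3. Find null values and replace them with the column mean
print(dataset.isnull().sum())
numeric_columns = dataset.select_dtypes(include="number").columns
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(dataset[numeric_columns])

# 4. Keep only the numerical columns, rebuilt as a DataFrame
numeric_dataset = pd.DataFrame(imputed, columns=numeric_columns, index=dataset.index)
print(numeric_dataset)
```

Rebuilding the result with `pd.DataFrame` keeps the column names, since the imputer returns a plain NumPy array by default.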
