Preprocessing Data
Types Conversion
Data in a dataset is sometimes stored in the wrong format or type. The most common cases are:
- storing integer or float values as string variables.
- storing date and time values as strings.
- storing values in a form that can be converted to a more suitable one.
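As a rough illustration of the first two cases, here is a minimal sketch using only the Python standard library. The variable names and raw values are hypothetical and are not taken from the course dataset.

```python
from datetime import datetime

# Hypothetical raw values, as they might arrive from a text file.
raw_pulse = "104"
raw_weight = "72.5"
raw_date = "2021-03-15 08:30"

pulse = int(raw_pulse)        # string -> int
weight = float(raw_weight)    # string -> float
# string -> datetime, using an explicit format that matches the raw value
measured_at = datetime.strptime(raw_date, "%Y-%m-%d %H:%M")

print(type(pulse).__name__, type(weight).__name__, type(measured_at).__name__)
```

Once converted, the values support arithmetic and date operations that would fail or misbehave on strings.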
Let's explore the exercise dataset, which contains info about diet, pulse, time, and kind of exercise. Here is some sample data:
unnamed | id | diet | pulse | time | kind |
35 | 12 | low fat | 104 | 30 min | walking |
64 | 22 | low fat | 104 | 15 min | running |
10 | 4 | low fat | 82 | 15 min | rest |
18 | 7 | no fat | 87 | 1 min | rest |
48 | 17 | no fat | 103 | 1 min | walking |
It makes sense to modify the time column: every row stores the duration in minutes, so the unit label (min, sec, or hours) adds no information. We'll remove the extra characters and keep only the numerical values, converting them to int.
Task
Apply the type conversion to the time column. Remove the last 4 characters of each value (the suffix " min", including the leading space) and convert the rest to int. Check the sample.
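One possible way to approach the task is sketched below. It assumes the data is held in a pandas DataFrame; the name df is hypothetical, and the rows are copied from the sample table rather than loaded from the real dataset.

```python
import pandas as pd

# Sample rows copied from the table above; `df` is an assumed name.
df = pd.DataFrame({
    "id": [12, 22, 4, 7, 17],
    "diet": ["low fat", "low fat", "low fat", "no fat", "no fat"],
    "pulse": [104, 104, 82, 87, 103],
    "time": ["30 min", "15 min", "15 min", "1 min", "1 min"],
    "kind": ["walking", "running", "rest", "rest", "walking"],
})

# Drop the last 4 characters (" min") from every value, then convert to int.
df["time"] = df["time"].str[:-4].astype(int)

print(df["time"].tolist())  # [30, 15, 15, 1, 1]
```

Slicing by position only works because every value ends with exactly the same 4-character suffix. If the units varied, the suffix would have to be parsed instead of cut off.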