Data Preprocessing
Data Types
The main tool we will use to manipulate data is pandas. We can start right away by loading the data:
import pandas as pd

# Load the penguins dataset and show the first five rows
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/penguins.csv')
print(df.head())
A dataset can contain several data types, for example numeric (integers and floating-point numbers), strings (str), and datetime. To find out the data type of each column, access the .dtypes property:
import pandas as pd

# Load the penguins dataset and print the data type of every column
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/penguins.csv')
print(df.dtypes)
Suppose a column holds numeric values stored as strings and you want to convert it to a numeric type. To do this, use the .astype() method.
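As a minimal sketch of such a conversion, the example below builds a small DataFrame by hand instead of loading a file. The column name weight and its values are made up for illustration and are not part of the penguins dataset.

```python
import pandas as pd

# Hypothetical toy data: numbers stored as strings
df = pd.DataFrame({'weight': ['3750', '3800', '3250']})
print(df.dtypes)  # the column is reported as a string/object type

# Convert the column to integers and assign the result back
df['weight'] = df['weight'].astype(int)
print(df.dtypes)  # the column is now an integer type
```

Note that .astype() returns a new object rather than changing the column in place, so the result has to be assigned back. If a value cannot be parsed as an integer (for example 'abc'), the conversion raises an error.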
Task
Read the penguins.csv dataset and change the data type of the body_mass_g column from float to int.
Don't modify the initial code; only replace the gaps ___ with the correct code.
Once you've completed this task, click the button below the code to check your solution.