Data Preprocessing
Data Types
The main tool we will use to manipulate data is pandas. We can start right away by loading the data:
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/penguins.csv')
print(df.head())
A dataset can contain several data types, for example numeric (integers and floating-point numbers), strings (str), and datetime. To find out which data type each column has, use the .dtypes property:
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/penguins.csv')
print(df.dtypes)
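To check one column rather than the whole table, a minimal sketch follows. It assumes a small hand-made DataFrame (the values are hypothetical, not taken from penguins.csv) so that it runs without a download.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
sample = pd.DataFrame({
    "species": ["Adelie", "Gentoo"],
    "flipper_length_mm": [181, 217],
    "body_mass_g": [3750.0, 5000.0],
})

# .dtypes on a DataFrame returns one dtype per column
print(sample.dtypes)

# .dtype on a single column (a Series) returns only that column's dtype
mass_dtype = sample["body_mass_g"].dtype
flipper_dtype = sample["flipper_length_mm"].dtype
print(mass_dtype, flipper_dtype)
```

Note the difference in spelling: .dtypes (plural) belongs to the DataFrame, while .dtype (singular) belongs to a single column.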
Suppose a column holds numeric values stored as strings and you want to convert it to a numeric type. To do this, use the .astype() method, passing the target type (for example, int or float).
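As a small illustration of .astype(), the sketch below converts a string column to integers. The DataFrame and its price column are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical data: numbers that were stored as strings
prices = pd.DataFrame({"price": ["10", "25", "7"]})

# .astype() returns a new column, so assign the result back
prices["price"] = prices["price"].astype(int)

print(prices.dtypes)

# Arithmetic now works on numbers instead of concatenating strings
total = prices["price"].sum()
print(total)
```

Because .astype() does not modify the column in place, forgetting to assign the result back leaves the original data type unchanged.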
Read the penguins.csv dataset and change the data type of the body_mass_g column from float to int.
Don't modify the initial code; only replace the gaps (___) with the correct code.
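One caveat when converting from float to int: a plain integer column cannot hold missing values, so .astype(int) raises an error if the column contains NaN. The sketch below uses a hypothetical Series (made-up values, not the real penguins.csv data) to show two possible ways around this; whether the actual dataset needs either of them is not something this example assumes.

```python
import pandas as pd

# Hypothetical float column with one missing value (stored as NaN)
mass = pd.Series([3750.0, 3800.0, None], name="body_mass_g")

# A direct conversion fails because NaN has no integer representation
try:
    mass.astype(int)
    failed = False
except ValueError:
    failed = True
print("direct conversion failed:", failed)

# Option 1: drop the missing rows first, then convert
mass_int = mass.dropna().astype(int)

# Option 2: use the nullable integer dtype "Int64" (capital I),
# which keeps the missing value as <NA>
mass_nullable = mass.astype("Int64")

print(mass_int.tolist())
print(mass_nullable.dtype)
```

If the column has no missing values, a plain .astype(int) is enough.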