Learn Data Types | Brief Introduction
Data Preprocessing
Course content

1. Brief Introduction
2. Processing Quantitative Data
3. Processing Categorical Data
4. Time Series Data Processing
5. Feature Engineering
6. Moving on to Tasks

Data Types

The main tool we will use to manipulate data is pandas. We can start right away by loading the data:

```python
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/penguins.csv')
print(df.head())
```

A dataset can contain columns of many different data types, for example numeric (integers, floating point numbers), strings (str), and datetime. To find out what data type each column has, use the DataFrame's .dtypes property:

```python
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/penguins.csv')
print(df.dtypes)
```
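
To experiment without downloading the file, the same check can be tried on a small hand-made DataFrame. The sketch below is only an illustration: the column names and values are invented for this example and are not taken from penguins.csv.

```python
import pandas as pd

# Hypothetical toy data (not from penguins.csv), one column per kind of type
toy = pd.DataFrame({
    'count': [1, 2, 3],                          # integers
    'mass_kg': [3.75, 3.8, 3.25],                # floating point numbers
    'species': ['Adelie', 'Gentoo', 'Adelie'],   # strings
    'measured_on': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03']),  # datetime
})

# Data types of every column at once
print(toy.dtypes)

# Data type of a single column
print(toy['mass_kg'].dtype)
```

Depending on the pandas version, the string column may be reported as object or as a dedicated string dtype.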

Suppose a column holds numeric values stored as strings and you want to change its data type to numeric. To do this, use the .astype() method.
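
As a minimal sketch of that conversion, the example below uses an invented column of numbers stored as strings (the column name price and its values are assumptions made for illustration) and converts it with .astype():

```python
import pandas as pd

# Hypothetical data: numbers that were read in as text
df = pd.DataFrame({'price': ['10.5', '20.0', '7.25']})

# Convert the strings to floating point numbers
df['price'] = df['price'].astype(float)

print(df['price'].dtype)
print(df['price'].sum())
```

.astype() returns a new Series, so the result has to be assigned back to the column. If a value cannot be parsed as a number, .astype(float) raises an error.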

Task

Read the penguins.csv dataset and change the data type of the body_mass_g column from float to int.

Don't modify the initial code; only replace the gaps ___ with the correct code.

Once you've completed this task, click the button below the code to check your solution.

Solution
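
One possible way to approach the task is sketched below. To stay runnable offline, it builds a tiny stand-in DataFrame instead of reading penguins.csv; the values are invented, and only the column name body_mass_g comes from the task. It also assumes the column has no missing values, since .astype(int) cannot convert NaN.

```python
import pandas as pd

# Stand-in for penguins.csv with invented values; only the column name is from the task
df = pd.DataFrame({'body_mass_g': [3750.0, 3800.0, 3250.0]})
print(df['body_mass_g'].dtype)  # a float dtype before the conversion

# Change the column from float to int
df['body_mass_g'] = df['body_mass_g'].astype(int)
print(df['body_mass_g'].dtype)  # an integer dtype after the conversion
```

With the real dataset, the stand-in DataFrame would be replaced by the pd.read_csv call used at the start of the lesson.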
