Advanced Techniques in pandas
Getting Familiar With the .groupby() Method
I am happy to see you in this section. Here, we will group our data to find information about different groups of rows, using a data set on flight delays.
Grouping data is beneficial, and now we will dive deeper into it. Imagine you want to calculate the number of delays for each flight number. Look at the code example and then at the explanation:
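```python
import pandas as pd

# Load the delays data set; the first column is used as the row index
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col=0)

# Group rows by flight number and add up the 'Delay' values in each group
data_flights = data[['Flight', 'Delay']].groupby('Flight').sum()

# Show the first five groups
print(data_flights.head())
```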
Explanation:
- `data[['Flight', 'Delay']]`: these are the columns you will work on, including the column you group by;
- `.groupby('Flight')`: the `'Flight'` column is the argument of the `.groupby()` method, so rows with the same value in the `'Flight'` column are grouped together;
- `.sum()`: this method operates on the rows within each group created by `.groupby()`. Here, it sums the values in the `'Delay'` column for the rows that belong to the same `'Flight'` group.
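To see the mechanics without downloading the data set, here is a minimal sketch on a small hand-made DataFrame. The flight numbers, delay values, and the extra `'Airline'` column are invented for illustration; only the column names `'Flight'` and `'Delay'` come from the lesson. The extra column shows that selecting `[['Flight', 'Delay']]` first keeps unrelated columns out of the aggregation.

```python
import pandas as pd

# Hypothetical toy data: invented flight numbers; 'Delay' is 0 or 1
df = pd.DataFrame({
    'Flight': [101, 202, 101, 303, 202, 101],
    'Delay': [1, 0, 1, 0, 1, 0],
    'Airline': ['AA', 'BB', 'AA', 'CC', 'BB', 'AA'],  # hypothetical extra column
})

# Keep only the grouping column and the column to aggregate,
# group rows by 'Flight', then add up 'Delay' within each group
delays_per_flight = df[['Flight', 'Delay']].groupby('Flight').sum()

# Groups are sorted by key: flight 101 -> 2, 202 -> 1, 303 -> 0
print(delays_per_flight)
```

Flight 101 appears in three rows with delays 1, 1, and 0, so its group sum is 2.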
Note
Since the `'Delay'` column contains only `0` (no delay occurred) or `1` (a delay occurred), the sum within each group equals the number of delays for that flight.
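The claim in the note can be checked directly: summing 0/1 flags gives the same result as counting the rows where a delay occurred. This sketch uses a hypothetical toy DataFrame with the lesson's column names. It also selects `'Delay'` after grouping, which returns a Series instead of a DataFrame.

```python
import pandas as pd

# Hypothetical toy data: 'Delay' holds only 0 (on time) or 1 (delayed)
df = pd.DataFrame({
    'Flight': [101, 202, 101, 303, 202, 101],
    'Delay': [1, 0, 1, 0, 1, 0],
})

# Approach 1: sum the 0/1 flags within each flight group
by_sum = df.groupby('Flight')['Delay'].sum()

# Approach 2: keep only delayed rows, then count rows per flight
by_count = df[df['Delay'] == 1].groupby('Flight').size()

# Flights with no delays are missing from by_count, so fill them with 0
by_count = by_count.reindex(by_sum.index, fill_value=0)

print(list(by_sum))
print(list(by_count))
```

Both approaches give the same counts; the sum is simply the shorter way to write it when the column is a 0/1 flag.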
In fact, `.sum()` is only one of many aggregation functions you can apply to grouped data. You will become familiar with more of them as you proceed.
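As a preview, here is a minimal sketch of a few other standard aggregations on the same kind of grouping: `.count()`, `.mean()`, and `.agg()` with a list of function names. The data is a hypothetical toy DataFrame; with a 0/1 `'Delay'` column, the mean is the share of delayed records for each flight.

```python
import pandas as pd

# Hypothetical toy data with the lesson's column names
df = pd.DataFrame({
    'Flight': [101, 202, 101, 303, 202, 101],
    'Delay': [1, 0, 1, 0, 1, 0],
})

grouped = df[['Flight', 'Delay']].groupby('Flight')

# Number of records per flight
print(grouped.count())

# Mean of the 0/1 flags = share of delayed records per flight
print(grouped.mean())

# Several aggregations at once: one output column per function name
summary = grouped['Delay'].agg(['sum', 'mean', 'count'])
print(summary)
```

Passing a list to `.agg()` avoids writing a separate `groupby` call for each statistic.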