Getting Familiar With the .groupby() Method | Aggregating Data
Advanced Techniques in pandas


Getting Familiar With the .groupby() Method

Welcome to this section. Here, we will group our data to find information about different groups of rows. We will work with a data set on flight delays.

Grouping lets you compute a summary value for each category in a column. Imagine you want to calculate the number of delays for each flight number. Look at the code example and then at the explanation:

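import pandas as pd

# Load the flight delays data set; the first column is used as the index
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col=0)

# Keep the two relevant columns, group rows by flight number, and sum the delays
data_flights = data[['Flight', 'Delay']].groupby('Flight').sum()
print(data_flights.head())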

Explanation:

  • data[['Flight', 'Delay']] - These are the columns you will work with: the column you group by ('Flight') and the column you aggregate ('Delay');
  • groupby('Flight') - The 'Flight' column is the argument of the .groupby() method. This means that rows with the same value in the 'Flight' column are grouped together;
  • .sum() - This method operates on the rows within each group created by .groupby(). In this case, it sums the values in the 'Delay' column for rows that belong to the same 'Flight' group.
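
The three steps above can be sketched separately on a tiny table. This is only an illustration: the `sample` DataFrame and its values are made up for the example and are not taken from the real data set.

```python
import pandas as pd

# Hypothetical sample rows (invented values, not the course data set)
sample = pd.DataFrame({
    'Flight': [269, 269, 1558, 1558, 1558],
    'Delay': [1, 0, 1, 1, 0],
    'Time': [15, 1296, 1170, 360, 20],
})

# Step 1: keep only the grouping column and the column to aggregate
subset = sample[['Flight', 'Delay']]

# Step 2: rows that share a 'Flight' value form one group
grouped = subset.groupby('Flight')

# Step 3: sum 'Delay' within each group; 'Flight' becomes the index
delays_per_flight = grouped.sum()
print(delays_per_flight)
```

Note that the result is indexed by the group key, so a single flight can be looked up with `delays_per_flight.loc[1558]`.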

Note

Since the 'Delay' column contains only 0 (no delay occurred) or 1 (a delay occurred), the sum of 'Delay' within each group equals the number of delays for that flight.
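
A quick way to convince yourself of this is to count the delayed rows directly and compare. The sketch below uses a made-up `sample` DataFrame (hypothetical values), so treat it as an illustration rather than a check on the real data.

```python
import pandas as pd

# Hypothetical sample rows (invented values, not the course data set)
sample = pd.DataFrame({
    'Flight': [269, 269, 1558, 1558, 1558, 2400],
    'Delay': [1, 0, 1, 1, 0, 0],
})

# Approach from the lesson: sum the 0/1 flags within each flight
sum_based = sample.groupby('Flight')['Delay'].sum()

# Alternative: keep only delayed rows and count them per flight.
# Flights without any delay disappear after filtering, so they are
# restored with a count of 0 via reindex.
delayed_only = sample[sample['Delay'] == 1]
count_based = delayed_only.groupby('Flight').size()
count_based = count_based.reindex(sum_based.index, fill_value=0)

# Both approaches agree for every flight
print((sum_based == count_based).all())
```

This shortcut only works because the flags are 0 and 1; with any other encoding, the sum would no longer be a count.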

In fact, .sum() is one of many aggregation functions you can use. You will become familiar with all of them as you proceed.
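
As a small preview, here is a sketch of a few other aggregation methods that pandas provides on grouped data (.mean(), .max(), .count(), and .agg() for several at once). The `sample` DataFrame is hypothetical and exists only for this example.

```python
import pandas as pd

# Hypothetical sample rows (invented values, not the course data set)
sample = pd.DataFrame({
    'Flight': [269, 269, 1558, 1558, 1558],
    'Delay': [1, 0, 1, 1, 0],
})

grouped = sample.groupby('Flight')['Delay']

# Share of delayed rows per flight (mean of 0/1 flags)
delay_share = grouped.mean()

# Largest value per flight: 1 if at least one delay occurred
any_delay = grouped.max()

# Number of rows per flight
row_count = grouped.count()

# Several aggregations in one call; each becomes a column
summary = grouped.agg(['sum', 'mean', 'count'])
print(summary)
```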

Fill in the gaps to find the mean value of the `'Time'` column depending on the `'DayOfWeek'` column.

data_extracted = data[['___', 'Time']].___('___').mean()
print(data_extracted)
DayOfWeek  Time
3          804.993130
4          804.452984
5          702.888362


Section 4. Chapter 1