Complicated Grouping | Aggregating Data
Advanced Techniques in pandas
course content

1. Getting Familiar With Indexing and Selecting Data
2. Dealing With Conditions
3. Extracting Data
4. Aggregating Data
5. Preprocessing Data

Complicated Grouping

Sometimes the built-in pandas aggregation functions, such as .mean() or .min(), are not enough when grouping.

Look at the column 'Length', which holds the flight length in minutes. Suppose we want the maximum flight time in hours for rows that share the same value in the 'Flight' column and then in the 'Airline' column. To get it, we calculate the maximum value of 'Length' for each group key and then divide it by 60. Look at the example and the explanation below.

import pandas as pd

# Load the flights data set; the first CSV column becomes the index
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col=0)
# For each (Flight, Airline) group, take the longest flight and convert minutes to hours
data_flights = data[['Flight', 'Airline', 'Length']].groupby(['Flight', 'Airline']).apply(lambda x: x['Length'].max() / 60)
print(data_flights.head(10))

Explanation:

This example builds on the one from the previous chapters. The grouping itself works the same way; the new part is the .apply() method.

  • .apply() - calls the given function once for each group, passing in the rows of that group;
  • in the lambda function, x is the argument (the rows of one group) and x['Length'].max()/60 is the expression. So, the function finds the maximum 'Length' for each group key and divides that value by 60.
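
To see what the lambda receives, here is a minimal sketch on a small made-up table. The name toy and its rows are illustrative assumptions, not the course data set. The sketch also compares the .apply() result with the built-in .max() followed by a division, which should give the same values.

```python
import pandas as pd

# Hypothetical toy data: 'Length' is in minutes, as in the lesson
toy = pd.DataFrame({
    'Flight': [100, 100, 200, 200],
    'Airline': ['AA', 'AA', 'BB', 'BB'],
    'Length': [120, 180, 60, 90],
})

grouped = toy[['Flight', 'Airline', 'Length']].groupby(['Flight', 'Airline'])

# x holds the rows of one (Flight, Airline) group
max_hours = grouped.apply(lambda x: x['Length'].max() / 60)

# Same values via the built-in aggregation, for comparison
max_hours_builtin = grouped['Length'].max() / 60

print(max_hours)
```

When a built-in aggregation plus simple arithmetic is enough, the second form is usually shorter; .apply() becomes useful when the per-group calculation needs more than one column or step.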

Task

Your task here is to group data by the airport from which the flight started and then by the airline. For each group, calculate the minimum of the sum of the columns 'Length' and 'Time' to figure out how long a flight with a delay may take. Follow the algorithm to manage the task:

Group data:

  • Store the list of columns 'AirportFrom', 'Airline', 'Time', and 'Length' (in this order) in the columns variable;
  • Extract columns from data;
  • The order is crucial within the .groupby() method; put the columns 'AirportFrom' and 'Airline' in this order;
  • Use .apply() to run a function on the rows of the data set that share the same group keys;
  • Inside the function, add the columns 'Length' and 'Time', then find the minimum of the result.
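
The steps above can be sketched on a small made-up table. The name toy, the airport and airline codes, and the numbers are illustrative assumptions, not the course data set. The sketch reads the last step as a row-by-row sum of 'Length' and 'Time' followed by the minimum within each group, which is one plausible interpretation.

```python
import pandas as pd

# Hypothetical toy data; the lesson loads the real data set with pd.read_csv
toy = pd.DataFrame({
    'AirportFrom': ['JFK', 'JFK', 'JFK', 'LAX'],
    'Airline': ['AA', 'AA', 'BB', 'AA'],
    'Time': [30, 10, 20, 15],
    'Length': [100, 150, 90, 200],
})

# Columns to keep, in the order the task asks for
columns = ['AirportFrom', 'Airline', 'Time', 'Length']

# Group by airport first, then by airline; for each group,
# add the two columns row by row and take the smallest total
min_total = toy[columns].groupby(['AirportFrom', 'Airline']).apply(
    lambda x: (x['Length'] + x['Time']).min()
)

print(min_total)
```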


Section 4. Chapter 3