Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Advanced Grouping | Aggregating Data
Advanced Techniques in pandas
course content

Зміст курсу

Advanced Techniques in pandas

Advanced Techniques in pandas

1. Getting Familiar With Indexing and Selecting Data
2. Dealing With Conditions
3. Extracting Data
4. Aggregating Data
5. Preprocessing Data

Advanced Grouping

Let's expand our knowledge on the .groupby() method. As you remember, we can use the .agg() method. Indeed, the main pros of this function are that we can apply a different function to the numerical columns with one group key. Look at the example where we grouped flights by the column 'Airline', then counted the values in 'Delay' for each 'Airline', and calculated the minimum and maximum values for the 'Length' column. So convenient, isn't it?

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']}) print(data_flights.head(10))
copy

Explanation:

  • .agg() - a method that allows us to apply functions on a series or on each element separately;
  • {} - we use curly brackets to specify the column and apply functions to them directly;
  • 'Delay': 'count' - applies the .count() function to the values in the 'Delay' column having the same key group;
  • 'Length': ['min', 'max'] - applies the .min() and .max() functions to the values in the 'Length' column having the same key group. You just need to put just the column name without the () or . symbols in the function. Pay attention; if you want to apply several functions to the same column, you must put them into the list.

Завдання

We can assume that a delay depends on the airline or the airport, but let's dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:

Group data:

  • Apply the .groupby() method to the dataset data;
  • Within the .groupby() method, put the columns 'AirportFrom' and 'AirportTo'; the order is crucial;
  • Using the .agg() method, calculate the aggregated values: the average and maximum value in the column 'Time', and the median value of the column 'Length'.

Завдання

We can assume that a delay depends on the airline or the airport, but let's dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:

Group data:

  • Apply the .groupby() method to the dataset data;
  • Within the .groupby() method, put the columns 'AirportFrom' and 'AirportTo'; the order is crucial;
  • Using the .agg() method, calculate the aggregated values: the average and maximum value in the column 'Time', and the median value of the column 'Length'.

Перейдіть на комп'ютер для реальної практикиПродовжуйте з того місця, де ви зупинились, використовуючи один з наведених нижче варіантів

Все було зрозуміло?

Секція 4. Розділ 4
toggle bottom row

Advanced Grouping

Let's expand our knowledge on the .groupby() method. As you remember, we can use the .agg() method. Indeed, the main pros of this function are that we can apply a different function to the numerical columns with one group key. Look at the example where we grouped flights by the column 'Airline', then counted the values in 'Delay' for each 'Airline', and calculated the minimum and maximum values for the 'Length' column. So convenient, isn't it?

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']}) print(data_flights.head(10))
copy

Explanation:

  • .agg() - a method that allows us to apply functions on a series or on each element separately;
  • {} - we use curly brackets to specify the column and apply functions to them directly;
  • 'Delay': 'count' - applies the .count() function to the values in the 'Delay' column having the same key group;
  • 'Length': ['min', 'max'] - applies the .min() and .max() functions to the values in the 'Length' column having the same key group. You just need to put just the column name without the () or . symbols in the function. Pay attention; if you want to apply several functions to the same column, you must put them into the list.

Завдання

We can assume that a delay depends on the airline or the airport, but let's dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:

Group data:

  • Apply the .groupby() method to the dataset data;
  • Within the .groupby() method, put the columns 'AirportFrom' and 'AirportTo'; the order is crucial;
  • Using the .agg() method, calculate the aggregated values: the average and maximum value in the column 'Time', and the median value of the column 'Length'.

Завдання

We can assume that a delay depends on the airline or the airport, but let's dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:

Group data:

  • Apply the .groupby() method to the dataset data;
  • Within the .groupby() method, put the columns 'AirportFrom' and 'AirportTo'; the order is crucial;
  • Using the .agg() method, calculate the aggregated values: the average and maximum value in the column 'Time', and the median value of the column 'Length'.

Перейдіть на комп'ютер для реальної практикиПродовжуйте з того місця, де ви зупинились, використовуючи один з наведених нижче варіантів

Все було зрозуміло?

Секція 4. Розділ 4
toggle bottom row

Advanced Grouping

Let's expand our knowledge on the .groupby() method. As you remember, we can use the .agg() method. Indeed, the main pros of this function are that we can apply a different function to the numerical columns with one group key. Look at the example where we grouped flights by the column 'Airline', then counted the values in 'Delay' for each 'Airline', and calculated the minimum and maximum values for the 'Length' column. So convenient, isn't it?

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']}) print(data_flights.head(10))
copy

Explanation:

  • .agg() - a method that allows us to apply functions on a series or on each element separately;
  • {} - we use curly brackets to specify the column and apply functions to them directly;
  • 'Delay': 'count' - applies the .count() function to the values in the 'Delay' column having the same key group;
  • 'Length': ['min', 'max'] - applies the .min() and .max() functions to the values in the 'Length' column having the same key group. You just need to put just the column name without the () or . symbols in the function. Pay attention; if you want to apply several functions to the same column, you must put them into the list.

Завдання

We can assume that a delay depends on the airline or the airport, but let's dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:

Group data:

  • Apply the .groupby() method to the dataset data;
  • Within the .groupby() method, put the columns 'AirportFrom' and 'AirportTo'; the order is crucial;
  • Using the .agg() method, calculate the aggregated values: the average and maximum value in the column 'Time', and the median value of the column 'Length'.

Завдання

We can assume that a delay depends on the airline or the airport, but let's dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:

Group data:

  • Apply the .groupby() method to the dataset data;
  • Within the .groupby() method, put the columns 'AirportFrom' and 'AirportTo'; the order is crucial;
  • Using the .agg() method, calculate the aggregated values: the average and maximum value in the column 'Time', and the median value of the column 'Length'.

Перейдіть на комп'ютер для реальної практикиПродовжуйте з того місця, де ви зупинились, використовуючи один з наведених нижче варіантів

Все було зрозуміло?

Let's expand our knowledge on the .groupby() method. As you remember, we can use the .agg() method. Indeed, the main pros of this function are that we can apply a different function to the numerical columns with one group key. Look at the example where we grouped flights by the column 'Airline', then counted the values in 'Delay' for each 'Airline', and calculated the minimum and maximum values for the 'Length' column. So convenient, isn't it?

1234
import pandas as pd data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col = 0) data_flights = data.groupby('Airline').agg({'Delay': 'count', 'Length': ['min', 'max']}) print(data_flights.head(10))
copy

Explanation:

  • .agg() - a method that allows us to apply functions on a series or on each element separately;
  • {} - we use curly brackets to specify the column and apply functions to them directly;
  • 'Delay': 'count' - applies the .count() function to the values in the 'Delay' column having the same key group;
  • 'Length': ['min', 'max'] - applies the .min() and .max() functions to the values in the 'Length' column having the same key group. You just need to put just the column name without the () or . symbols in the function. Pay attention; if you want to apply several functions to the same column, you must put them into the list.

Завдання

We can assume that a delay depends on the airline or the airport, but let's dive deeper and look at the average and maximum delay times depending on the airport from which the flight started and then on the airport at which the flight ended. Also, look at the median length of the flight. Follow the algorithm:

Group data:

  • Apply the .groupby() method to the dataset data;
  • Within the .groupby() method, put the columns 'AirportFrom' and 'AirportTo'; the order is crucial;
  • Using the .agg() method, calculate the aggregated values: the average and maximum value in the column 'Time', and the median value of the column 'Length'.

Перейдіть на комп'ютер для реальної практикиПродовжуйте з того місця, де ви зупинились, використовуючи один з наведених нижче варіантів
Секція 4. Розділ 4
Перейдіть на комп'ютер для реальної практикиПродовжуйте з того місця, де ви зупинились, використовуючи один з наведених нижче варіантів
We're sorry to hear that something went wrong. What happened?
some-alt