Introduction to PySpark
Section 1. Chapter 11

Challenge: Exploratory Analysis of a Flights Dataset

Task

You are given a flights dataset as a list of rows. Load it into a DataFrame using createDataFrame and complete the following steps, storing results in the specified variables:

  1. Count the total number of rows – store in total_rows;
  2. Find the airline with the most delayed flights (Delay == 1) – store the airline code as a string in most_delayed_airline;
  3. Count the number of delayed flights (Delay == 1) – store in delayed_count;
  4. Find the top 3 busiest routes (unique AirportFrom + AirportTo pairs by flight count) – store as a list of tuples [(origin, destination, count), ...] in top_routes.

Print all results.

Solution
