Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Вивчайте Running SQL Queries on DataFrames | Section
Data Processing with PySpark

Running SQL Queries on DataFrames

Свайпніть щоб показати меню

With a temporary view registered, you can write full SQL – subqueries, aggregations, window functions, and CASE WHEN expressions – exactly as you would in a relational database.

Aggregations and Grouping

123456789101112131415161718192021222324252627
import urllib.request from pyspark.sql import SparkSession urllib.request.urlretrieve( "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv", "flights.csv" ) spark = SparkSession.builder \ .appName("SQLQueries") \ .master("local[*]") \ .getOrCreate() flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True) flights_df.createOrReplaceTempView("flights") # Flights per airline with average delay – only airlines with more than 5000 flights spark.sql(""" SELECT AIRLINE, COUNT(*) AS TOTAL_FLIGHTS, ROUND(AVG(ARRIVAL_DELAY), 2) AS AVG_DELAY FROM flights GROUP BY AIRLINE HAVING COUNT(*) > 5000 ORDER BY AVG_DELAY DESC """).show()

CASE WHEN

123456789101112131415
# Classifying flights by delay severity spark.sql(""" SELECT AIRLINE, ARRIVAL_DELAY, CASE WHEN ARRIVAL_DELAY <= 0 THEN 'On time' WHEN ARRIVAL_DELAY <= 15 THEN 'Minor delay' WHEN ARRIVAL_DELAY <= 60 THEN 'Moderate delay' ELSE 'Major delay' END AS DELAY_CATEGORY FROM flights WHERE ARRIVAL_DELAY IS NOT NULL LIMIT 20 """).show()

Subqueries

12345678910
# Finding airlines whose average delay exceeds the overall average spark.sql(""" SELECT AIRLINE, ROUND(AVG(ARRIVAL_DELAY), 2) AS AVG_DELAY FROM flights GROUP BY AIRLINE HAVING AVG(ARRIVAL_DELAY) > ( SELECT AVG(ARRIVAL_DELAY) FROM flights ) ORDER BY AVG_DELAY DESC """).show()

Run this and experiment with adding a WHERE CANCELLED = 0 clause to exclude cancelled flights from the delay averages.

question mark

Which SQL clause filters on an aggregated value?

Виберіть правильну відповідь

Все було зрозуміло?

Як ми можемо покращити це?

Дякуємо за ваш відгук!

Секція 1. Розділ 6

Запитати АІ

expand

Запитати АІ

ChatGPT

Запитайте про що завгодно або спробуйте одне із запропонованих запитань, щоб почати наш чат

Секція 1. Розділ 6
some-alt