Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Oppiskele Introduction to Spark SQL | Section
Data Processing with PySpark

Introduction to Spark SQL

Pyyhkäise näyttääksesi valikon

Spark SQL lets you query DataFrames using standard SQL syntax. If you are comfortable with SQL, this is often the fastest way to express complex analytical queries – and the results are identical to using the DataFrame API.

Registering a Temporary View

To run SQL against a DataFrame, register it as a temporary view. The view exists only for the duration of the SparkSession – it is not persisted to disk.

1234567891011121314151617
import urllib.request from pyspark.sql import SparkSession urllib.request.urlretrieve( "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv", "flights.csv" ) spark = SparkSession.builder \ .appName("SparkSQL") \ .master("local[*]") \ .getOrCreate() flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True) # Registering the DataFrame as a temporary SQL view flights_df.createOrReplaceTempView("flights")

Running SQL Queries

12345678910
# Selecting columns with a condition result = spark.sql(""" SELECT AIRLINE, ORIGIN_AIRPORT, DESTINATION_AIRPORT, ARRIVAL_DELAY FROM flights WHERE ARRIVAL_DELAY > 60 ORDER BY ARRIVAL_DELAY DESC LIMIT 10 """) result.show()

spark.sql() returns a regular DataFrame – you can chain further DataFrame operations on the result.

SQL vs DataFrame API

Both approaches produce the same execution plan. Choose whichever is more readable for the task:

1234567
# SQL spark.sql(""" SELECT AIRLINE, COUNT(*) AS TOTAL_FLIGHTS, AVG(ARRIVAL_DELAY) AS AVG_DELAY FROM flights GROUP BY AIRLINE ORDER BY AVG_DELAY DESC """).show()
1234567
# Equivalent DataFrame API from pyspark.sql.functions import avg, count, col flights_df.groupBy("AIRLINE") \ .agg(count("*").alias("TOTAL_FLIGHTS"), avg("ARRIVAL_DELAY").alias("AVG_DELAY")) \ .orderBy(col("AVG_DELAY").desc()) \ .show()
question mark

What does createOrReplaceTempView do?

Valitse oikea vastaus

Oliko kaikki selvää?

Miten voimme parantaa sitä?

Kiitos palautteestasi!

Osio 1. Luku 5

Kysy tekoälyä

expand

Kysy tekoälyä

ChatGPT

Kysy mitä tahansa tai kokeile jotakin ehdotetuista kysymyksistä aloittaaksesi keskustelumme

Osio 1. Luku 5
some-alt