Data Processing with PySpark

Introduction to Spark SQL


Spark SQL lets you query DataFrames using standard SQL syntax. If you are comfortable with SQL, this is often the fastest way to express complex analytical queries – and the results are identical to using the DataFrame API.

Registering a Temporary View

To run SQL against a DataFrame, register it as a temporary view. The view exists only for the duration of the SparkSession – it is not persisted to disk.

import urllib.request

from pyspark.sql import SparkSession

urllib.request.urlretrieve(
    "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv",
    "flights.csv"
)

spark = SparkSession.builder \
    .appName("SparkSQL") \
    .master("local[*]") \
    .getOrCreate()

flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True)

# Registering the DataFrame as a temporary SQL view
flights_df.createOrReplaceTempView("flights")

Running SQL Queries

# Selecting columns with a condition
result = spark.sql("""
    SELECT AIRLINE, ORIGIN_AIRPORT, DESTINATION_AIRPORT, ARRIVAL_DELAY
    FROM flights
    WHERE ARRIVAL_DELAY > 60
    ORDER BY ARRIVAL_DELAY DESC
    LIMIT 10
""")

result.show()

spark.sql() returns a regular DataFrame – you can chain further DataFrame operations on the result.

SQL vs DataFrame API

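Both approaches are compiled by the same optimizer (Catalyst), so equivalent queries generally produce the same execution plan and perform the same. Choose whichever is more readable for the task: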

# SQL
spark.sql("""
    SELECT AIRLINE, COUNT(*) AS TOTAL_FLIGHTS, AVG(ARRIVAL_DELAY) AS AVG_DELAY
    FROM flights
    GROUP BY AIRLINE
    ORDER BY AVG_DELAY DESC
""").show()

# Equivalent DataFrame API
from pyspark.sql.functions import avg, count, col

flights_df.groupBy("AIRLINE") \
    .agg(count("*").alias("TOTAL_FLIGHTS"), avg("ARRIVAL_DELAY").alias("AVG_DELAY")) \
    .orderBy(col("AVG_DELAY").desc()) \
    .show()

What does createOrReplaceTempView do?
