Introduction to Spark SQL
Свайпніть щоб показати меню
Spark SQL lets you query DataFrames using standard SQL syntax. If you are comfortable with SQL, this is often the fastest way to express complex analytical queries – and the results are identical to using the DataFrame API.
Registering a Temporary View
To run SQL against a DataFrame, register it as a temporary view. The view exists only for the duration of the SparkSession – it is not persisted to disk.
1234567891011121314151617import urllib.request from pyspark.sql import SparkSession urllib.request.urlretrieve( "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv", "flights.csv" ) spark = SparkSession.builder \ .appName("SparkSQL") \ .master("local[*]") \ .getOrCreate() flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True) # Registering the DataFrame as a temporary SQL view flights_df.createOrReplaceTempView("flights")
Running SQL Queries
12345678910# Selecting columns with a condition result = spark.sql(""" SELECT AIRLINE, ORIGIN_AIRPORT, DESTINATION_AIRPORT, ARRIVAL_DELAY FROM flights WHERE ARRIVAL_DELAY > 60 ORDER BY ARRIVAL_DELAY DESC LIMIT 10 """) result.show()
spark.sql() returns a regular DataFrame – you can chain further DataFrame operations on the result.
SQL vs DataFrame API
Both approaches produce the same execution plan. Choose whichever is more readable for the task:
1234567# SQL spark.sql(""" SELECT AIRLINE, COUNT(*) AS TOTAL_FLIGHTS, AVG(ARRIVAL_DELAY) AS AVG_DELAY FROM flights GROUP BY AIRLINE ORDER BY AVG_DELAY DESC """).show()
1234567# Equivalent DataFrame API from pyspark.sql.functions import avg, count, col flights_df.groupBy("AIRLINE") \ .agg(count("*").alias("TOTAL_FLIGHTS"), avg("ARRIVAL_DELAY").alias("AVG_DELAY")) \ .orderBy(col("AVG_DELAY").desc()) \ .show()
Все було зрозуміло?
Дякуємо за ваш відгук!
Секція 1. Розділ 5
Запитати АІ
Запитати АІ
Запитайте про що завгодно або спробуйте одне із запропонованих запитань, щоб почати наш чат
Секція 1. Розділ 5