Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Aprenda Introduction to Spark SQL | Section
Data Processing with PySpark

Introduction to Spark SQL

Deslize para mostrar o menu

Spark SQL lets you query DataFrames using standard SQL syntax. If you are comfortable with SQL, this is often the fastest way to express complex analytical queries – and the results are identical to using the DataFrame API.

Registering a Temporary View

To run SQL against a DataFrame, register it as a temporary view. The view exists only for the duration of the SparkSession – it is not persisted to disk.

1234567891011121314151617
import urllib.request from pyspark.sql import SparkSession urllib.request.urlretrieve( "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv", "flights.csv" ) spark = SparkSession.builder \ .appName("SparkSQL") \ .master("local[*]") \ .getOrCreate() flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True) # Registering the DataFrame as a temporary SQL view flights_df.createOrReplaceTempView("flights")

Running SQL Queries

12345678910
# Selecting columns with a condition result = spark.sql(""" SELECT AIRLINE, ORIGIN_AIRPORT, DESTINATION_AIRPORT, ARRIVAL_DELAY FROM flights WHERE ARRIVAL_DELAY > 60 ORDER BY ARRIVAL_DELAY DESC LIMIT 10 """) result.show()

spark.sql() returns a regular DataFrame – you can chain further DataFrame operations on the result.

SQL vs DataFrame API

Both approaches produce the same execution plan. Choose whichever is more readable for the task:

1234567
# SQL spark.sql(""" SELECT AIRLINE, COUNT(*) AS TOTAL_FLIGHTS, AVG(ARRIVAL_DELAY) AS AVG_DELAY FROM flights GROUP BY AIRLINE ORDER BY AVG_DELAY DESC """).show()
1234567
# Equivalent DataFrame API from pyspark.sql.functions import avg, count, col flights_df.groupBy("AIRLINE") \ .agg(count("*").alias("TOTAL_FLIGHTS"), avg("ARRIVAL_DELAY").alias("AVG_DELAY")) \ .orderBy(col("AVG_DELAY").desc()) \ .show()
question mark

What does createOrReplaceTempView do?

Selecione a resposta correta

Tudo estava claro?

Como podemos melhorá-lo?

Obrigado pelo seu feedback!

Seção 1. Capítulo 5

Pergunte à IA

expand

Pergunte à IA

ChatGPT

Pergunte o que quiser ou experimente uma das perguntas sugeridas para iniciar nosso bate-papo

Seção 1. Capítulo 5
some-alt