Joins in PySpark: Inner, Left, and Anti
Joins combine two DataFrames on a common key. PySpark supports all standard join types – the most useful for analytical work are inner, left, and anti.
Setting Up Two DataFrames
To demonstrate joins, create a small airline names DataFrame alongside the flights data:
    import urllib.request

    from pyspark.sql import SparkSession
    from pyspark.sql import Row

    urllib.request.urlretrieve(
        "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv",
        "flights.csv"
    )

    spark = SparkSession.builder \
        .appName("Joins") \
        .master("local[*]") \
        .getOrCreate()

    flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True)

    # Small reference DataFrame with full airline names
    airlines_df = spark.createDataFrame([
        Row(IATA="AA", NAME="American Airlines"),
        Row(IATA="DL", NAME="Delta Air Lines"),
        Row(IATA="UA", NAME="United Airlines"),
        Row(IATA="WN", NAME="Southwest Airlines"),
        Row(IATA="AS", NAME="Alaska Airlines"),
    ])
Inner Join
Returns only rows where the key exists in both DataFrames:
    # Joining flights with airline names
    inner_df = flights_df.join(airlines_df, flights_df["AIRLINE"] == airlines_df["IATA"], "inner")
    inner_df.select("AIRLINE", "NAME", "ORIGIN_AIRPORT", "ARRIVAL_DELAY").show(5)
Left Join
Returns all rows from the left DataFrame. Where a row has no match in the right DataFrame, the right-side columns are filled with nulls:
    left_df = flights_df.join(airlines_df, flights_df["AIRLINE"] == airlines_df["IATA"], "left")
    left_df.select("AIRLINE", "NAME", "ORIGIN_AIRPORT").show(5)
Anti Join
Returns rows from the left DataFrame that have no match in the right – useful for finding orphaned records:
    # Flights whose airline code is not in the reference table
    unmatched_df = flights_df.join(airlines_df, flights_df["AIRLINE"] == airlines_df["IATA"], "left_anti")
    print(unmatched_df.select("AIRLINE").distinct().collect())