Selecting, Filtering, and Renaming Columns
Свайпніть щоб показати меню
The three most common DataFrame operations are selecting the columns you need, filtering rows by condition, and renaming columns for clarity. These map directly to SQL SELECT, WHERE, and AS.
Selecting Columns
1234567891011121314151617import urllib.request from pyspark.sql import SparkSession urllib.request.urlretrieve( "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv", "flights.csv" ) spark = SparkSession.builder \ .appName("SelectFilterRename") \ .master("local[*]") \ .getOrCreate() flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True) # Selecting specific columns flights_df.select("AIRLINE", "ORIGIN_AIRPORT", "DESTINATION_AIRPORT", "DEPARTURE_DELAY").show(5)
Filtering Rows
1234567891011from pyspark.sql.functions import col # Filtering flights with arrival delay over 60 minutes delayed_df = flights_df.filter(col("ARRIVAL_DELAY") > 60) # Combining conditions long_delayed_df = flights_df.filter( (col("ARRIVAL_DELAY") > 60) & (col("DISTANCE") > 1000) ) long_delayed_df.select("AIRLINE", "ORIGIN_AIRPORT", "DESTINATION_AIRPORT", "ARRIVAL_DELAY", "DISTANCE").show(5)
Use col() from pyspark.sql.functions for conditions – it is more explicit and avoids ambiguity when working with multiple DataFrames.
Renaming Columns
1234567# Renaming columns for readability renamed_df = flights_df \ .withColumnRenamed("ORIGIN_AIRPORT", "FROM") \ .withColumnRenamed("DESTINATION_AIRPORT", "TO") \ .withColumnRenamed("ARRIVAL_DELAY", "DELAY_MINUTES") renamed_df.select("AIRLINE", "FROM", "TO", "DELAY_MINUTES").show(5)
Adding a Derived Column
123456789from pyspark.sql.functions import col # Adding a boolean column indicating whether the flight was significantly delayed flights_df = flights_df.withColumn( "IS_DELAYED", col("ARRIVAL_DELAY") > 15 ) flights_df.select("AIRLINE", "ARRIVAL_DELAY", "IS_DELAYED").show(5)
Все було зрозуміло?
Дякуємо за ваш відгук!
Секція 1. Розділ 9
Запитати АІ
Запитати АІ
Запитайте про що завгодно або спробуйте одне із запропонованих запитань, щоб почати наш чат
Секція 1. Розділ 9