Creating and Transforming RDDs
Swipe um das Menü anzuzeigen
You can create an RDD from an existing DataFrame, from a Python collection, or directly from a file. Once you have one, you chain transformations to shape the data before triggering an action.
Creating RDDs
1234567891011121314151617181920import urllib.request from pyspark.sql import SparkSession # Downloading the dataset locally urllib.request.urlretrieve( "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv", "flights.csv" ) spark = SparkSession.builder \ .appName("RDDTransformations") \ .master("local[*]") \ .getOrCreate() # From a Python list numbers_rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) # From a DataFrame (dataset already downloaded in previous chapter) flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True) flights_rdd = flights_df.rdd
Common Transformations
123456789101112# map – applying a function to every element distances_rdd = flights_rdd.map(lambda row: row["DISTANCE"]) # filter – keeping only elements that match a condition long_flights_rdd = flights_rdd.filter(lambda row: row["DISTANCE"] > 1000) # flatMap – like map but flattens the result # Splitting airline codes into individual characters (illustrative) chars_rdd = flights_rdd.flatMap(lambda row: list(row["AIRLINE"])) # distinct – removing duplicates unique_airlines_rdd = flights_rdd.map(lambda row: row["AIRLINE"]).distinct()
Chaining Transformations and Triggering Actions
12345678# Finding unique airlines operating flights longer than 1000 miles result = flights_rdd \ .filter(lambda row: row["DISTANCE"] > 1000) \ .map(lambda row: row["AIRLINE"]) \ .distinct() \ .collect() print(sorted(result))
collect() returns all results to the driver as a Python list. Use it only when the result is small – calling collect() on a multi-gigabyte RDD will crash the driver.
For large results, use take(n) to retrieve only the first n elements, or write the output to disk.
War alles klar?
Danke für Ihr Feedback!
Abschnitt 1. Kapitel 5
Fragen Sie AI
Fragen Sie AI
Fragen Sie alles oder probieren Sie eine der vorgeschlagenen Fragen, um unser Gespräch zu beginnen
Abschnitt 1. Kapitel 5