Casting and Converting Data Types
inferSchema=True infers a sensible type for most columns, but it sometimes gets a type wrong, and downstream processing may need a column in a different type. PySpark provides explicit casting and conversion functions for both cases.
Checking and Casting Types
    import urllib.request

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    from pyspark.sql.types import IntegerType, FloatType, StringType

    urllib.request.urlretrieve(
        "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv",
        "flights.csv"
    )

    spark = SparkSession.builder \
        .appName("TypeCasting") \
        .master("local[*]") \
        .getOrCreate()

    flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True)

    # Checking current types
    flights_df.printSchema()

    # Casting DEPARTURE_DELAY from float to integer
    flights_df = flights_df.withColumn(
        "DEPARTURE_DELAY",
        col("DEPARTURE_DELAY").cast(IntegerType())
    )

    # Casting FLIGHT_NUMBER to string
    flights_df = flights_df.withColumn(
        "FLIGHT_NUMBER",
        col("FLIGHT_NUMBER").cast(StringType())
    )

    flights_df.select("DEPARTURE_DELAY", "FLIGHT_NUMBER").printSchema()
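If a value cannot be cast (for example, casting "ABC" to IntegerType), PySpark replaces it with null instead of raising an error. This applies when ANSI mode (spark.sql.ansi.enabled) is off, which is the default before Spark 4.0; with ANSI mode on, an invalid cast raises an error instead.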
Building Datetime Columns
The flights dataset stores date components as separate integers. You can combine them into a proper date column:
    from pyspark.sql.functions import lpad, concat_ws, to_date

    # Constructing a date string and converting to DateType
    flights_df = flights_df.withColumn(
        "FLIGHT_DATE",
        to_date(
            concat_ws("-",
                col("YEAR").cast(StringType()),
                lpad(col("MONTH").cast(StringType()), 2, "0"),
                lpad(col("DAY").cast(StringType()), 2, "0")
            ),
            "yyyy-MM-dd"
        )
    )

    flights_df.select("YEAR", "MONTH", "DAY", "FLIGHT_DATE").show(5)
lpad left-pads single-digit months and days with "0" so the format matches yyyy-MM-dd.