Section 1. Chapter 9
Challenge: Customer Segmentation
Task
You are given a flights dataset as a list of rows. Load it into a DataFrame with createDataFrame, then segment airlines by their operational profile using K-Means clustering. Complete all steps and store the results in the specified variables:
- Fill nulls in Delay and Length with 0;
- Aggregate by Airline to compute:
  - AVG_DELAY: average Delay;
  - AVG_LENGTH: average Length;
  - TOTAL_FLIGHTS: number of flights.
  Store the result in airline_df;
- Build a Pipeline with a VectorAssembler on ["AVG_DELAY", "AVG_LENGTH", "TOTAL_FLIGHTS"] and KMeans with k=3, seed=42, maxIter=5 (no scaling needed);
- Fit the pipeline on airline_df and transform it; store the result in clustered_df;
- Store the number of rows per cluster in cluster_counts as a list of tuples [(cluster_id, count), ...], sorted by cluster_id.
Finally, print cluster_counts and show clustered_df sorted by the prediction column.