Evaluating Classification Models


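Accuracy alone is a poor metric for imbalanced classification: if 80% of flights are on time, a model that always predicts "on time" achieves 80% accuracy without learning anything. You need metrics that capture both types of errors: false positives (on-time flights flagged as delayed) and false negatives (delayed flights the model missed).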
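
The 80% example can be checked by hand. The sketch below uses made-up toy labels (100 hypothetical flights, 20 of them delayed) and a "model" that always predicts "on time"; the variable names are illustrative only and are not part of the lesson's pipeline.

```python
# Hypothetical toy data: 1.0 = delayed, 0.0 = on time (80% of flights on time).
labels = [0.0] * 80 + [1.0] * 20

# A "model" that ignores its input and always predicts "on time".
predictions = [0.0] * len(labels)

# Accuracy: fraction of predictions that match the label.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall for the delayed class: fraction of delayed flights that were caught.
caught = sum(1 for y, p in zip(labels, predictions) if y == 1.0 and p == 1.0)
actual_delayed = sum(1 for y in labels if y == 1.0)
recall_delayed = caught / actual_delayed

print(f"accuracy: {accuracy:.2f}")                # accuracy: 0.80
print(f"recall (delayed): {recall_delayed:.2f}")  # recall (delayed): 0.00
```

The accuracy looks respectable, yet the model never identifies a single delayed flight, which is what the recall of zero makes visible.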

Key Metrics

Accuracy – fraction of correct predictions. Misleading for imbalanced classes;
Precision – of all flights predicted as delayed, what fraction actually were;
Recall – of all flights that were actually delayed, what fraction did the model catch;
F1 score – harmonic mean of precision and recall. Balances both;
AUC-ROC – area under the ROC curve. Measures the model's ability to distinguish classes regardless of threshold.
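
To make the definitions above concrete, here is a minimal plain-Python sketch that computes them from confusion-matrix counts, treating "delayed" as the positive class. The helper names (`classification_metrics`, `auc_roc`) and all numbers are hypothetical. The AUC helper uses the pairwise interpretation (the probability that a randomly chosen delayed flight receives a higher score than a randomly chosen on-time flight), which is equivalent to the area under the ROC curve but is only practical for small samples.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts: 12 delays caught, 4 false alarms, 8 delays missed, 76 correct "on time".
metrics = classification_metrics(tp=12, fp=4, fn=8, tn=76)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Hypothetical scores: higher means "more likely delayed".
auc = auc_roc(labels=[1, 1, 0, 0, 0], scores=[0.9, 0.4, 0.5, 0.3, 0.1])
print(f"AUC-ROC: {auc:.4f}")
```

With these counts, accuracy is 0.88 while recall is only 0.60, so the model still misses 40% of the delayed flights.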

Evaluating with MLlib


import urllib.request
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, floor
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator

urllib.request.urlretrieve(
    "https://staging-content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv",
    "flights.csv"
)

spark = SparkSession.builder \
    .appName("ClassificationEval") \
    .master("local[*]") \
    .getOrCreate()

flights_df = spark.read.csv("flights.csv", header=True, inferSchema=True) \
    .fillna(0, subset=["DEPARTURE_DELAY", "ARRIVAL_DELAY", "DISTANCE", "SCHEDULED_TIME"])

flights_df = flights_df \
    .withColumn("LABEL", (col("ARRIVAL_DELAY") > 15).cast("double")) \
    .withColumn("DEPARTURE_HOUR", floor(col("SCHEDULED_DEPARTURE") / 100).cast("integer")) \
    .withColumn("IS_WEEKEND", (col("DAY_OF_WEEK") >= 6).cast("integer"))

train_df, test_df = flights_df.randomSplit([0.8, 0.2], seed=42)

indexer = StringIndexer(inputCol="AIRLINE", outputCol="AIRLINE_IDX")
assembler = VectorAssembler(
    inputCols=["DEPARTURE_DELAY", "DISTANCE", "SCHEDULED_TIME", "DEPARTURE_HOUR", "IS_WEEKEND", "AIRLINE_IDX"],
    outputCol="FEATURES"
)
rf = RandomForestClassifier(featuresCol="FEATURES", labelCol="LABEL", numTrees=20, maxDepth=5, seed=42)

pipeline = Pipeline(stages=[indexer, assembler, rf])
model = pipeline.fit(train_df)
predictions = model.transform(test_df)

# AUC-ROC
binary_evaluator = BinaryClassificationEvaluator(labelCol="LABEL", metricName="areaUnderROC")
print(f"AUC-ROC: {binary_evaluator.evaluate(predictions):.4f}")

# Accuracy, plus F1, precision, and recall averaged over both classes
# (each class weighted by its share of the test rows)
multi_evaluator = MulticlassClassificationEvaluator(labelCol="LABEL", predictionCol="prediction")

for metric in ["accuracy", "f1", "weightedPrecision", "weightedRecall"]:
    multi_evaluator.setMetricName(metric)
    print(f"{metric}: {multi_evaluator.evaluate(predictions):.4f}")
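
The `weightedPrecision` and `weightedRecall` values printed above are averages over both classes, weighted by how many test rows each class has, so they are not the same as precision and recall for the delayed class alone. The plain-Python sketch below illustrates the difference using hypothetical confusion-matrix counts; the helper name `per_class_and_weighted` is made up for this example. Recent Spark versions appear to expose per-class variants such as `precisionByLabel` and `recallByLabel` (selected with a `metricLabel` parameter) on the same evaluator, but check the documentation for your version before relying on them.

```python
def per_class_and_weighted(tp, fp, fn, tn):
    """Per-class precision/recall plus their support-weighted averages.

    tp/fp/fn/tn are counted with "delayed" (label 1) as the positive class.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    delayed = {"precision": ratio(tp, tp + fp), "recall": ratio(tp, tp + fn), "support": tp + fn}
    # For the "on time" class the roles flip: tn are its true positives.
    on_time = {"precision": ratio(tn, tn + fn), "recall": ratio(tn, tn + fp), "support": tn + fp}

    total = delayed["support"] + on_time["support"]
    weighted = {
        key: (delayed[key] * delayed["support"] + on_time[key] * on_time["support"]) / total
        for key in ("precision", "recall")
    }
    return delayed, on_time, weighted


# Hypothetical counts, not results from the flights dataset.
delayed, on_time, weighted = per_class_and_weighted(tp=12, fp=4, fn=8, tn=76)

print(f"delayed precision: {delayed['precision']:.4f}")    # 0.7500
print(f"delayed recall: {delayed['recall']:.4f}")          # 0.6000
print(f"weighted precision: {weighted['precision']:.4f}")  # 0.8738
print(f"weighted recall: {weighted['recall']:.4f}")        # 0.8800
```

Note that the weighted recall always equals accuracy, so on imbalanced data it inherits the same blind spot; the per-class recall for delayed flights is the number that shows how many delays are missed.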


Section 1. Chapter 4
