Machine Learning with PySpark

Evaluating a Recommendation Engine


Evaluating a recommendation system is harder than evaluating a classification or regression model. You care not only about how accurate the predicted ratings are but also about the quality of the ranked list of recommendations.

RMSE for Explicit Ratings

For explicit ratings (like 1–5 stars), RMSE measures how far predicted ratings are from actual ratings:

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.sql import Row

# Start a local Spark session that uses all available cores
spark = SparkSession.builder \
    .appName("RecEval") \
    .master("local[*]") \
    .getOrCreate()

# Small explicit-ratings dataset: one row per (user, movie, rating)
ratings_data = [
    Row(userId=1, movieId=1, rating=5.0),
    Row(userId=1, movieId=2, rating=3.0),
    Row(userId=2, movieId=1, rating=4.0),
    Row(userId=2, movieId=3, rating=5.0),
    Row(userId=3, movieId=2, rating=4.0),
    Row(userId=3, movieId=3, rating=5.0),
    Row(userId=4, movieId=1, rating=3.0),
    Row(userId=4, movieId=3, rating=4.0),
]

ratings_df = spark.createDataFrame(ratings_data)
# Hold out roughly 20% of the ratings for testing (very few rows with data this small)
train_df, test_df = ratings_df.randomSplit([0.8, 0.2], seed=42)

# coldStartStrategy="drop" removes test rows whose user or item was not seen
# during training; otherwise their NaN predictions would make RMSE NaN
als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=5, maxIter=10, regParam=0.1, coldStartStrategy="drop")
model = als.fit(train_df)

# Predict ratings for the held-out (user, movie) pairs
predictions = model.transform(test_df)

# Compare the predicted ratings with the actual ratings
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print(f"RMSE: {rmse:.4f}")
```
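
To make the metric itself concrete, the following is a minimal plain-Python sketch of the calculation: RMSE is the square root of the mean squared difference between predicted and actual ratings. The function name rmse and the sample ratings are illustrative assumptions, not output from the ALS model above.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings, chosen only to illustrate the arithmetic
actual = [5.0, 3.0, 4.0]
predicted = [4.0, 3.0, 2.0]

# Errors are 1, 0 and 2, so the mean squared error is (1 + 0 + 4) / 3
example_rmse = rmse(actual, predicted)
print(f"RMSE: {example_rmse:.4f}")  # RMSE: 1.2910
```

Because errors are squared before they are averaged, the single 2-star miss contributes more to the result than the 1-star miss does.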

Ranking Metrics

RMSE tells you how accurate the predicted scores are, but not whether the right items appear at the top of the recommendation list. Two additional metrics matter:

  • Precision@K – of the top K recommended items, what fraction did the user actually like;
  • Recall@K – of all items the user liked, what fraction appeared in the top K recommendations.

Both metrics require comparing each user's ranked recommendation list against held-out interactions. RegressionEvaluator does not cover this, so the comparison is typically done with custom logic or with Spark's ranking utilities, such as RankingMetrics in pyspark.mllib.evaluation.
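
As a rough illustration of the custom-logic route, here is a minimal plain-Python sketch of both metrics for a single user. The function names precision_at_k and recall_at_k and the sample item IDs are hypothetical. Dividing precision by K even when fewer than K items are recommended is one common convention, assumed here.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually liked."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all liked items that appear in the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # assumption: recall is 0 when the user has no liked items
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

# Hypothetical data for one user
recommended = [10, 20, 30, 40, 50]  # ranked list, best first
relevant = {20, 30, 50, 60}         # held-out items the user liked

# The top 3 are [10, 20, 30]; two of them (20 and 30) are relevant
p_at_3 = precision_at_k(recommended, relevant, 3)  # 2 of 3 recommended
r_at_3 = recall_at_k(recommended, relevant, 3)     # 2 of 4 liked
print(f"Precision@3: {p_at_3:.2f}, Recall@3: {r_at_3:.2f}")
```

In practice these per-user values are averaged over all users in the test set to get a single score for the model.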

A Note on Cold Start

The cold start problem occurs when a new user or item has no interaction history. ALS cannot generate meaningful recommendations for them. Common solutions are:

  • falling back to popularity-based recommendations for new users;
  • collecting a small number of initial ratings before generating personalized recommendations.
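
The popularity fallback can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helpers most_popular_items and recommend, and the personalized dictionary standing in for model output, are hypothetical names and not part of ALS or Spark.

```python
from collections import Counter

def most_popular_items(interactions, n):
    """Return the n item IDs with the most interactions, most popular first."""
    counts = Counter(item for _, item in interactions)
    # Sort by count descending, then by item ID so that ties are deterministic
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:n]]

def recommend(user_id, personalized, interactions, n=2):
    """Use personalized results when they exist, otherwise fall back to popularity."""
    if user_id in personalized:
        return personalized[user_id][:n]
    return most_popular_items(interactions, n)

# Hypothetical (userId, movieId) pairs, mirroring the IDs in the ratings data above
interactions = [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 3), (4, 1), (4, 3)]
# Hypothetical personalized recommendations for users the model already knows
personalized = {1: [3], 2: [2]}

known_user_recs = recommend(1, personalized, interactions)  # from the model
new_user_recs = recommend(99, personalized, interactions)   # popularity fallback
print(known_user_recs, new_user_recs)  # [3] [1, 3]
```

Movies 1 and 3 each have three interactions and movie 2 has two, so the new user (ID 99) receives movies 1 and 3.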

Run the RMSE example locally and experiment with different rank and regParam values to see how they affect RMSE on the test set.

What does RMSE measure in the context of a recommendation system with explicit ratings?

Section 1. Chapter 12
