Machine Learning with PySpark

Collaborative Filtering with ALS


Alternating Least Squares (ALS) factorizes the user-item ratings matrix into two smaller matrices – one for users and one for items. Each user and item is represented as a dense vector of latent factors. The dot product of a user vector and an item vector predicts the rating.
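The alternating idea can be sketched in a few lines of NumPy. This is a toy illustration, not Spark's implementation: the function name `als_sketch`, the dense 4x4 ratings matrix, and the plain ridge penalty `reg` are all assumptions made for the example, and Spark's distributed ALS handles regularization and sparsity differently.

```python
import numpy as np

def als_sketch(R, observed, rank=2, reg=0.1, iters=20, seed=0):
    """Toy ALS: alternate ridge-regression solves for user and item factors."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user latent factors
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item latent factors
    penalty = reg * np.eye(rank)
    for _ in range(iters):
        # Hold item factors fixed; solve one small least-squares problem per user.
        for u in range(n_users):
            seen = observed[u]
            if seen.any():
                Vu = V[seen]
                U[u] = np.linalg.solve(Vu.T @ Vu + penalty, Vu.T @ R[u, seen])
        # Hold user factors fixed; solve one small problem per item.
        for i in range(n_items):
            seen = observed[:, i]
            if seen.any():
                Ui = U[seen]
                V[i] = np.linalg.solve(Ui.T @ Ui + penalty, Ui.T @ R[seen, i])
    return U, V

# Hypothetical toy matrix: rows are users, columns are items, 0.0 means "no rating".
R = np.array([
    [5.0, 3.0, 4.0, 0.0],
    [4.0, 0.0, 0.0, 5.0],
    [0.0, 4.0, 5.0, 3.0],
    [3.0, 0.0, 4.0, 5.0],
])
observed = R > 0

U, V = als_sketch(R, observed)
predicted = U @ V.T  # entry [u, i] is the dot product of user u's and item i's vectors
train_rmse = float(np.sqrt(np.mean((predicted - R)[observed] ** 2)))
```

Each inner solve is an ordinary regularized least-squares problem, which is where the name comes from: with one side fixed, the other side has a closed-form solution. The entries of `predicted` at unobserved positions are the model's guesses for ratings the user never gave.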

Setting Up a Ratings Dataset

The flights dataset does not have ratings. For this chapter you will use a synthetic ratings dataset to demonstrate ALS, then apply it to the MovieLens dataset in the capstone:

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row

spark = SparkSession.builder \
    .appName("ALS") \
    .master("local[*]") \
    .getOrCreate()

# Synthetic user-movie ratings
ratings_data = [
    Row(userId=1, movieId=1, rating=5.0),
    Row(userId=1, movieId=2, rating=3.0),
    Row(userId=1, movieId=3, rating=4.0),
    Row(userId=2, movieId=1, rating=4.0),
    Row(userId=2, movieId=4, rating=5.0),
    Row(userId=3, movieId=2, rating=4.0),
    Row(userId=3, movieId=3, rating=5.0),
    Row(userId=3, movieId=4, rating=3.0),
    Row(userId=4, movieId=1, rating=3.0),
    Row(userId=4, movieId=3, rating=4.0),
    Row(userId=4, movieId=4, rating=5.0),
]

ratings_df = spark.createDataFrame(ratings_data)
train_df, test_df = ratings_df.randomSplit([0.8, 0.2], seed=42)
```

Training ALS

```python
als = ALS(
    userCol="userId",
    itemCol="movieId",
    ratingCol="rating",
    rank=10,                   # Number of latent factors
    maxIter=10,
    regParam=0.1,
    coldStartStrategy="drop"   # Dropping users/items not seen during training
)

model = als.fit(train_df)
predictions = model.transform(test_df)
predictions.show()
```

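coldStartStrategy="drop" removes from the prediction output any rows whose user or item was not present in the training set. Without it (the default is "nan"), predictions for unseen users or items are NaN, which also turns aggregate metrics such as RMSE into NaN.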
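The effect on evaluation can be illustrated without Spark. The plain-Python sketch below uses made-up prediction values and a hypothetical helper `rmse`; it only shows why a single NaN prediction poisons an averaged error metric, and how filtering those rows out first gives a usable number again.

```python
import math

def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

# Made-up (actual, predicted) pairs; the last row stands in for an unseen user.
scored = [(4.0, 3.5), (5.0, 4.5), (3.0, float("nan"))]

rmse_with_nan = rmse(scored)  # NaN propagates through the sum and the square root

# Keep only rows with a real prediction, mimicking what "drop" does to the output.
kept = [(a, p) for a, p in scored if not math.isnan(p)]
rmse_after_drop = rmse(kept)  # computed on the two remaining rows only
```

The trade-off is that the metric after dropping describes only the users and items the model has seen, so it says nothing about how cold-start cases are handled.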

Generating Top-N Recommendations

```python
# Top 3 movie recommendations per user
user_recs = model.recommendForAllUsers(3)
user_recs.show(truncate=False)

# Top 3 user recommendations per movie
movie_recs = model.recommendForAllItems(3)
movie_recs.show(truncate=False)
```
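Conceptually, a top-N recommendation scores every item for a user with the dot product of their latent vectors and keeps the N highest. The NumPy sketch below shows that ranking step only; `top_n_items` is a hypothetical helper and the factor values are chosen by hand, not learned by a model.

```python
import numpy as np

def top_n_items(user_vector, item_factors, n):
    """Return (item_index, score) pairs for the n highest dot-product scores."""
    scores = item_factors @ user_vector   # one score per item
    best = np.argsort(-scores)[:n]        # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in best]

# Hypothetical 2-factor vectors, picked by hand for illustration.
item_factors = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.5, 0.5],
])
user_vector = np.array([2.0, 1.0])

recs = top_n_items(user_vector, item_factors, n=3)
```

Here `recs` pairs each recommended item index with its predicted score, which is roughly the shape of information that the recommendations column carries for each user in the Spark output.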

What does the rank parameter control in ALS?

Section 1. Chapter 11