Machine Learning with PySpark

Collaborative Filtering with ALS


Alternating Least Squares (ALS) factorizes the user-item ratings matrix into two smaller matrices – one for users and one for items. Each user and item is represented as a dense vector of latent factors. The dot product of a user vector and an item vector predicts the rating.
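As a minimal sketch of the prediction step, the plain-Python example below computes a rating as the dot product of a user vector and an item vector. The rank-2 factors, the IDs, and the `predict_rating` helper are made up for illustration; a trained ALS model would learn its own factors.

```python
# Hypothetical rank-2 latent factors (illustrative values, not learned by ALS).
user_factors = {
    1: [1.2, 0.8],
    2: [0.5, 1.5],
}
item_factors = {
    10: [2.0, 1.0],
    20: [0.5, 2.5],
}


def predict_rating(user_id, item_id):
    """Predicted rating = dot product of the user and item factor vectors."""
    u = user_factors[user_id]
    v = item_factors[item_id]
    return sum(a * b for a, b in zip(u, v))


# Score every user-item pair, including pairs with no observed rating.
for user_id in user_factors:
    for item_id in item_factors:
        print(user_id, item_id, round(predict_rating(user_id, item_id), 2))
```

Because every user and every item has a vector, the model can score pairs that never appeared in the ratings data, which is what makes recommendation possible.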

Setting Up a Ratings Dataset

The flights dataset does not have ratings. For this chapter you will use a synthetic ratings dataset to demonstrate ALS, then apply it to the MovieLens dataset in the capstone:

    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS
    from pyspark.sql import Row

    spark = SparkSession.builder \
        .appName("ALS") \
        .master("local[*]") \
        .getOrCreate()

    # Synthetic user-movie ratings
    ratings_data = [
        Row(userId=1, movieId=1, rating=5.0),
        Row(userId=1, movieId=2, rating=3.0),
        Row(userId=1, movieId=3, rating=4.0),
        Row(userId=2, movieId=1, rating=4.0),
        Row(userId=2, movieId=4, rating=5.0),
        Row(userId=3, movieId=2, rating=4.0),
        Row(userId=3, movieId=3, rating=5.0),
        Row(userId=3, movieId=4, rating=3.0),
        Row(userId=4, movieId=1, rating=3.0),
        Row(userId=4, movieId=3, rating=4.0),
        Row(userId=4, movieId=4, rating=5.0),
    ]

    ratings_df = spark.createDataFrame(ratings_data)
    train_df, test_df = ratings_df.randomSplit([0.8, 0.2], seed=42)

Training ALS

    als = ALS(
        userCol="userId",
        itemCol="movieId",
        ratingCol="rating",
        rank=10,  # Number of latent factors
        maxIter=10,
        regParam=0.1,
        coldStartStrategy="drop"  # Dropping users/items not seen during training
    )

    model = als.fit(train_df)
    predictions = model.transform(test_df)
    predictions.show()

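With coldStartStrategy="drop", transform() leaves out any row whose user or item did not appear in the training set. The default setting, "nan", keeps those rows but gives them a NaN prediction, and a single NaN is enough to turn an evaluation metric such as RMSE into NaN.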
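To illustrate why NaN predictions are a problem for evaluation, here is a small plain-Python sketch of RMSE. The `rmse` helper and the (actual, predicted) pairs are invented for this example; in Spark you would normally use an evaluator on the predictions DataFrame instead.

```python
import math


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    squared_errors = [(actual - predicted) ** 2 for actual, predicted in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up pairs; the last row stands in for a user or item unseen in training.
scored = [(5.0, 4.5), (3.0, 3.5), (4.0, float("nan"))]

# Keeping the NaN row (what the "nan" strategy leaves in place) poisons the metric.
print(rmse(scored))

# Filtering NaN rows first mirrors what "drop" does to the prediction output.
kept = [(a, p) for a, p in scored if not math.isnan(p)]
print(rmse(kept))
```

The first call returns NaN because any arithmetic involving NaN yields NaN; the second is computed only over rows that have a real prediction.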

Generating Top-N Recommendations

    # Top 3 movie recommendations per user
    user_recs = model.recommendForAllUsers(3)
    user_recs.show(truncate=False)

    # Top 3 user recommendations per movie
    movie_recs = model.recommendForAllItems(3)
    movie_recs.show(truncate=False)
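Conceptually, a top-N recommendation scores every item for a user and keeps the N highest scores. The sketch below shows that idea in plain Python under stated assumptions: the factor values, the IDs, and the `top_n_items` helper are hypothetical, and this is not how Spark implements the computation internally.

```python
import heapq

# Hypothetical rank-2 factors (illustrative values, not taken from a trained model).
user_factors = {1: [1.2, 0.8], 2: [0.5, 1.5]}
item_factors = {10: [2.0, 1.0], 20: [0.5, 2.5], 30: [1.0, 1.0], 40: [0.2, 0.1]}


def top_n_items(user_id, n):
    """Return the n (item_id, score) pairs with the highest predicted score."""
    u = user_factors[user_id]
    scores = [
        (item_id, sum(a * b for a, b in zip(u, v)))
        for item_id, v in item_factors.items()
    ]
    return heapq.nlargest(n, scores, key=lambda pair: pair[1])


for user_id in user_factors:
    print(user_id, top_n_items(user_id, 3))
```

Note that this sketch ranks every item, including ones the user may already have rated; filtering those out would be a separate step.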

What does the rank parameter control in ALS?

Section 1. Chapter 11
