Machine Learning with PySpark

Collaborative Filtering with ALS

Alternating Least Squares (ALS) factorizes the user-item ratings matrix into two smaller matrices – one for users and one for items. Each user and item is represented as a dense vector of latent factors. The dot product of a user vector and an item vector predicts the rating.
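
To make the alternating idea concrete, here is a minimal pure-Python sketch with a single latent factor (rank 1). It is an illustration under simplifying assumptions, not Spark's implementation: the names `als_rank1` and `objective` are hypothetical, the ratings are made up, and the regularization is a plain L2 penalty, which may differ from how Spark applies `regParam`.

```python
# Toy ratings: (user, item) -> rating. Values are made up for illustration.
ratings = {
    (0, 0): 5.0, (0, 1): 3.0,
    (1, 0): 4.0, (1, 2): 5.0,
    (2, 1): 4.0, (2, 2): 3.0,
}
N_USERS, N_ITEMS = 3, 3


def objective(u, v, reg):
    """Squared error on observed ratings plus an L2 penalty on all factors."""
    err = sum((r - u[i] * v[j]) ** 2 for (i, j), r in ratings.items())
    return err + reg * (sum(x * x for x in u) + sum(x * x for x in v))


def als_rank1(n_iter=10, reg=0.1):
    """Rank-1 ALS sketch: each user and each item is described by one number."""
    u = [1.0] * N_USERS
    v = [1.0] * N_ITEMS
    history = [objective(u, v, reg)]
    for _ in range(n_iter):
        # Hold item factors fixed; solve a 1-D least-squares problem per user.
        for i in range(N_USERS):
            num = sum(r * v[j] for (ui, j), r in ratings.items() if ui == i)
            den = reg + sum(v[j] ** 2 for (ui, j) in ratings if ui == i)
            u[i] = num / den
        # Hold user factors fixed; solve the same kind of problem per item.
        for j in range(N_ITEMS):
            num = sum(r * u[i] for (i, ij), r in ratings.items() if ij == j)
            den = reg + sum(u[i] ** 2 for (i, ij) in ratings if ij == j)
            v[j] = num / den
        history.append(objective(u, v, reg))
    return u, v, history


u, v, history = als_rank1()
# With rank 1, the dot product reduces to a single multiplication.
predicted = u[0] * v[2]  # user 0 never rated item 2
```

Each half-step solves its subproblem exactly, so the values in `history` never increase. With a higher rank, each per-user and per-item update becomes a small regularized linear system instead of a single division.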

Setting Up a Ratings Dataset

The flights dataset does not have ratings. For this chapter you will use a synthetic ratings dataset to demonstrate ALS, then apply it to the MovieLens dataset in the capstone:

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row

spark = SparkSession.builder \
    .appName("ALS") \
    .master("local[*]") \
    .getOrCreate()

# Synthetic user-movie ratings
ratings_data = [
    Row(userId=1, movieId=1, rating=5.0),
    Row(userId=1, movieId=2, rating=3.0),
    Row(userId=1, movieId=3, rating=4.0),
    Row(userId=2, movieId=1, rating=4.0),
    Row(userId=2, movieId=4, rating=5.0),
    Row(userId=3, movieId=2, rating=4.0),
    Row(userId=3, movieId=3, rating=5.0),
    Row(userId=3, movieId=4, rating=3.0),
    Row(userId=4, movieId=1, rating=3.0),
    Row(userId=4, movieId=3, rating=4.0),
    Row(userId=4, movieId=4, rating=5.0),
]

ratings_df = spark.createDataFrame(ratings_data)
train_df, test_df = ratings_df.randomSplit([0.8, 0.2], seed=42)
```

Training ALS

```python
als = ALS(
    userCol="userId",
    itemCol="movieId",
    ratingCol="rating",
    rank=10,                  # Number of latent factors
    maxIter=10,
    regParam=0.1,
    coldStartStrategy="drop"  # Drop users/items not seen during training
)

model = als.fit(train_df)
predictions = model.transform(test_df)
predictions.show()
```

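coldStartStrategy="drop" removes rows from the prediction output whose user or item did not appear in the training set. With the default setting, "nan", the predictions for those unseen users or items are NaN, and any metric computed over them, such as RMSE, becomes NaN as well.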
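
The effect of a NaN prediction on an evaluation metric can be shown without Spark. The following is a small sketch in plain Python. The `pairs` data and the `rmse` helper are hypothetical, and the filtering step only imitates what "drop" does to the prediction output.

```python
import math

# Hypothetical (actual, predicted) pairs. The last prediction is NaN,
# as it would be for a user or item the model never saw during training.
pairs = [(4.0, 3.5), (5.0, 4.5), (3.0, float("nan"))]


def rmse(rows):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in rows) / len(rows))


# A single NaN prediction makes the whole metric NaN.
rmse_with_nan = rmse(pairs)

# Imitate "drop": discard rows whose prediction is NaN, then evaluate.
kept = [(a, p) for a, p in pairs if not math.isnan(p)]
rmse_dropped = rmse(kept)  # 0.5
```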

Generating Top-N Recommendations

```python
# Top 3 movie recommendations per user
user_recs = model.recommendForAllUsers(3)
user_recs.show(truncate=False)

# Top 3 user recommendations per movie
movie_recs = model.recommendForAllItems(3)
movie_recs.show(truncate=False)
```
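
Conceptually, a top-N recommendation scores every item for a user with the dot product of their latent vectors and keeps the N highest scores. The sketch below shows that idea in plain Python. The factor values and the `top_n` helper are hypothetical, and it does not reflect how Spark computes the result internally.

```python
import heapq

# Hypothetical latent factors (rank 2); a trained model would learn these.
user_factors = {1: [0.9, 0.2], 2: [0.1, 1.0]}
item_factors = {10: [1.0, 0.0], 20: [0.0, 1.0], 30: [0.5, 0.5]}


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_n(user_id, n):
    """Return the n (item_id, score) pairs with the highest predicted score."""
    scores = (
        (item_id, dot(user_factors[user_id], factors))
        for item_id, factors in item_factors.items()
    )
    return heapq.nlargest(n, scores, key=lambda pair: pair[1])


recs_user_1 = top_n(1, 2)  # items 10 and 30 rank highest for user 1
recs_user_2 = top_n(2, 2)  # items 20 and 30 rank highest for user 2
```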

What does the rank parameter control in ALS?

Section 1. Chapter 11
