Machine Learning with PySpark

Collaborative Filtering with ALS

Alternating Least Squares (ALS) factorizes the user-item ratings matrix into two smaller matrices – one for users and one for items. Each user and item is represented as a dense vector of latent factors. The dot product of a user vector and an item vector predicts the rating.
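To make the alternation concrete, here is a minimal pure-Python sketch of the idea with two latent factors. It is an illustration under stated assumptions, not Spark's implementation: the toy ratings, the starting vectors, and the helper names (`dot`, `solve_ridge`, `loss`) are all hypothetical, and it uses a plain ridge penalty. Each pass holds the item vectors fixed and solves a small least-squares problem per user, then swaps roles.

```python
# Toy rank-2 ALS in pure Python. Illustrative only; not Spark's implementation.
ratings = {  # (user, item) -> rating, hypothetical toy data
    (0, 0): 5.0, (0, 1): 3.0,
    (1, 0): 4.0, (1, 2): 5.0,
    (2, 1): 4.0, (2, 2): 3.0,
}
n_users, n_items = 3, 3
reg = 0.1  # ridge penalty, playing the role of a regularization parameter


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve_ridge(factors, targets):
    """Fit a 2-vector x so that dot(f, x) approximates each target.

    Solves (sum f f^T + reg * I) x = sum f * target with Cramer's rule.
    """
    a11 = sum(f[0] * f[0] for f in factors) + reg
    a12 = sum(f[0] * f[1] for f in factors)
    a22 = sum(f[1] * f[1] for f in factors) + reg
    b1 = sum(f[0] * t for f, t in zip(factors, targets))
    b2 = sum(f[1] * t for f, t in zip(factors, targets))
    det = a11 * a22 - a12 * a12  # positive because reg > 0
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


def loss(users, items):
    """Squared error on the known ratings plus the ridge penalty."""
    squared_error = sum(
        (r - dot(users[u], items[i])) ** 2 for (u, i), r in ratings.items()
    )
    penalty = reg * sum(dot(v, v) for v in users + items)
    return squared_error + penalty


# Fixed (hypothetical) starting vectors so the run is deterministic.
users = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
items = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
losses = [loss(users, items)]

for _ in range(10):
    # Step 1: hold item vectors fixed, solve for each user vector.
    for u in range(n_users):
        rated = [(items[i], r) for (uu, i), r in ratings.items() if uu == u]
        users[u] = solve_ridge([f for f, _ in rated], [r for _, r in rated])
    # Step 2: hold user vectors fixed, solve for each item vector.
    for i in range(n_items):
        rated = [(users[u], r) for (u, ii), r in ratings.items() if ii == i]
        items[i] = solve_ridge([f for f, _ in rated], [r for _, r in rated])
    losses.append(loss(users, items))

# Predict a rating that was never observed: user 0 on item 2.
predicted = dot(users[0], items[2])
print("loss before:", losses[0], "loss after:", losses[-1])
print("predicted rating for user 0, item 2:", predicted)
```

Because each step solves its subproblem exactly, the regularized loss never increases from one pass to the next, which is why a small number of iterations is often enough in practice.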

Setting Up a Ratings Dataset

The flights dataset does not have ratings. For this chapter you will use a synthetic ratings dataset to demonstrate ALS, then apply it to the MovieLens dataset in the capstone:

```python
from pyspark.sql import Row, SparkSession
from pyspark.ml.recommendation import ALS

spark = (
    SparkSession.builder
    .appName("ALS")
    .master("local[*]")
    .getOrCreate()
)

# Synthetic user-movie ratings
ratings_data = [
    Row(userId=1, movieId=1, rating=5.0),
    Row(userId=1, movieId=2, rating=3.0),
    Row(userId=1, movieId=3, rating=4.0),
    Row(userId=2, movieId=1, rating=4.0),
    Row(userId=2, movieId=4, rating=5.0),
    Row(userId=3, movieId=2, rating=4.0),
    Row(userId=3, movieId=3, rating=5.0),
    Row(userId=3, movieId=4, rating=3.0),
    Row(userId=4, movieId=1, rating=3.0),
    Row(userId=4, movieId=3, rating=4.0),
    Row(userId=4, movieId=4, rating=5.0),
]

ratings_df = spark.createDataFrame(ratings_data)

# The split weights are approximate, so with 11 rows the test set will be very small
train_df, test_df = ratings_df.randomSplit([0.8, 0.2], seed=42)
```

Training ALS

```python
als = ALS(
    userCol="userId",
    itemCol="movieId",
    ratingCol="rating",
    rank=10,                   # Number of latent factors per user and item
    maxIter=10,                # Maximum number of ALS iterations
    regParam=0.1,              # Regularization strength
    coldStartStrategy="drop",  # Drop predictions for users/items not seen during training
)

model = als.fit(train_df)
predictions = model.transform(test_df)
predictions.show()
```

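coldStartStrategy="drop" removes prediction rows whose user or item was not present in the training set. Without it, Spark falls back to the default "nan" strategy: predictions for unseen users or items are NaN, and any metric computed over them (such as RMSE) becomes NaN as well.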
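The effect of the cold-start strategy on evaluation can be sketched without Spark. The snippet below is a hypothetical stand-in, not the library's code: `predict` is an invented placeholder that returns NaN for unseen ids and a constant otherwise, and the rows are made up. It only shows why a single NaN prediction poisons an averaged metric and how dropping those rows avoids that.

```python
import math

# Hypothetical (userId, movieId, rating) rows; user 3 never appears in train.
train = [(1, 1, 5.0), (1, 2, 3.0), (2, 1, 4.0)]
test = [(1, 1, 4.5), (2, 2, 3.5), (3, 1, 4.0)]

seen_users = {u for u, _, _ in train}
seen_items = {m for _, m, _ in train}


def predict(user, movie):
    """Invented placeholder model: NaN for unseen ids, a constant otherwise."""
    if user not in seen_users or movie not in seen_items:
        return math.nan
    return 4.0


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


scored = [(r, predict(u, m)) for u, m, r in test]

# Mimic the "drop" behaviour: keep only rows with a usable prediction.
kept = [(a, p) for a, p in scored if not math.isnan(p)]

rmse_all = rmse(scored)  # NaN, because one prediction is NaN
rmse_kept = rmse(kept)   # finite, computed on the remaining rows

print("RMSE with NaN rows:", rmse_all)
print("RMSE after dropping:", rmse_kept)
```

Keep in mind that dropping rows means the metric describes only users and items the model has seen, so it says nothing about how the system behaves for brand-new ones.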

Generating Top-N Recommendations

```python
# Top 3 movie recommendations per user
user_recs = model.recommendForAllUsers(3)
user_recs.show(truncate=False)

# Top 3 user recommendations per movie
movie_recs = model.recommendForAllItems(3)
movie_recs.show(truncate=False)
```
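Conceptually, a top-N recommendation scores every item for a user with the dot product of their latent vectors and keeps the N highest scores. The following pure-Python sketch shows that idea with invented two-dimensional factors; the ids, vectors, and the `top_n` helper are hypothetical, and Spark's actual implementation is distributed and considerably more optimized.

```python
import heapq

# Hypothetical latent factors, as if taken from a fitted rank-2 model.
user_factors = {1: [0.9, 0.1], 2: [0.2, 0.8]}
item_factors = {
    10: [1.0, 0.0],
    20: [0.0, 1.0],
    30: [0.5, 0.5],
    40: [0.9, 0.3],
}


def top_n(user_vec, n):
    """Return the n (item, score) pairs with the highest dot-product score."""
    scores = (
        (sum(a * b for a, b in zip(user_vec, vec)), item)
        for item, vec in item_factors.items()
    )
    return [(item, score) for score, item in heapq.nlargest(n, scores)]


recs = {user: top_n(vec, 3) for user, vec in user_factors.items()}

for user, items in recs.items():
    print(user, items)
```

Swapping the roles of the two dictionaries gives the per-item view, which is the same idea behind recommending users for each movie.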

What does the rank parameter control in ALS?

Section 1. Chapter 11
