Machine Learning with PySpark

Collaborative Filtering with ALS

Alternating Least Squares (ALS) factorizes the user-item ratings matrix into two smaller matrices – one for users and one for items. Each user and item is represented as a dense vector of latent factors. The dot product of a user vector and an item vector predicts the rating.
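
To make the dot-product prediction concrete, here is a minimal sketch in plain Python. The two-dimensional factor vectors are made-up values chosen for illustration, not factors learned by Spark, and predict_rating is a hypothetical helper name.

```python
# Hypothetical latent factors (rank = 2); a trained ALS model would learn these.
user_factors = {1: [1.2, 0.8], 2: [0.5, 1.5]}
item_factors = {10: [2.0, 1.0], 20: [0.5, 2.5]}


def predict_rating(user_id, item_id):
    """Predicted rating = dot product of the user and item factor vectors."""
    u = user_factors[user_id]
    v = item_factors[item_id]
    return sum(a * b for a, b in zip(u, v))


print(predict_rating(1, 10))
print(predict_rating(2, 20))
```

A user whose vector points in roughly the same direction as an item's vector gets a high predicted rating for that item; the number of entries in each vector is what the rank parameter sets.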

Setting Up a Ratings Dataset

The flights dataset does not have ratings. For this chapter you will use a synthetic ratings dataset to demonstrate ALS, then apply it to the MovieLens dataset in the capstone:

```python
from pyspark.sql import Row, SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder \
    .appName("ALS") \
    .master("local[*]") \
    .getOrCreate()

# Synthetic user-movie ratings
ratings_data = [
    Row(userId=1, movieId=1, rating=5.0),
    Row(userId=1, movieId=2, rating=3.0),
    Row(userId=1, movieId=3, rating=4.0),
    Row(userId=2, movieId=1, rating=4.0),
    Row(userId=2, movieId=4, rating=5.0),
    Row(userId=3, movieId=2, rating=4.0),
    Row(userId=3, movieId=3, rating=5.0),
    Row(userId=3, movieId=4, rating=3.0),
    Row(userId=4, movieId=1, rating=3.0),
    Row(userId=4, movieId=3, rating=4.0),
    Row(userId=4, movieId=4, rating=5.0),
]

ratings_df = spark.createDataFrame(ratings_data)
train_df, test_df = ratings_df.randomSplit([0.8, 0.2], seed=42)
```

Training ALS

```python
als = ALS(
    userCol="userId",
    itemCol="movieId",
    ratingCol="rating",
    rank=10,  # Number of latent factors
    maxIter=10,
    regParam=0.1,
    coldStartStrategy="drop",  # Drop rows for users/items not seen during training
)

model = als.fit(train_df)
predictions = model.transform(test_df)
predictions.show()
```

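coldStartStrategy="drop" removes prediction rows whose user or item did not appear in the training set. With the default setting, coldStartStrategy="nan", predictions for those rows are NaN, and any metric computed over them, such as RMSE, becomes NaN as well.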
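
The effect of a NaN prediction on an evaluation metric can be seen without Spark. The sketch below computes RMSE by hand over made-up (rating, prediction) pairs; the values and the rmse helper are illustrative assumptions, not output from the model above.

```python
import math

# Made-up (actual rating, predicted rating) pairs; the NaN stands in for a
# prediction on a user or item that was absent from the training set.
pairs = [(4.0, 3.6), (5.0, 4.4), (3.0, float("nan"))]


def rmse(rows):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in rows) / len(rows))


# A single NaN prediction propagates through the sum, so the metric is NaN.
print(math.isnan(rmse(pairs)))

# Filtering out NaN rows first, mirroring coldStartStrategy="drop",
# leaves a metric computed only over rows that have a real prediction.
kept = [(a, p) for a, p in pairs if not math.isnan(p)]
print(rmse(kept))
```

Dropping rows keeps the metric usable, but it also means the metric says nothing about how the model handles new users or items.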

Generating Top-N Recommendations

```python
# Top 3 movie recommendations per user
user_recs = model.recommendForAllUsers(3)
user_recs.show(truncate=False)

# Top 3 user recommendations per movie
movie_recs = model.recommendForAllItems(3)
movie_recs.show(truncate=False)
```
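
Conceptually, a top-N recommendation scores every candidate item for a user and keeps the N highest scores. The plain-Python sketch below illustrates that idea with made-up factor vectors; top_n_items is a hypothetical helper, and this is not a description of how Spark implements recommendForAllUsers internally.

```python
import heapq

# Made-up latent factors (rank = 2) for one user and four movies.
user_vector = [1.0, 0.5]
movie_factors = {
    1: [4.0, 1.0],
    2: [1.0, 2.0],
    3: [3.0, 2.0],
    4: [0.5, 0.5],
}


def top_n_items(user_vec, item_factors, n):
    """Return the n (item_id, score) pairs with the highest dot-product score."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_factors.items()
    }
    return heapq.nlargest(n, scores.items(), key=lambda pair: pair[1])


print(top_n_items(user_vector, movie_factors, 3))
```

In Spark, recommendForAllUsers returns one row per user with a recommendations column holding an array of (movieId, rating) structs, ordered from the highest predicted rating down.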

What does the rank parameter control in ALS?

Section 1. Chapter 11
