ALS

class pyspark.mllib.recommendation.ALS
Alternating Least Squares matrix factorization.
New in version 0.9.0.
Methods
train(ratings, rank[, iterations, lambda_, …])
Train a matrix factorization model given an RDD of ratings by users for a subset of products.
trainImplicit(ratings, rank[, iterations, …])
Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products.
Methods Documentation

classmethod train(ratings: Union[pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating], pyspark.rdd.RDD[Tuple[int, int, float]]], rank: int, iterations: int = 5, lambda_: float = 0.01, blocks: int = -1, nonnegative: bool = False, seed: Optional[int] = None) → pyspark.mllib.recommendation.MatrixFactorizationModel
Train a matrix factorization model given an RDD of ratings by users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.
New in version 0.9.0.
Parameters
ratings : pyspark.RDD
    RDD of Rating or (userID, productID, rating) tuple.
rank : int
    Number of features to use (also referred to as the number of latent factors).
iterations : int, optional
    Number of iterations of ALS. (default: 5)
lambda_ : float, optional
    Regularization parameter. (default: 0.01)
blocks : int, optional
    Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)
nonnegative : bool, optional
    A value of True will solve least-squares with nonnegativity constraints. (default: False)
seed : int, optional
    Random seed for initial matrix factorization model. A value of None will use system time as the seed. (default: None)
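A minimal sketch of calling `train` on explicit ratings may help make the parameters concrete. The sample data, the helper names `train_explicit` and `demo`, and the local master string are illustrative assumptions, not part of the API; only `ALS.train`, `Rating`, and `MatrixFactorizationModel.predict` come from `pyspark.mllib.recommendation`.

```python
# Hypothetical sample data: (userID, productID, rating) tuples.
SAMPLE_RATINGS = [
    (1, 1, 5.0),
    (1, 2, 1.0),
    (2, 1, 1.0),
    (2, 2, 5.0),
]


def train_explicit(sc, ratings=SAMPLE_RATINGS, rank=2, seed=42):
    """Train an explicit-feedback model on a small in-memory dataset."""
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.mllib.recommendation import ALS, Rating

    rdd = sc.parallelize([Rating(u, p, r) for u, p, r in ratings])
    # A fixed seed makes the initial factors reproducible between runs.
    return ALS.train(rdd, rank, iterations=10, lambda_=0.01, seed=seed)


def demo():
    """Run the sketch against a local SparkContext (requires pyspark)."""
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "als-train-sketch")
    try:
        model = train_explicit(sc)
        # predict(user, product) returns the estimated rating as a float.
        print(model.predict(1, 1))
    finally:
        sc.stop()
```

Calling `demo()` on a machine with PySpark installed trains the model and prints one predicted rating. Plain tuples can be passed to `train` directly instead of `Rating` objects, as the `ratings` type in the signature indicates.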

classmethod trainImplicit(ratings: Union[pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating], pyspark.rdd.RDD[Tuple[int, int, float]]], rank: int, iterations: int = 5, lambda_: float = 0.01, blocks: int = -1, alpha: float = 0.01, nonnegative: bool = False, seed: Optional[int] = None) → pyspark.mllib.recommendation.MatrixFactorizationModel
Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.
New in version 0.9.0.
Parameters
ratings : pyspark.RDD
    RDD of Rating or (userID, productID, rating) tuple.
rank : int
    Number of features to use (also referred to as the number of latent factors).
iterations : int, optional
    Number of iterations of ALS. (default: 5)
lambda_ : float, optional
    Regularization parameter. (default: 0.01)
blocks : int, optional
    Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)
alpha : float, optional
    A constant used in computing confidence. (default: 0.01)
nonnegative : bool, optional
    A value of True will solve least-squares with nonnegativity constraints. (default: False)
seed : int, optional
    Random seed for initial matrix factorization model. A value of None will use system time as the seed. (default: None)
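As a rough illustration of `trainImplicit`, the sketch below assumes the implicit preferences are interaction counts derived from an event log. The event data and the helpers `events_to_preferences`, `train_implicit`, and `demo_implicit` are hypothetical; only `ALS.trainImplicit` is taken from the API documented above.

```python
from collections import Counter

# Hypothetical interaction log: one (userID, productID) pair per view or click.
EVENTS = [(1, 10), (1, 10), (1, 20), (2, 10), (2, 30), (2, 30), (2, 30)]


def events_to_preferences(events):
    """Collapse raw events into (userID, productID, count) tuples."""
    counts = Counter(events)
    return [(user, product, float(n))
            for (user, product), n in sorted(counts.items())]


def train_implicit(sc, events=EVENTS, rank=2, alpha=0.01, seed=42):
    """Train an implicit-feedback model from interaction counts."""
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.mllib.recommendation import ALS

    prefs = sc.parallelize(events_to_preferences(events))
    # alpha is the confidence constant described in the parameter list.
    return ALS.trainImplicit(prefs, rank, iterations=10, alpha=alpha, seed=seed)


def demo_implicit():
    """Run the sketch against a local SparkContext (requires pyspark)."""
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "als-implicit-sketch")
    try:
        model = train_implicit(sc)
        print(model.predict(2, 30))
    finally:
        sc.stop()
```

Here the third tuple element is an observed count rather than a rating, so predictions from the resulting model are preference scores and should not be read as ratings on the input scale.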
