FPGrowth

class pyspark.mllib.fpm.FPGrowth

A parallel FP-growth algorithm for mining frequent itemsets.

New in version 1.4.0.
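FP-growth avoids generating candidate itemsets by first compressing the transactions into a prefix tree (the FP-tree), in which transactions that share their most frequent items share a path. The following is a minimal single-machine sketch of that compression step in plain Python. It assumes transactions are lists of hashable items and covers only tree construction; the header table and the recursive mining of conditional trees are omitted. `FPNode` and `build_fp_tree` are hypothetical names for illustration and are not part of `pyspark.mllib`.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


class FPNode:
    """A node of a simplified FP-tree: an item, a count, and child nodes."""

    def __init__(self, item: Optional[Hashable]) -> None:
        self.item = item
        self.count = 0
        self.children: Dict[Hashable, "FPNode"] = {}


def build_fp_tree(transactions: List[List[Hashable]], min_count: int) -> FPNode:
    """Build a prefix tree of the frequent items in each transaction."""
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)  # the root holds no item
    # Pass 2: keep frequent items, order them by descending count (ties broken
    # by repr so the result is deterministic), and insert them as one path.
    for t in transactions:
        ordered = sorted(
            {i for i in t if i in frequent},
            key=lambda i: (-frequent[i], repr(i)),
        )
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item)
            node = node.children[item]
            node.count += 1  # one more transaction passes through this node
    return root
```

For `[["a", "b"], ["b", "c"], ["a", "b", "c"]]` with `min_count=2`, all three transactions share the path starting at `b` (count 3), which then branches into `a` (count 2) and `c` (count 1).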

Methods

train(data[, minSupport, numPartitions])

Computes an FP-Growth model that contains frequent itemsets.

Methods Documentation

classmethod train(data: pyspark.rdd.RDD[List[T]], minSupport: float = 0.3, numPartitions: int = -1) → pyspark.mllib.fpm.FPGrowthModel

Computes an FP-Growth model that contains frequent itemsets.

New in version 1.4.0.
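To make the result of `train` concrete: the model's frequent itemsets are all itemsets that occur in at least the threshold number of transactions. The sketch below computes that set by brute-force enumeration in plain Python. It is a hypothetical reference oracle for small test data, not FP-growth and not how Spark computes the result, and it assumes the absolute threshold is `ceil(minSupport * number_of_transactions)`.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List


def frequent_itemsets(
    transactions: List[List[Hashable]], min_support: float
) -> Dict[FrozenSet[Hashable], int]:
    """Brute-force oracle: return {itemset: count} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    min_count = math.ceil(min_support * len(baskets))
    items = sorted({i for b in baskets for i in b}, key=repr)
    result: Dict[FrozenSet[Hashable], int] = {}
    # Enumerate itemsets by size. If no itemset of some size is frequent, no
    # larger one can be either (every subset of a frequent itemset is frequent).
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for b in baskets if candidate <= b)
            if count >= min_count:
                result[candidate] = count
                found = True
        if not found:
            break
    return result
```

With the default `minSupport` of 0.3, 10 transactions give a threshold of `ceil(3.0) = 3` and 5 transactions give `ceil(1.5) = 2`. Assuming an existing `SparkContext` named `sc`, the corresponding Spark call would be `FPGrowth.train(sc.parallelize(transactions), minSupport=0.5)`; `model.freqItemsets().collect()` then returns `FreqItemset(items, freq)` records that should agree with the oracle on small inputs (the order of items within each list may differ).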

Parameters
data : pyspark.RDD

The input data set, where each element is one transaction (a list of items). Items within a single transaction must be unique.

minSupport : float, optional

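The minimal support level, expressed as a fraction of the total number of transactions: an itemset is frequent if it appears in at least this fraction of the transactions. (default: 0.3)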

numPartitions : int, optional

The number of partitions used by parallel FP-growth. A value of -1 uses the same number of partitions as the input data. (default: -1)
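In parallel FP-growth (PFP), the parameters work together: the support threshold fixes an absolute count, infrequent items are dropped, and each transaction is split into per-partition prefixes so that every partition can mine its own items independently. The sketch below is a simplified single-machine model of that preparation step in plain Python, not Spark's implementation. Its assumptions: input lines are whitespace-separated items; repeated items are dropped before training, since a transaction may not contain the same item twice; the threshold is `ceil(minSupport * number_of_transactions)`; a non-positive `num_partitions` such as -1 falls back to the input's partition count; and the item with frequency rank `r` belongs to partition `r % num_parts`. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def to_transaction(line: str) -> List[str]:
    """Split a whitespace-separated line into items, dropping repeated items."""
    return list(dict.fromkeys(line.split()))  # keeps first-appearance order


def resolve_num_partitions(requested: int, input_partitions: int) -> int:
    """Use the requested count if positive; otherwise reuse the input's count."""
    return requested if requested > 0 else input_partitions


def partition_transactions(
    transactions: List[List[str]],
    min_support: float,
    num_partitions: int,
    input_partitions: int,
) -> Dict[int, List[List[int]]]:
    """Group conditional transactions (lists of item ranks) by partition."""
    min_count = math.ceil(min_support * len(transactions))
    counts = Counter(item for t in transactions for item in set(t))
    # Rank frequent items: rank 0 is the most frequent (ties broken by item).
    frequent = sorted(
        (i for i, c in counts.items() if c >= min_count),
        key=lambda i: (-counts[i], i),
    )
    rank = {item: r for r, item in enumerate(frequent)}
    num_parts = resolve_num_partitions(num_partitions, input_partitions)

    groups: Dict[int, List[List[int]]] = {}
    for t in transactions:
        ranks = sorted(rank[i] for i in set(t) if i in rank)
        emitted = set()
        # Walk from the least frequent item back toward the most frequent one;
        # each partition receives the longest prefix ending at an item it owns.
        for j in range(len(ranks) - 1, -1, -1):
            part = ranks[j] % num_parts
            if part not in emitted:
                emitted.add(part)
                groups.setdefault(part, []).append(ranks[: j + 1])
    return groups
```

For the lines `"a b"`, `"b c"` and `"a b c b"` with `min_support=0.5`, `num_partitions=-1` and two input partitions, the threshold is `ceil(1.5) = 2`, the ranks are b=0, a=1, c=2, and the call returns `{0: [[0], [0, 2], [0, 1, 2]], 1: [[0, 1], [0, 1]]}`. In PFP, each partition would then build a local FP-tree from its prefixes and extract only the itemsets whose least frequent item it owns, so a larger partition count spreads the mining across more tasks.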