Class FPGrowth

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, FPGrowthParams, Params, HasPredictionCol, DefaultParamsWritable, Identifiable, MLWritable, scala.Serializable

public class FPGrowth extends Estimator<FPGrowthModel> implements FPGrowthParams, DefaultParamsWritable
A parallel FP-growth algorithm to mine frequent itemsets. The parallel algorithm (PFP) is described in Li et al., PFP: Parallel FP-Growth for Query Recommendation. PFP distributes computation in such a way that each worker executes an independent group of mining tasks. The underlying FP-Growth algorithm is described in Han et al., Mining frequent patterns without candidate generation. Note: null values in the itemsCol column are ignored during fit().

  • Constructor Details

    • FPGrowth

      public FPGrowth(String uid)
    • FPGrowth

      public FPGrowth()
  • Method Details

    • load

      public static FPGrowth load(String path)
    • read

      public static MLReader<T> read()
    • itemsCol

      public Param<String> itemsCol()
      Description copied from interface: FPGrowthParams
      Items column name. Default: "items"
      Specified by:
      itemsCol in interface FPGrowthParams
      Returns:
      (undocumented)
    • minSupport

      public DoubleParam minSupport()
      Description copied from interface: FPGrowthParams
      Minimal support level of a frequent pattern. Must be in the range [0.0, 1.0]. Any pattern that appears more than (minSupport * size-of-the-dataset) times will be output in the frequent itemsets. Default: 0.3
      Specified by:
      minSupport in interface FPGrowthParams
      Returns:
      (undocumented)
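The thresholding rule described for minSupport can be illustrated with a small plain-Java sketch. It does not use Spark and is not the library's implementation: the class MinSupportSketch and the method frequentItems are hypothetical names, it counts single items only, and it assumes that null transactions are excluded from the dataset size as well as from the counts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the minSupport threshold; not Spark code.
class MinSupportSketch {

    // Returns the items whose transaction count is greater than
    // minSupport * (number of non-null transactions), with their counts.
    static Map<String, Long> frequentItems(List<List<String>> transactions, double minSupport) {
        Map<String, Long> counts = new TreeMap<>();
        long size = 0;
        for (List<String> transaction : transactions) {
            if (transaction == null) {
                continue; // assumption: null rows are skipped and do not count toward the size
            }
            size++;
            // Count each item at most once per transaction.
            for (String item : new HashSet<>(transaction)) {
                counts.merge(item, 1L, Long::sum);
            }
        }
        double threshold = minSupport * size;
        // Keep only items that appear more than `threshold` times, as the description states.
        counts.values().removeIf(count -> count <= threshold);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("a", "c"),
                null,
                Arrays.asList("a"),
                Arrays.asList("b"));
        // Four non-null transactions and minSupport 0.5 give a threshold of 2.0,
        // so only "a" (3 occurrences) qualifies.
        System.out.println(frequentItems(transactions, 0.5)); // prints {a=3}
    }
}
```

With the default of 0.3 and a dataset of 1,000 non-null rows, the same rule would keep patterns that appear more than 300 times.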
    • numPartitions

      public IntParam numPartitions()
      Description copied from interface: FPGrowthParams
      Number of partitions (at least 1) used by parallel FP-growth. By default the param is not set, and the number of partitions of the input dataset is used.
      Specified by:
      numPartitions in interface FPGrowthParams
      Returns:
      (undocumented)
    • minConfidence

      public DoubleParam minConfidence()
      Description copied from interface: FPGrowthParams
      Minimal confidence for generating association rules. minConfidence does not affect the mining of frequent itemsets; it only affects the generation of association rules. Default: 0.8
      Specified by:
      minConfidence in interface FPGrowthParams
      Returns:
      (undocumented)
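To make the role of minConfidence concrete, here is a minimal plain-Java sketch of the standard confidence measure for a rule "antecedent => consequent": the frequency of the combined itemset divided by the frequency of the antecedent. The class MinConfidenceSketch and its methods are hypothetical names, not part of Spark, and treating the threshold as inclusive (>=) is an assumption.

```java
// Hypothetical illustration of confidence filtering; not Spark code.
class MinConfidenceSketch {

    // confidence = freq(antecedent and consequent together) / freq(antecedent)
    static double confidence(long unionFreq, long antecedentFreq) {
        if (antecedentFreq <= 0) {
            throw new IllegalArgumentException("antecedentFreq must be positive");
        }
        return (double) unionFreq / antecedentFreq;
    }

    // Assumption: a rule is kept when its confidence is at least minConfidence.
    static boolean keepRule(long unionFreq, long antecedentFreq, double minConfidence) {
        return confidence(unionFreq, antecedentFreq) >= minConfidence;
    }

    public static void main(String[] args) {
        // {a} appears in 4 transactions and {a, b} in 3, so confidence(a => b) is 0.75.
        System.out.println(confidence(3, 4));    // prints 0.75
        System.out.println(keepRule(3, 4, 0.8)); // prints false
        System.out.println(keepRule(3, 4, 0.7)); // prints true
    }
}
```

Both frequencies come from itemsets that have already passed the minSupport filter, which is why changing minConfidence alters only the rules and leaves the mined itemsets unchanged.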
    • predictionCol

      public final Param<String> predictionCol()
      Description copied from interface: HasPredictionCol
      Param for prediction column name.
      Specified by:
      predictionCol in interface HasPredictionCol
      Returns:
      (undocumented)
    • uid

      public String uid()
      Description copied from interface: Identifiable
      An immutable unique ID for the object and its derivatives.
      Specified by:
      uid in interface Identifiable
      Returns:
      (undocumented)
    • setMinSupport

      public FPGrowth setMinSupport(double value)
    • setNumPartitions

      public FPGrowth setNumPartitions(int value)
    • setMinConfidence

      public FPGrowth setMinConfidence(double value)
    • setItemsCol

      public FPGrowth setItemsCol(String value)
    • setPredictionCol

      public FPGrowth setPredictionCol(String value)
    • fit

      public FPGrowthModel fit(Dataset<?> dataset)
      Description copied from class: Estimator
      Fits a model to the input data.
      Specified by:
      fit in class Estimator<FPGrowthModel>
      Parameters:
      dataset - (undocumented)
      Returns:
      (undocumented)
    • transformSchema

      public StructType transformSchema(StructType schema)
      Description copied from class: PipelineStage
      Check transform validity and derive the output schema from the input schema.

      We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().

      A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.

      Specified by:
      transformSchema in class PipelineStage
      Parameters:
      schema - (undocumented)
      Returns:
      (undocumented)
    • copy

      public FPGrowth copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Specified by:
      copy in class Estimator<FPGrowthModel>
      Parameters:
      extra - (undocumented)
      Returns:
      (undocumented)