Class GaussianMixtureModel

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, GaussianMixtureParams, Params, HasAggregationDepth, HasFeaturesCol, HasMaxIter, HasPredictionCol, HasProbabilityCol, HasSeed, HasTol, HasWeightCol, HasTrainingSummary<GaussianMixtureSummary>, Identifiable, MLWritable, scala.Serializable

Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i with probability weights(i).

param: weights Weight for each Gaussian distribution in the mixture. This is a multinomial probability distribution over the k Gaussians, where weights(i) is the weight for Gaussian i, and weights sum to 1.

param: gaussians Array of MultivariateGaussian where gaussians(i) represents the Multivariate Gaussian (Normal) Distribution for Gaussian i.
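
To make the role of the mixture weights concrete, the following plain-Java sketch evaluates a mixture density as the weighted sum of its component densities. It is not Spark code: it uses one-dimensional Gaussians for brevity (the model itself is multivariate), and the class and method names (`MixtureDensitySketch`, `normalPdf`, `mixturePdf`) are hypothetical.

```java
// Plain-Java illustration only; names are hypothetical and this is not Spark's implementation.
final class MixtureDensitySketch {

    // Density of a univariate normal distribution N(mean, variance) at x.
    static double normalPdf(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    // Mixture density: sum over i of weights[i] * N(x; means[i], variances[i]).
    // The weights are assumed to be non-negative and to sum to 1.
    static double mixturePdf(double x, double[] weights, double[] means, double[] variances) {
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * normalPdf(x, means[i], variances[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] weights = {0.3, 0.7};   // analogous to weights(i)
        double[] means = {0.0, 5.0};     // one mean per Gaussian
        double[] variances = {1.0, 4.0}; // one variance per Gaussian
        System.out.println(mixturePdf(1.0, weights, means, variances));
    }
}
```

Because the weights sum to 1, the mixture density integrates to 1 whenever each component density does.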

  • Method Details

    • read

      public static MLReader<GaussianMixtureModel> read()
    • load

      public static GaussianMixtureModel load(String path)
    • k

      public final IntParam k()
      Description copied from interface: GaussianMixtureParams
      Number of independent Gaussians in the mixture model. Must be greater than 1. Default: 2.

      Specified by:
      k in interface GaussianMixtureParams
      Returns:
      (undocumented)
    • aggregationDepth

      public final IntParam aggregationDepth()
      Description copied from interface: HasAggregationDepth
      Param for suggested depth for treeAggregate (>= 2).
      Specified by:
      aggregationDepth in interface HasAggregationDepth
      Returns:
      (undocumented)
    • tol

      public final DoubleParam tol()
      Description copied from interface: HasTol
      Param for the convergence tolerance for iterative algorithms (>= 0).
      Specified by:
      tol in interface HasTol
      Returns:
      (undocumented)
    • probabilityCol

      public final Param<String> probabilityCol()
      Description copied from interface: HasProbabilityCol
      Param for the column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.
      Specified by:
      probabilityCol in interface HasProbabilityCol
      Returns:
      (undocumented)
    • weightCol

      public final Param<String> weightCol()
      Description copied from interface: HasWeightCol
      Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
      Specified by:
      weightCol in interface HasWeightCol
      Returns:
      (undocumented)
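
The default described for weightCol (every instance counts with weight 1.0 when no weight column is set) can be illustrated with a weighted mean. This is a plain-Java sketch with hypothetical names, not Spark's code; a `null` weights array stands in for an unset weight column.

```java
// Plain-Java illustration only; names are hypothetical and this is not Spark's implementation.
final class InstanceWeightSketch {

    // Weighted mean of values. A null weights array models "weight column not set",
    // in which case every instance is treated as having weight 1.0.
    static double weightedMean(double[] values, double[] weights) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < values.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            weightedSum += w * values[i];
            weightTotal += w;
        }
        return weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 3.0};
        System.out.println(weightedMean(values, null));                    // unweighted
        System.out.println(weightedMean(values, new double[]{3.0, 1.0})); // first value counts three times
    }
}
```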
    • predictionCol

      public final Param<String> predictionCol()
      Description copied from interface: HasPredictionCol
      Param for prediction column name.
      Specified by:
      predictionCol in interface HasPredictionCol
      Returns:
      (undocumented)
    • seed

      public final LongParam seed()
      Description copied from interface: HasSeed
      Param for random seed.
      Specified by:
      seed in interface HasSeed
      Returns:
      (undocumented)
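
A fixed seed is what makes randomized steps repeatable. The sketch below shows the general idea with `java.util.Random`; it is not Spark's code, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Plain-Java illustration only; names are hypothetical and this is not Spark's implementation.
final class SeedSketch {

    // Draws n standard-normal values from a generator initialized with the given seed.
    static double[] draw(long seed, int n) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextGaussian();
        }
        return out;
    }

    public static void main(String[] args) {
        // The same seed yields the same sequence on every run.
        System.out.println(Arrays.equals(draw(42L, 5), draw(42L, 5)));
    }
}
```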
    • featuresCol

      public final Param<String> featuresCol()
      Description copied from interface: HasFeaturesCol
      Param for features column name.
      Specified by:
      featuresCol in interface HasFeaturesCol
      Returns:
      (undocumented)
    • maxIter

      public final IntParam maxIter()
      Description copied from interface: HasMaxIter
      Param for maximum number of iterations (>= 0).
      Specified by:
      maxIter in interface HasMaxIter
      Returns:
      (undocumented)
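
The tol and maxIter params typically act together as stopping criteria for an iterative algorithm. The sketch below shows one common arrangement: stop when the tracked value changes by at most the tolerance, or when the iteration cap is reached. This page does not state Spark's exact rule, so treat this as an assumption-laden illustration with hypothetical names.

```java
import java.util.function.DoubleUnaryOperator;

// Plain-Java illustration only; names are hypothetical and this is not Spark's implementation.
final class StoppingRuleSketch {

    // Applies `step` repeatedly, starting from `start`, and returns the number of
    // iterations performed. Stops as soon as the value changes by at most `tol`
    // between two consecutive iterations, or once `maxIter` iterations have run.
    static int iterate(DoubleUnaryOperator step, double start, int maxIter, double tol) {
        double previous = start;
        int iterations = 0;
        while (iterations < maxIter) {
            double current = step.applyAsDouble(previous);
            iterations++;
            boolean converged = Math.abs(current - previous) <= tol;
            previous = current;
            if (converged) {
                break;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Halving converges toward 0; the change drops below 0.1 on the fourth step.
        System.out.println(iterate(x -> x / 2.0, 1.0, 100, 0.1));
        // With a cap of 2, the loop stops before the tolerance is met.
        System.out.println(iterate(x -> x / 2.0, 1.0, 2, 0.1));
    }
}
```

A tolerance of 0 is valid under this scheme: the loop then runs until the value stops changing or the cap is hit.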
    • uid

      public String uid()
      Description copied from interface: Identifiable
      An immutable unique ID for the object and its derivatives.
      Specified by:
      uid in interface Identifiable
      Returns:
      (undocumented)
    • weights

      public double[] weights()
    • gaussians

      public MultivariateGaussian[] gaussians()
    • numFeatures

      public int numFeatures()
    • setFeaturesCol

      public GaussianMixtureModel setFeaturesCol(String value)
    • setPredictionCol

      public GaussianMixtureModel setPredictionCol(String value)
    • setProbabilityCol

      public GaussianMixtureModel setProbabilityCol(String value)
    • copy

      public GaussianMixtureModel copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Specified by:
      copy in class Model<GaussianMixtureModel>
      Parameters:
      extra - (undocumented)
      Returns:
      (undocumented)
    • transform

      public Dataset<Row> transform(Dataset<?> dataset)
      Description copied from class: Transformer
      Transforms the input dataset.
      Specified by:
      transform in class Transformer
      Parameters:
      dataset - (undocumented)
      Returns:
      (undocumented)
    • transformSchema

      public StructType transformSchema(StructType schema)
      Description copied from class: PipelineStage
      Check transform validity and derive the output schema from the input schema.

      We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().

      A typical implementation should first verify schema changes and parameter validity, including complex parameter interaction checks.

      Specified by:
      transformSchema in class PipelineStage
      Parameters:
      schema - (undocumented)
      Returns:
      (undocumented)
    • predict

      public int predict(Vector features)
    • predictProbability

      public Vector predictProbability(Vector features)
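
Neither predict nor predictProbability is described on this page. As a hedged illustration of the usual mixture-model computation, the sketch below derives per-component membership probabilities from the weights and component densities, then picks the most probable component. It uses one-dimensional Gaussians and hypothetical names (`PosteriorSketch`, `posterior`, `mostLikelyComponent`); it is not Spark's implementation and omits any guard against numerical underflow.

```java
// Plain-Java illustration only; names are hypothetical and this is not Spark's implementation.
final class PosteriorSketch {

    // Density of a univariate normal distribution N(mean, variance) at x.
    static double normalPdf(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    // Membership probabilities: p[i] = w_i * N(x | i) / sum_j (w_j * N(x | j)).
    // No underflow guard: if every component density is 0, the result is NaN.
    static double[] posterior(double x, double[] weights, double[] means, double[] variances) {
        double[] p = new double[weights.length];
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            p[i] = weights[i] * normalPdf(x, means[i], variances[i]);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    // Index of the component with the largest membership probability.
    static int mostLikelyComponent(double x, double[] weights, double[] means, double[] variances) {
        double[] p = posterior(x, weights, means, variances);
        int best = 0;
        for (int i = 1; i < p.length; i++) {
            if (p[i] > p[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] weights = {0.5, 0.5};
        double[] means = {0.0, 10.0};
        double[] variances = {1.0, 1.0};
        System.out.println(mostLikelyComponent(1.0, weights, means, variances)); // closer to the first mean
        System.out.println(mostLikelyComponent(9.0, weights, means, variances)); // closer to the second mean
    }
}
```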
    • gaussiansDF

      public Dataset<Row> gaussiansDF()
      Retrieve Gaussian distributions as a DataFrame. Each row represents a Gaussian Distribution. Two columns are defined: mean and cov. Schema:
      
        root
         |-- mean: vector (nullable = true)
         |-- cov: matrix (nullable = true)
       
      Returns:
      (undocumented)
    • write

      public MLWriter write()
      Returns an MLWriter instance for this ML instance.

      For GaussianMixtureModel, this does NOT currently save the training summary(). An option to save summary() may be added in the future.

      Specified by:
      write in interface MLWritable
      Returns:
      (undocumented)
    • toString

      public String toString()
      Specified by:
      toString in interface Identifiable
      Overrides:
      toString in class Object
    • summary

      public GaussianMixtureSummary summary()
      Gets the summary of the model on the training set. An exception is thrown if hasSummary is false.
      Specified by:
      summary in interface HasTrainingSummary<GaussianMixtureSummary>
      Returns:
      (undocumented)