Class NaiveBayes

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, ClassifierParams, NaiveBayesParams, ProbabilisticClassifierParams, Params, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasProbabilityCol, HasRawPredictionCol, HasThresholds, HasWeightCol, PredictorParams, DefaultParamsWritable, Identifiable, MLWritable, scala.Serializable

Naive Bayes classifiers. It supports Multinomial NB, which can handle finitely supported discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector binary (0/1) data, it can also be used as Bernoulli NB. The input feature values for Multinomial NB and Bernoulli NB must be nonnegative. Since 3.0.0, it supports Complement NB, which is an adaptation of Multinomial NB. Specifically, Complement NB uses statistics from the complement of each class to compute the model's coefficients. The inventors of Complement NB show empirically that the parameter estimates for CNB are more stable than those for Multinomial NB. Like Multinomial NB, the input feature values for Complement NB must be nonnegative. Since 3.0.0, it also supports Gaussian NB, which can handle continuous data.
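To make "statistics from the complement of each class" concrete, here is a minimal sketch in plain Java. It is not Spark's implementation. It assumes the training data has already been summarized as a per-class feature-count matrix, and the class and method names (ComplementCountsSketch, complementCounts) are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical helper, not part of Spark.
final class ComplementCountsSketch {

    /**
     * classCounts[c][j] is the total count of feature j over rows labeled c.
     * Returns complement[c][j], the total count of feature j over all rows
     * whose label is NOT c.
     */
    public static double[][] complementCounts(double[][] classCounts) {
        int numClasses = classCounts.length;
        int numFeatures = classCounts[0].length;

        // Per-feature totals across every class.
        double[] totals = new double[numFeatures];
        for (double[] row : classCounts) {
            for (int j = 0; j < numFeatures; j++) {
                totals[j] += row[j];
            }
        }

        // Complement of class c = everything minus class c.
        double[][] complement = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            for (int j = 0; j < numFeatures; j++) {
                complement[c][j] = totals[j] - classCounts[c][j];
            }
        }
        return complement;
    }

    public static void main(String[] args) {
        double[][] counts = {
            {3, 0, 1},
            {1, 2, 0},
            {0, 1, 4}
        };
        // prints [[1.0, 3.0, 4.0], [3.0, 1.0, 5.0], [4.0, 2.0, 1.0]]
        System.out.println(Arrays.deepToString(complementCounts(counts)));
    }
}
```

Each class's complement row pools the counts of all other classes, so it is built from more data than the class's own row. That is one intuition for the stability claim above.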
  • Constructor Details

    • NaiveBayes

      public NaiveBayes(String uid)
    • NaiveBayes

      public NaiveBayes()
  • Method Details

    • load

      public static NaiveBayes load(String path)
    • read

      public static MLReader<T> read()
    • smoothing

      public final DoubleParam smoothing()
      Description copied from interface: NaiveBayesParams
      The smoothing parameter (default = 1.0).
      Specified by:
      smoothing in interface NaiveBayesParams
      Returns:
      the smoothing param
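The page does not spell out how the smoothing value is applied. The sketch below assumes textbook additive (Laplace/Lidstone) smoothing of per-class feature counts. It is an illustration in plain Java, not Spark's code, and the names SmoothingSketch and smooth are hypothetical.

```java
// Illustrative sketch only: textbook additive smoothing, not Spark's code.
final class SmoothingSketch {

    /**
     * Assumed formula:
     *   theta[j] = (counts[j] + lambda) / (sum(counts) + lambda * counts.length)
     * With lambda > 0, a feature never seen in a class still gets a
     * nonzero probability.
     */
    public static double[] smooth(double[] counts, double lambda) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        double denominator = total + lambda * counts.length;

        double[] theta = new double[counts.length];
        for (int j = 0; j < counts.length; j++) {
            theta[j] = (counts[j] + lambda) / denominator;
        }
        return theta;
    }

    public static void main(String[] args) {
        double[] counts = {3, 0, 1};   // feature 1 never occurs in this class
        double[] unsmoothed = smooth(counts, 0.0);
        double[] smoothed = smooth(counts, 1.0);
        System.out.println("lambda=0.0 -> " + unsmoothed[1]); // zero probability
        System.out.println("lambda=1.0 -> " + smoothed[1]);   // small but nonzero
    }
}
```

Under this assumption, the default of 1.0 corresponds to classic Laplace ("add one") smoothing.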
    • modelType

      public final Param<String> modelType()
      Description copied from interface: NaiveBayesParams
      The model type which is a string (case-sensitive). Supported options: "multinomial", "complement", "bernoulli", "gaussian". (default = multinomial)
      Specified by:
      modelType in interface NaiveBayesParams
      Returns:
      the modelType param
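Because the option string is case-sensitive, a value such as "Multinomial" is not one of the supported options. The check below is a hypothetical stand-in written in plain Java to show what case-sensitive matching implies. It is not the validator Spark uses, and ModelTypeSketch and isSupported are invented names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: hypothetical validator, not Spark's.
final class ModelTypeSketch {

    // The four option strings listed in the documentation above.
    private static final Set<String> SUPPORTED = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("multinomial", "complement", "bernoulli", "gaussian")));

    /** Exact, case-sensitive membership test. */
    public static boolean isSupported(String modelType) {
        return SUPPORTED.contains(modelType);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("multinomial")); // true
        System.out.println(isSupported("Multinomial")); // false: case differs
    }
}
```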
    • weightCol

      public final Param<String> weightCol()
      Description copied from interface: HasWeightCol
      Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
      Specified by:
      weightCol in interface HasWeightCol
      Returns:
      the weightCol param
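As a rough illustration of what instance weights mean for count-based statistics, the plain-Java sketch below scales each row's feature values by its weight before summing, and falls back to 1.0 when no weights are supplied. How Spark aggregates weights internally is an assumption here, and WeightedCountsSketch and weightedFeatureSums are hypothetical names.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical helper, not Spark's aggregation code.
final class WeightedCountsSketch {

    /**
     * Sums feature values over rows, scaling row i by weights[i].
     * A null weights array stands in for "weight column not set",
     * in which case every row counts with weight 1.0.
     */
    public static double[] weightedFeatureSums(double[][] rows, double[] weights) {
        int numFeatures = rows[0].length;
        double[] sums = new double[numFeatures];
        for (int i = 0; i < rows.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            for (int j = 0; j < numFeatures; j++) {
                sums[j] += w * rows[i][j];
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 0}, {2, 1}};
        // prints [3.0, 1.0]
        System.out.println(Arrays.toString(weightedFeatureSums(rows, null)));
        // prints [4.5, 2.0]
        System.out.println(Arrays.toString(weightedFeatureSums(rows, new double[] {0.5, 2.0})));
    }
}
```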
    • uid

      public String uid()
      Description copied from interface: Identifiable
      An immutable unique ID for the object and its derivatives.
      Specified by:
      uid in interface Identifiable
      Returns:
      the unique ID of this instance
    • setSmoothing

      public NaiveBayes setSmoothing(double value)
      Set the smoothing parameter. Default is 1.0.
      Parameters:
      value - the smoothing value to use
      Returns:
      this NaiveBayes instance
    • setModelType

      public NaiveBayes setModelType(String value)
      Set the model type using a string (case-sensitive). Supported options: "multinomial", "complement", "bernoulli", and "gaussian". Default is "multinomial".
      Parameters:
      value - the model type, one of the supported options
      Returns:
      this NaiveBayes instance
    • setWeightCol

      public NaiveBayes setWeightCol(String value)
      Sets the value of param weightCol(). If this is not set or empty, all instance weights are treated as 1.0. Default is not set.
      Parameters:
      value - the name of the weight column
      Returns:
      this NaiveBayes instance
    • copy

      public NaiveBayes copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Specified by:
      copy in class Predictor<Vector,NaiveBayes,NaiveBayesModel>
      Parameters:
      extra - extra params to apply to the copy
      Returns:
      a copy of this instance