Class NaiveBayes
Object
org.apache.spark.ml.PipelineStage
org.apache.spark.ml.Estimator<M>
org.apache.spark.ml.Predictor<FeaturesType,E,M>
org.apache.spark.ml.classification.Classifier<FeaturesType,E,M>
org.apache.spark.ml.classification.ProbabilisticClassifier<Vector,NaiveBayes,NaiveBayesModel>
org.apache.spark.ml.classification.NaiveBayes
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, ClassifierParams, NaiveBayesParams, ProbabilisticClassifierParams, Params, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasProbabilityCol, HasRawPredictionCol, HasThresholds, HasWeightCol, PredictorParams, DefaultParamsWritable, Identifiable, MLWritable
public class NaiveBayes
extends ProbabilisticClassifier<Vector,NaiveBayes,NaiveBayesModel>
implements NaiveBayesParams, DefaultParamsWritable
Naive Bayes Classifiers.
It supports Multinomial NB, which can handle finitely supported discrete data. For example, by
converting documents into TF-IDF vectors, it can be used for document classification. By making
every vector binary (0/1), it can also be used as Bernoulli NB.
The input feature values for Multinomial NB and Bernoulli NB must be nonnegative.
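To make the Multinomial NB description above concrete, here is a minimal plain-Java sketch of training and prediction on nonnegative count vectors. It does not use the Spark API and is not Spark's implementation: the class name `MultinomialNbSketch`, its fields, and the particular additive-smoothing form are assumptions chosen for illustration.

```java
// Illustrative sketch only; names and formulas are assumptions, not Spark's code.
class MultinomialNbSketch {
    final double[] logPrior;    // log P(class)
    final double[][] logTheta;  // log P(feature | class)

    // x: nonnegative feature vectors, y: class indices in [0, numClasses)
    MultinomialNbSketch(double[][] x, int[] y, int numClasses, double smoothing) {
        int numFeatures = x[0].length;
        double[] classCount = new double[numClasses];
        double[][] featureSum = new double[numClasses][numFeatures];
        for (int i = 0; i < x.length; i++) {
            classCount[y[i]] += 1.0;
            for (int j = 0; j < numFeatures; j++) {
                if (x[i][j] < 0.0) {
                    throw new IllegalArgumentException("feature values must be nonnegative");
                }
                featureSum[y[i]][j] += x[i][j];
            }
        }
        logPrior = new double[numClasses];
        logTheta = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            // Smoothed class prior.
            logPrior[c] = Math.log((classCount[c] + smoothing)
                    / (x.length + smoothing * numClasses));
            double total = 0.0;
            for (double v : featureSum[c]) {
                total += v;
            }
            // Smoothed per-class feature probabilities.
            for (int j = 0; j < numFeatures; j++) {
                logTheta[c][j] = Math.log((featureSum[c][j] + smoothing)
                        / (total + smoothing * numFeatures));
            }
        }
    }

    // Returns the class with the highest log-posterior score (ties go to the lower index).
    int predict(double[] features) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < features.length; j++) {
                score += features[j] * logTheta[c][j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Feeding the same sketch vectors that contain only 0 and 1 gives the binary-input usage mentioned above, although a dedicated Bernoulli model would also account for absent features.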
Since 3.0.0, it supports Complement NB, which is an adaptation of Multinomial NB. Specifically,
Complement NB uses statistics from the complement of each class to compute the model's coefficients.
The inventors of Complement NB show empirically that the parameter estimates for CNB are more stable
than those for Multinomial NB. As with Multinomial NB, the input feature values for Complement NB
must be nonnegative.
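The idea of using statistics from the complement of each class can be sketched in plain Java as follows. This is a simplified illustration, not Spark's implementation: the class name `ComplementNbSketch` is hypothetical, and the sketch leaves out the class prior and the weight normalization that some formulations of Complement NB add.

```java
// Illustrative sketch only; names and simplifications are assumptions, not Spark's code.
class ComplementNbSketch {
    // weight[c][j] = log of the smoothed probability of feature j in all classes except c
    final double[][] weight;

    ComplementNbSketch(double[][] x, int[] y, int numClasses, double smoothing) {
        int numFeatures = x[0].length;
        double[][] featureSum = new double[numClasses][numFeatures];
        double[] allSum = new double[numFeatures];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < numFeatures; j++) {
                featureSum[y[i]][j] += x[i][j];
                allSum[j] += x[i][j];
            }
        }
        weight = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            // Total feature mass observed outside class c.
            double complementTotal = 0.0;
            for (int j = 0; j < numFeatures; j++) {
                complementTotal += allSum[j] - featureSum[c][j];
            }
            for (int j = 0; j < numFeatures; j++) {
                double complementSum = allSum[j] - featureSum[c][j];
                weight[c][j] = Math.log((complementSum + smoothing)
                        / (complementTotal + smoothing * numFeatures));
            }
        }
    }

    // Chooses the class whose complement explains the features worst (lowest score).
    int predict(double[] features) {
        int best = 0;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int c = 0; c < weight.length; c++) {
            double score = 0.0;
            for (int j = 0; j < features.length; j++) {
                score += features[j] * weight[c][j];
            }
            if (score < bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Because each class's weights are estimated from every other class's data, a class with few training examples still gets estimates based on a large sample, which is one intuition for the stability claim above.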
Since 3.0.0, it also supports Gaussian NB, which can handle continuous data.
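For continuous data, a Gaussian model fits a mean and a variance per class and feature. The plain-Java sketch below illustrates that idea under stated assumptions: the class name `GaussianNbSketch` and the small variance floor `VAR_FLOOR` are hypothetical, every class is assumed to have at least one training example, and none of this is taken from Spark's implementation.

```java
// Illustrative sketch only; names and the variance floor are assumptions, not Spark's code.
class GaussianNbSketch {
    static final double VAR_FLOOR = 1e-9; // assumed guard against zero variance
    final double[] logPrior;     // log P(class)
    final double[][] mean;       // per-class, per-feature mean
    final double[][] variance;   // per-class, per-feature variance

    GaussianNbSketch(double[][] x, int[] y, int numClasses) {
        int d = x[0].length;
        double[] count = new double[numClasses];
        mean = new double[numClasses][d];
        variance = new double[numClasses][d];
        // First pass: per-class sums, then means.
        for (int i = 0; i < x.length; i++) {
            count[y[i]] += 1.0;
            for (int j = 0; j < d; j++) {
                mean[y[i]][j] += x[i][j];
            }
        }
        for (int c = 0; c < numClasses; c++) {
            for (int j = 0; j < d; j++) {
                mean[c][j] /= count[c];
            }
        }
        // Second pass: squared deviations from the class mean.
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < d; j++) {
                double diff = x[i][j] - mean[y[i]][j];
                variance[y[i]][j] += diff * diff;
            }
        }
        logPrior = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            logPrior[c] = Math.log(count[c] / x.length);
            for (int j = 0; j < d; j++) {
                variance[c][j] = variance[c][j] / count[c] + VAR_FLOOR;
            }
        }
    }

    // Returns the class with the highest log prior plus Gaussian log-likelihood.
    int predict(double[] features) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < features.length; j++) {
                double diff = features[j] - mean[c][j];
                score -= 0.5 * (Math.log(2.0 * Math.PI * variance[c][j])
                        + diff * diff / variance[c][j]);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Unlike the count-based sketches, nothing here requires nonnegative inputs, which matches the statement that this variant handles continuous data.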
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter -
Constructor Summary
Constructors -
Method Summary
Modifier and Type  Method  Description
copy: Creates a copy of this instance with the same UID and some extra params.
static NaiveBayes load
modelType: The model type which is a string (case-sensitive).
static MLReader<T> read()
setModelType(String value): Set the model type using a string (case-sensitive).
setSmoothing(double value): Set the smoothing parameter.
setWeightCol(String value): Sets the value of param weightCol().
final DoubleParam smoothing: The smoothing parameter.
uid(): An immutable unique ID for the object and its derivatives.
weightCol: Param for weight column name.
Methods inherited from class org.apache.spark.ml.classification.ProbabilisticClassifier
probabilityCol, setProbabilityCol, setThresholds, thresholds
Methods inherited from class org.apache.spark.ml.classification.Classifier
rawPredictionCol, setRawPredictionCol
Methods inherited from class org.apache.spark.ml.Predictor
featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
Methods inherited from class org.apache.spark.ml.PipelineStage
params
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol
getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol
getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasProbabilityCol
getProbabilityCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasRawPredictionCol
getRawPredictionCol, rawPredictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasThresholds
getThresholds
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable
save
Methods inherited from interface org.apache.spark.ml.classification.NaiveBayesParams
getModelType, getSmoothing
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.classification.ProbabilisticClassifierParams
validateAndTransformSchema
-
Constructor Details
-
NaiveBayes
-
NaiveBayes
public NaiveBayes()
-
-
Method Details
-
load
-
read
-
smoothing
Description copied from interface: NaiveBayesParams
The smoothing parameter (default = 1.0).
- Specified by:
smoothing in interface NaiveBayesParams
- Returns:
- (undocumented)
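As a worked illustration of what an additive smoothing parameter does, the estimate below uses a common additive (Laplace-style) form for a multinomial model. The symbols are notation introduced here: $N_{cj}$ is the total count of feature $j$ in class $c$, $N_c$ is the total count of all features in class $c$, $D$ is the number of features, and $\lambda$ is the smoothing value. This page does not state the exact formula applied for each model type, so treat the form as an assumption.

```latex
\hat{\theta}_{cj} = \frac{N_{cj} + \lambda}{N_c + \lambda D},
\qquad
N_{cj} = 0,\; N_c = 3,\; D = 2,\; \lambda = 1.0
\;\Rightarrow\;
\hat{\theta}_{cj} = \frac{0 + 1.0}{3 + 1.0 \cdot 2} = 0.2
```

With $\lambda = 0$ the same unseen feature would receive probability 0, and its logarithm would be negative infinity when scoring, which is the situation smoothing avoids.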
-
modelType
Description copied from interface: NaiveBayesParams
The model type which is a string (case-sensitive). Supported options: "multinomial", "complement", "bernoulli", "gaussian". (default = multinomial)
- Specified by:
modelType in interface NaiveBayesParams
- Returns:
- (undocumented)
-
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
- Specified by:
weightCol in interface HasWeightCol
- Returns:
- (undocumented)
-
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by:
uid in interface Identifiable
- Returns:
- (undocumented)
-
setSmoothing
Set the smoothing parameter. Default is 1.0.
- Parameters:
value - (undocumented)
- Returns:
- (undocumented)
-
setModelType
Set the model type using a string (case-sensitive). Supported options: "multinomial", "complement", "bernoulli", and "gaussian". Default is "multinomial".
- Parameters:
value - (undocumented)
- Returns:
- (undocumented)
-
setWeightCol
Sets the value of param weightCol(). If this is not set or empty, we treat all instance weights as 1.0. Default is not set, so all instances have weight one.
- Parameters:
value - (undocumented)
- Returns:
- (undocumented)
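One way to picture the effect of instance weights is that each row contributes its weight, rather than 1, to the class totals. The plain-Java sketch below shows this for unsmoothed class priors. The class `WeightedCountsSketch`, the method `classPriors`, and the use of a `null` array to stand for an unset weight column are all hypothetical, and the sketch is not taken from Spark's implementation.

```java
// Illustrative sketch only; names are assumptions, not Spark's code.
class WeightedCountsSketch {
    // weights == null stands for "weight column not set": every instance counts as 1.0.
    static double[] classPriors(int[] labels, double[] weights, int numClasses) {
        double[] prior = new double[numClasses];
        double total = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            prior[labels[i]] += w;  // accumulate weighted class mass
            total += w;
        }
        for (int c = 0; c < numClasses; c++) {
            prior[c] /= total;      // normalize to unsmoothed priors
        }
        return prior;
    }
}
```

With labels `{0, 0, 1}`, passing `null` gives priors of 2/3 and 1/3, while weights `{1.0, 1.0, 4.0}` shift them to 1/3 and 2/3.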
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- Specified by:
copy in interface Params
- Specified by:
copy in class Predictor<Vector,NaiveBayes,NaiveBayesModel>
- Parameters:
extra - (undocumented)
- Returns:
- (undocumented)
-