Package org.apache.spark.ml.tree
Interface DecisionTreeParams
- All Superinterfaces:
- HasCheckpointInterval, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasSeed, HasWeightCol, Identifiable, Params, PredictorParams, Serializable
- All Known Subinterfaces:
- DecisionTreeClassifierParams, DecisionTreeRegressorParams, GBTClassifierParams, GBTParams, GBTRegressorParams, RandomForestClassifierParams, RandomForestParams, RandomForestRegressorParams, TreeEnsembleClassifierParams, TreeEnsembleParams, TreeEnsembleRegressorParams
- All Known Implementing Classes:
- DecisionTreeClassificationModel, DecisionTreeClassifier, DecisionTreeRegressionModel, DecisionTreeRegressor, GBTClassificationModel, GBTClassifier, GBTRegressionModel, GBTRegressor, RandomForestClassificationModel, RandomForestClassifier, RandomForestRegressionModel, RandomForestRegressor
public interface DecisionTreeParams
extends PredictorParams, HasCheckpointInterval, HasSeed, HasWeightCol
Parameters for Decision Tree-based algorithms.
 
Note: Marked as private since this may be made public in the future.

Method Summary
Modifier and Type, Method, Description
- BooleanParam cacheNodeIds(): If false, the algorithm will pass trees to executors to match instances with nodes.
- boolean getCacheNodeIds()
- String getLeafCol()
- int getMaxBins()
- int getMaxDepth()
- int getMaxMemoryInMB()
- double getMinInfoGain()
- int getMinInstancesPerNode()
- double getMinWeightFractionPerNode()
- Strategy getOldStrategy(scala.collection.immutable.Map<Object, Object> categoricalFeatures, int numClasses, scala.Enumeration.Value oldAlgo, Impurity oldImpurity, double subsamplingRate): (private[ml]) Create a Strategy instance to use with the old API.
- leafCol(): Leaf indices column name.
- IntParam maxBins(): Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node.
- IntParam maxDepth(): Maximum depth of the tree (nonnegative).
- IntParam maxMemoryInMB(): Maximum memory in MB allocated to histogram aggregation.
- DoubleParam minInfoGain(): Minimum information gain for a split to be considered at a tree node.
- IntParam minInstancesPerNode(): Minimum number of instances each child must have after split.
- DoubleParam minWeightFractionPerNode(): Minimum fraction of the weighted sample count that each child must have after split.
- setLeafCol(String value)

Methods inherited from interface org.apache.spark.ml.param.shared.HasCheckpointInterval: checkpointInterval, getCheckpointInterval
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol: featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol: getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol: getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol: getWeightCol, weightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable: toString, uid
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copy, copyValues, defaultCopy, defaultParamMap, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.PredictorParams: validateAndTransformSchema

Method Details

cacheNodeIds
BooleanParam cacheNodeIds()
If false, the algorithm will pass trees to executors to match instances with nodes. If true, the algorithm will cache node IDs for each instance. Caching can speed up training of deeper trees. Users can set how often the cache is checkpointed, or disable checkpointing, by setting checkpointInterval. (default = false)
Returns:
- (undocumented)
 
getCacheNodeIds
boolean getCacheNodeIds()

getLeafCol
String getLeafCol()

getMaxBins
int getMaxBins()

getMaxDepth
int getMaxDepth()

getMaxMemoryInMB
int getMaxMemoryInMB()

getMinInfoGain
double getMinInfoGain()

getMinInstancesPerNode
int getMinInstancesPerNode()

getMinWeightFractionPerNode
double getMinWeightFractionPerNode()

getOldStrategy
Strategy getOldStrategy(scala.collection.immutable.Map<Object, Object> categoricalFeatures, int numClasses, scala.Enumeration.Value oldAlgo, Impurity oldImpurity, double subsamplingRate)
(private[ml]) Create a Strategy instance to use with the old API.

leafCol
leafCol()
Leaf indices column name. Predicted leaf index of each instance in each tree by preorder. (default = "")
Returns:
- (undocumented)
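
To illustrate what a leaf index "by preorder" could look like, here is a minimal sketch in plain Java. It assumes that leaves are numbered from 0 in the order a preorder (node, left, right) traversal reaches them and that internal nodes receive no index. The `LeafIndexSketch` and `Node` classes are hypothetical and are not part of Spark.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical illustration, not Spark code.
final class LeafIndexSketch {
    private LeafIndexSketch() {}

    /** Minimal binary tree node; a node without children is a leaf. */
    static final class Node {
        final Node left;
        final Node right;

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Assigns 0, 1, 2, ... to the leaves in the order a preorder walk meets them. */
    static Map<Node, Integer> indexLeaves(Node root) {
        Map<Node, Integer> indices = new IdentityHashMap<>();
        visit(root, indices);
        return indices;
    }

    private static void visit(Node node, Map<Node, Integer> indices) {
        if (node == null) {
            return;
        }
        if (node.isLeaf()) {
            // The next free index equals the number of leaves seen so far.
            indices.put(node, indices.size());
            return;
        }
        visit(node.left, indices);
        visit(node.right, indices);
    }
}
```

For a root whose left child is an internal node with leaves a and b, and whose right child is leaf c, this sketch assigns a = 0, b = 1, c = 2.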
 
maxBins
IntParam maxBins()
Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be at least 2 and at least the number of categories in any categorical feature. (default = 32)
Returns:
- (undocumented)
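
The two constraints on maxBins can be expressed as a small pre-flight check. The sketch below is plain Java and does not call Spark. `MaxBinsCheck` and its `categoryCounts` argument (the number of categories of each categorical feature) are hypothetical names introduced for illustration.

```java
// Hypothetical helper, not part of Spark: mirrors the documented maxBins constraints.
final class MaxBinsCheck {
    private MaxBinsCheck() {}

    /**
     * @param maxBins        candidate value for the maxBins parameter
     * @param categoryCounts number of categories of each categorical feature (may be empty)
     * @return true if maxBins is at least 2 and at least every category count
     */
    static boolean isValid(int maxBins, int[] categoryCounts) {
        if (maxBins < 2) {
            return false; // "Must be at least 2"
        }
        for (int count : categoryCounts) {
            if (count > maxBins) {
                return false; // a categorical feature has more categories than bins
            }
        }
        return true;
    }
}
```

With the default of 32, a dataset containing a categorical feature with 40 categories would fail this check, which suggests raising maxBins to at least 40 for such data.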
 
maxDepth
IntParam maxDepth()
Maximum depth of the tree (nonnegative). E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (default = 5)
Returns:
- (undocumented)
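
The depth examples above generalize to an upper bound on tree size. As a sketch, assuming a binary tree (each internal node has exactly two children), a tree of depth d has at most 2^d leaves and 2^(d+1) - 1 nodes in total. `TreeDepthBounds` is a hypothetical helper, not a Spark class.

```java
// Hypothetical helper, not part of Spark: size bounds for a binary tree of a given depth.
final class TreeDepthBounds {
    private TreeDepthBounds() {}

    /** Maximum number of leaves in a binary tree of the given depth: 2^depth. */
    static long maxLeaves(int depth) {
        checkDepth(depth);
        return 1L << depth;
    }

    /** Maximum number of nodes (internal + leaf) in a binary tree of the given depth: 2^(depth+1) - 1. */
    static long maxNodes(int depth) {
        checkDepth(depth);
        return (1L << (depth + 1)) - 1;
    }

    private static void checkDepth(int depth) {
        // Upper limit only guards against overflowing a long in this sketch.
        if (depth < 0 || depth > 61) {
            throw new IllegalArgumentException("depth must be in [0, 61], got " + depth);
        }
    }
}
```

At the default depth of 5 this bound gives at most 32 leaves and 63 nodes; a trained tree can be smaller when splits are rejected by the other parameters.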
 
maxMemoryInMB
IntParam maxMemoryInMB()
Maximum memory in MB allocated to histogram aggregation. If too small, then 1 node will be split per iteration, and its aggregates may exceed this size. (default = 256 MB)
Returns:
- (undocumented)
 
minInfoGain
DoubleParam minInfoGain()
Minimum information gain for a split to be considered at a tree node. Should be at least 0.0. (default = 0.0)
Returns:
- (undocumented)
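
As a rough illustration of how a gain threshold filters splits, the sketch below computes the gain of a candidate split from per-class instance counts and compares it with minInfoGain. It assumes Gini impurity and unweighted instances, and it treats a gain exactly equal to the threshold as acceptable; all three are assumptions of this sketch, not statements about Spark's implementation. `InfoGainSketch` is a hypothetical class.

```java
// Hypothetical illustration, not Spark code: Gini-based gain for a binary split.
final class InfoGainSketch {
    private InfoGainSketch() {}

    /** Gini impurity 1 - sum(p_k^2) for a node described by per-class counts. */
    static double gini(long[] classCounts) {
        long total = 0;
        for (long c : classCounts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (long c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /** Gain = impurity(parent) - weighted average impurity of the two children. */
    static double gain(long[] left, long[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("left and right must cover the same classes");
        }
        long[] parent = new long[left.length];
        long nLeft = 0;
        long nRight = 0;
        for (int i = 0; i < left.length; i++) {
            parent[i] = left[i] + right[i];
            nLeft += left[i];
            nRight += right[i];
        }
        long n = nLeft + nRight;
        if (n == 0) {
            return 0.0;
        }
        return gini(parent)
            - ((double) nLeft / n) * gini(left)
            - ((double) nRight / n) * gini(right);
    }

    /** Assumption of this sketch: a split qualifies when gain >= minInfoGain. */
    static boolean passes(double gain, double minInfoGain) {
        return gain >= minInfoGain;
    }
}
```

A split that separates two balanced classes perfectly has a Gini gain of 0.5, while a split that leaves both children with the parent's class mix has a gain of 0.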
 
minInstancesPerNode
IntParam minInstancesPerNode()
Minimum number of instances each child must have after split. If a split causes the left or right child to have fewer than minInstancesPerNode, the split will be discarded as invalid. Must be at least 1. (default = 1)
Returns:
- (undocumented)
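
The rule above amounts to a simple check on the two child counts. The following plain-Java sketch restates it; `MinInstancesCheck` is a hypothetical helper and does not reflect how Spark structures this check internally.

```java
// Hypothetical helper, not part of Spark: restates the minInstancesPerNode rule.
final class MinInstancesCheck {
    private MinInstancesCheck() {}

    /**
     * @return true if both children keep at least minInstancesPerNode instances,
     *         false if the split would be discarded as invalid
     */
    static boolean isValidSplit(long leftCount, long rightCount, int minInstancesPerNode) {
        if (minInstancesPerNode < 1) {
            throw new IllegalArgumentException("minInstancesPerNode must be at least 1");
        }
        return leftCount >= minInstancesPerNode && rightCount >= minInstancesPerNode;
    }
}
```

For example, a split that sends 1 instance left and 9 right is valid with the default of 1 but is discarded once minInstancesPerNode is raised to 2.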
 
minWeightFractionPerNode
DoubleParam minWeightFractionPerNode()
Minimum fraction of the weighted sample count that each child must have after split. If a split causes the fraction of the total weight in the left or right child to be less than minWeightFractionPerNode, the split will be discarded as invalid. Should be in the interval [0.0, 0.5). (default = 0.0)
Returns:
- (undocumented)
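
The weighted variant of the check can be sketched the same way. Here the caller passes the total weight explicitly, because this page does not spell out which total the fraction is taken against; that choice, like the `MinWeightFractionCheck` class itself, is an assumption of this sketch and is not part of Spark.

```java
// Hypothetical helper, not part of Spark: restates the minWeightFractionPerNode rule.
final class MinWeightFractionCheck {
    private MinWeightFractionCheck() {}

    /**
     * @param leftWeight   summed instance weight in the left child
     * @param rightWeight  summed instance weight in the right child
     * @param totalWeight  total weight the fraction is measured against (caller's choice)
     * @param minFraction  value of minWeightFractionPerNode, expected in [0.0, 0.5)
     * @return true if both children hold at least minFraction of totalWeight
     */
    static boolean isValidSplit(double leftWeight, double rightWeight,
                                double totalWeight, double minFraction) {
        if (!(minFraction >= 0.0 && minFraction < 0.5)) {
            throw new IllegalArgumentException("minFraction must be in [0.0, 0.5)");
        }
        if (!(totalWeight > 0.0)) {
            throw new IllegalArgumentException("totalWeight must be positive");
        }
        return leftWeight / totalWeight >= minFraction
            && rightWeight / totalWeight >= minFraction;
    }
}
```

With a total weight of 10 and a threshold of 0.2, a split into weights 1 and 9 is discarded, while a split into 3 and 7 is kept.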
 
setLeafCol
setLeafCol(String value)