Object
  org.apache.spark.ml.PipelineStage
    org.apache.spark.ml.Transformer
      org.apache.spark.ml.Model<VectorIndexerModel>
        org.apache.spark.ml.feature.VectorIndexerModel
public class VectorIndexerModel
:: Experimental :: Transform categorical features to use 0-based indices instead of their original values.
- Categorical features are mapped to indices.
- Continuous features (columns) are left unchanged.

This also appends metadata to the output column, marking features as Numeric (continuous), Nominal (categorical), or Binary (either continuous or categorical). Non-ML metadata is not carried over from the input to the output column.
This maintains vector sparsity.
param: numFeatures Number of features, i.e., the length of the Vectors that this model transforms.
param: categoryMaps Feature value index. Keys are categorical feature indices (column indices). Values are maps from original feature values to 0-based category indices. If a feature is not in this map, it is treated as continuous.
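The mapping described above can be sketched in plain Java. This is a minimal illustration, not Spark's implementation: the class `CategoryMapSketch` and the method `applyCategoryMaps` are hypothetical names, the input is assumed to be a dense array rather than a Spark Vector, and rejecting values that are absent from a category map is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of the Spark API.
final class CategoryMapSketch {

    /**
     * Re-indexes the categorical entries of a dense feature array.
     * categoryMaps: feature index -> (original value -> 0-based category index).
     * Features whose index is not a key of categoryMaps are copied unchanged.
     */
    static double[] applyCategoryMaps(double[] features,
                                      Map<Integer, Map<Double, Integer>> categoryMaps) {
        double[] out = features.clone();
        for (Map.Entry<Integer, Map<Double, Integer>> entry : categoryMaps.entrySet()) {
            int featureIndex = entry.getKey();
            Integer categoryIndex = entry.getValue().get(features[featureIndex]);
            if (categoryIndex == null) {
                // Assumption of this sketch: values missing from the map are rejected.
                throw new IllegalArgumentException(
                    "Unseen value " + features[featureIndex] + " for feature " + featureIndex);
            }
            out[featureIndex] = categoryIndex;
        }
        return out;
    }

    public static void main(String[] args) {
        // Feature 1 is categorical with original values -1.0, 5.0 and 9.0.
        Map<Double, Integer> feature1 = new HashMap<>();
        feature1.put(-1.0, 0);
        feature1.put(5.0, 1);
        feature1.put(9.0, 2);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(1, feature1);

        double[] indexed = applyCategoryMaps(new double[] {0.37, 5.0, 12.5}, categoryMaps);
        System.out.println(Arrays.toString(indexed)); // prints [0.37, 1.0, 12.5]
    }
}
```

Features 0 and 2 have no entry in the map, so they pass through as continuous values; only feature 1 is replaced by its category index.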
Method Summary | |
---|---|
scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> | categoryMaps() |
VectorIndexerModel | copy(ParamMap extra) Creates a copy of this instance with the same UID and some extra params. |
int | getMaxCategories() |
java.util.Map<Integer,java.util.Map<Double,Integer>> | javaCategoryMaps() Java-friendly version of categoryMaps |
IntParam | maxCategories() Threshold for the number of values a categorical feature can take. |
int | numFeatures() |
VectorIndexerModel | setInputCol(String value) |
VectorIndexerModel | setOutputCol(String value) |
DataFrame | transform(DataFrame dataset) Transforms the input dataset. |
StructType | transformSchema(StructType schema) :: DeveloperApi :: |
String | uid() |
Methods inherited from class org.apache.spark.ml.Model |
---|
hasParent, parent, setParent |
Methods inherited from class org.apache.spark.ml.Transformer |
---|
transform, transform, transform |
Methods inherited from class Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.spark.ml.param.Params |
---|
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams |
Methods inherited from interface org.apache.spark.Logging |
---|
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning |
Method Detail |
---|
public String uid()
public int numFeatures()
public scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> categoryMaps()
public java.util.Map<Integer,java.util.Map<Double,Integer>> javaCategoryMaps()
Java-friendly version of categoryMaps.
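A map with the shape returned by javaCategoryMaps() can be read alongside numFeatures() to tell which features are categorical. The following is a minimal sketch under stated assumptions: `FeatureKindSketch` and `describeFeatures` are hypothetical names, and the map is built by hand here instead of being obtained from a fitted model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Spark API.
final class FeatureKindSketch {

    /**
     * Labels every feature index in [0, numFeatures) as continuous or categorical.
     * A feature is categorical only if its index is a key of categoryMaps.
     */
    static List<String> describeFeatures(int numFeatures,
                                         Map<Integer, Map<Double, Integer>> categoryMaps) {
        List<String> kinds = new ArrayList<>();
        for (int i = 0; i < numFeatures; i++) {
            Map<Double, Integer> valueToIndex = categoryMaps.get(i);
            if (valueToIndex == null) {
                kinds.add("feature " + i + ": continuous");
            } else {
                kinds.add("feature " + i + ": categorical with "
                    + valueToIndex.size() + " categories");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // Hand-built map in the same shape as javaCategoryMaps().
        Map<Double, Integer> feature1 = new HashMap<>();
        feature1.put(-1.0, 0);
        feature1.put(5.0, 1);
        feature1.put(9.0, 2);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(1, feature1);

        for (String line : describeFeatures(3, categoryMaps)) {
            System.out.println(line);
        }
    }
}
```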
public VectorIndexerModel setInputCol(String value)
public VectorIndexerModel setOutputCol(String value)
public DataFrame transform(DataFrame dataset)
Description copied from class: Transformer
Transforms the input dataset.
Specified by: transform in class Transformer
Parameters: dataset - (undocumented)
public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
:: DeveloperApi :: Derives the output schema from the input schema.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
public VectorIndexerModel copy(ParamMap extra)
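Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params.
Specified by: copy in interface Params
Specified by: copy in class Model<VectorIndexerModel>
Parameters: extra - (undocumented)
See Also: defaultCopy()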
public IntParam maxCategories()
Threshold for the number of values a categorical feature can take. (default = 20)
public int getMaxCategories()
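How such a threshold could separate categorical from continuous features is sketched below. This page only states that maxCategories is a threshold, so the rule used here (a feature is categorical when it has at most maxCategories distinct values) and the choice to number categories in ascending value order are assumptions of the sketch, and `MaxCategoriesSketch` and `buildCategoryMap` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the Spark API.
final class MaxCategoriesSketch {

    /**
     * Returns a value -> category index map when the column has at most
     * maxCategories distinct values, and Optional.empty() otherwise
     * (meaning the feature would be treated as continuous).
     */
    static Optional<Map<Double, Integer>> buildCategoryMap(double[] columnValues,
                                                           int maxCategories) {
        // Collect the distinct values in ascending order.
        TreeSet<Double> distinct = new TreeSet<>();
        for (double value : columnValues) {
            distinct.add(value);
        }
        if (distinct.size() > maxCategories) {
            return Optional.empty();
        }
        // Assumption: indices are assigned in ascending value order.
        Map<Double, Integer> valueToIndex = new LinkedHashMap<>();
        int nextIndex = 0;
        for (Double value : distinct) {
            valueToIndex.put(value, nextIndex++);
        }
        return Optional.of(valueToIndex);
    }

    public static void main(String[] args) {
        double[] column = {3.0, 1.0, 3.0, 2.0};
        System.out.println(buildCategoryMap(column, 20)); // prints Optional[{1.0=0, 2.0=1, 3.0=2}]
        System.out.println(buildCategoryMap(column, 2));  // prints Optional.empty
    }
}
```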