org.apache.spark.mllib.feature
Class StandardScaler
Object
org.apache.spark.mllib.feature.StandardScaler
- All Implemented Interfaces:
- Logging
public class StandardScaler
- extends Object
- implements Logging
:: Experimental ::
Standardizes features by removing the mean and scaling to unit standard deviation, using column summary statistics computed on the samples in the training set.
param: withMean False by default. If true, centers the data with the mean before scaling. Centering builds a dense output, so it does not work on sparse input and will raise an exception.
param: withStd True by default. If true, scales the data to unit standard deviation.
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
StandardScaler
public StandardScaler(boolean withMean,
boolean withStd)
StandardScaler
public StandardScaler()
No-argument constructor; uses the defaults described above (withMean = false, withStd = true).
fit
public StandardScalerModel fit(RDD<Vector> data)
- Computes the mean and variance of the input and stores them in a model to be used for later scaling.
- Parameters:
data
- The data used to compute the mean and variance that build the transformation model.
- Returns:
- a StandardScalerModel
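To make the fit/transform contract above concrete, here is a minimal, hypothetical sketch in plain Python, not the Spark API: `fit` computes per-column mean and sample standard deviation (n - 1 denominator, as column summary statistics use) and returns a transform closure applying the `withMean`/`withStd` behavior described above. The zero-variance guard is an illustrative choice, not necessarily Spark's exact behavior.

```python
def fit(data, with_mean=False, with_std=True):
    """Compute column means and sample standard deviations over `data`
    (a list of equal-length rows) and return a row-transform function."""
    n = len(data)
    dims = len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dims)]
    # Sample (corrected) variance: n - 1 in the denominator.
    var = [sum((row[j] - mean[j]) ** 2 for row in data) / (n - 1)
           for j in range(dims)]
    std = [v ** 0.5 for v in var]

    def transform(row):
        out = list(row)
        if with_mean:
            # Centering densifies the row; this is why sparse input is rejected.
            out = [x - m for x, m in zip(out, mean)]
        if with_std:
            # Guard against zero-variance columns (illustrative choice).
            out = [x / s if s != 0.0 else 0.0 for x, s in zip(out, std)]
        return out

    return transform

scaler = fit([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
             with_mean=True, with_std=True)
print(scaler([2.0, 20.0]))  # the mean row maps to [0.0, 0.0]
```

With `with_mean=False` (the documented default), rows are only divided by the column standard deviation, which preserves sparsity.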