Class StandardScaler

Object
org.apache.spark.mllib.feature.StandardScaler
All Implemented Interfaces:
org.apache.spark.internal.Logging

public class StandardScaler extends Object implements org.apache.spark.internal.Logging
Standardizes features by removing the mean and scaling to unit standard deviation, using column summary statistics computed on the samples in the training set.

The standard deviation used is the corrected sample standard deviation (https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation), computed as the square root of the unbiased sample variance.
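
For reference, the corrected sample standard deviation of one feature column can be written as follows. The symbols are notation chosen here for illustration and do not appear in the API: n is the number of samples, x_i is the i-th value in the column, and \bar{x} is the column mean.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 }
```

The divisor n - 1 (rather than n) is what makes the variance estimate unbiased, so at least two samples are needed for s to be defined.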

param: withMean False by default. Centers the data with the mean before scaling. This builds a dense output, so take care when applying it to sparse input.
param: withStd True by default. Scales the data to unit standard deviation.
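
The effect of the two flags can be illustrated on a single column with a minimal sketch in plain Java. This is not Spark's implementation and does not use the Spark API: the class and method names (StandardizeSketch, mean, correctedStd, standardize) are hypothetical, and mapping a zero-spread column to 0.0 is an assumption made only for this sketch.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Spark.
class StandardizeSketch {

    /** Arithmetic mean of one feature column. */
    static double mean(double[] column) {
        double sum = 0.0;
        for (double x : column) {
            sum += x;
        }
        return sum / column.length;
    }

    /** Corrected sample standard deviation: sqrt of the (n - 1)-divisor variance. */
    static double correctedStd(double[] column) {
        double m = mean(column);
        double sumSq = 0.0;
        for (double x : column) {
            sumSq += (x - m) * (x - m);
        }
        // Requires at least two samples; with one sample this yields NaN.
        return Math.sqrt(sumSq / (column.length - 1));
    }

    /** Optionally centers (withMean) and optionally scales (withStd) one column. */
    static double[] standardize(double[] column, boolean withMean, boolean withStd) {
        double shift = withMean ? mean(column) : 0.0;
        double scale = withStd ? correctedStd(column) : 1.0;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // Assumption of this sketch: a column with zero spread maps to 0.0.
            out[i] = (scale != 0.0) ? (column[i] - shift) / scale : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] column = {2.0, 4.0, 6.0}; // mean 4.0, corrected std 2.0

        // Defaults described above: withMean = false, withStd = true.
        System.out.println(Arrays.toString(standardize(column, false, true)));
        // prints [1.0, 2.0, 3.0]

        // Centering and scaling together.
        System.out.println(Arrays.toString(standardize(column, true, true)));
        // prints [-1.0, 0.0, 1.0]
    }
}
```

Note that with withMean = false a zero entry stays zero, which is why sparse input can remain sparse; subtracting a non-zero mean turns every zero into a non-zero value, hence the dense output.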

  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    Constructor
    Description
    StandardScaler()
    StandardScaler(boolean withMean, boolean withStd)
  • Method Summary

    Modifier and Type
    Method
    Description
    StandardScalerModel
    fit(RDD<Vector> data)
    Computes the mean and variance and stores them as a model to be used for later scaling.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
  • Constructor Details

    • StandardScaler

      public StandardScaler(boolean withMean, boolean withStd)
    • StandardScaler

      public StandardScaler()
  • Method Details

    • fit

      public StandardScalerModel fit(RDD<Vector> data)
      Computes the mean and variance and stores them as a model to be used for later scaling.

      Parameters:
      data - The data used to compute the mean and variance to build the transformation model.
      Returns:
      a StandardScalerModel