Class PearsonCorrelation

Object
org.apache.spark.mllib.stat.correlation.PearsonCorrelation

public class PearsonCorrelation extends Object
Computes the Pearson correlation for two RDDs of type RDD[Double], or the correlation matrix for an RDD of type RDD[Vector].

The definition of the Pearson correlation can be found at http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

  • Constructor Details

    • PearsonCorrelation

      public PearsonCorrelation()
  • Method Details

    • computeCorrelation

      public static double computeCorrelation(RDD<Object> x, RDD<Object> y)
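      Computes the Pearson correlation for two datasets. The result is Double.NaN if either dataset has zero variance.
      Parameters:
      x - the first dataset, an RDD of doubles (RDD[Double])
      y - the second dataset, an RDD of doubles (RDD[Double])
      Returns:
      the Pearson correlation of x and y, or Double.NaN if either dataset has zero variance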
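The two-dataset case can be illustrated with a minimal plain-Java sketch of the Pearson formula. It is not Spark's implementation: plain `double[]` arrays stand in for the RDDs, and the class and method names (`PearsonSketch`, `pearson`) are hypothetical. It only shows the arithmetic and the zero-variance NaN rule described above.

```java
// Illustrative sketch only: arrays stand in for RDD[Double]; names are hypothetical.
final class PearsonSketch {

    /** Pearson correlation of two equal-length samples; NaN if either has zero variance. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("inputs must be non-empty and of equal length");
        }
        int n = x.length;

        // First pass: means of both samples.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Second pass: sums of squared deviations and of cross products.
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }

        // Zero variance in either sample makes the coefficient undefined.
        if (sxx == 0.0 || syy == 0.0) {
            return Double.NaN;
        }
        return sxy / (Math.sqrt(sxx) * Math.sqrt(syy));
    }
}
```

Calling `PearsonSketch.pearson` on two perfectly linearly related samples gives a value close to 1 (or -1 for an inverse relation), and a constant sample yields NaN.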
    • computeCorrelationMatrix

      public static Matrix computeCorrelationMatrix(RDD<Vector> X)
      Computes the Pearson correlation matrix S for the input matrix, where S(i, j) is the correlation between columns i and j. A column with zero variance results in a correlation value of Double.NaN.
      Parameters:
      X - the input matrix, as an RDD of row vectors (RDD[Vector])
      Returns:
      the Pearson correlation matrix S
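As a rough illustration of what S(i, j) means, the following plain-Java sketch builds a correlation matrix from a small in-memory array of rows. It is not Spark's distributed implementation: a `double[][]` stands in for RDD[Vector], the names (`CorrelationMatrixSketch`, `correlationMatrix`) are hypothetical, and setting the diagonal to 1.0 even for a zero-variance column is this sketch's own choice.

```java
// Illustrative sketch only: rows[k] plays the role of one Vector in the RDD.
final class CorrelationMatrixSketch {

    /** Returns s where s[i][j] is the Pearson correlation between columns i and j. */
    static double[][] correlationMatrix(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("at least one row is required");
        }
        int n = rows.length;
        int d = rows[0].length;

        // Column means.
        double[] mean = new double[d];
        for (double[] row : rows) {
            if (row.length != d) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
            for (int j = 0; j < d; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mean[j] /= n;
        }

        // Co-moment matrix: c[i][j] = sum over rows of (row[i] - mean[i]) * (row[j] - mean[j]).
        // The 1/(n-1) factor of the covariance cancels in the ratio below, so it is omitted.
        double[][] c = new double[d][d];
        for (double[] row : rows) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    c[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]);
                }
            }
        }

        // Normalise by the column standard deviations.
        double[][] s = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                if (i == j) {
                    s[i][j] = 1.0; // assumption of this sketch
                } else if (c[i][i] == 0.0 || c[j][j] == 0.0) {
                    s[i][j] = Double.NaN; // a zero-variance column has no defined correlation
                } else {
                    s[i][j] = c[i][j] / (Math.sqrt(c[i][i]) * Math.sqrt(c[j][j]));
                }
            }
        }
        return s;
    }
}
```

The result is symmetric, so only the upper or lower triangle carries independent information.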
    • computeCorrelationMatrixFromCovariance

      public static Matrix computeCorrelationMatrixFromCovariance(Matrix covarianceMatrix)
      Computes the Pearson correlation matrix from the covariance matrix. Zero variance results in a correlation value of Double.NaN.
      Parameters:
      covarianceMatrix - the covariance matrix to convert
      Returns:
      the Pearson correlation matrix derived from covarianceMatrix
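The conversion from covariance to correlation divides each entry by the product of the two corresponding standard deviations: corr(i, j) = cov(i, j) / (sqrt(cov(i, i)) * sqrt(cov(j, j))). The plain-Java sketch below shows that step on a `double[][]`. It is an illustration under stated assumptions, not Spark's code: the names (`CovarianceToCorrelationSketch`, `fromCovariance`) are hypothetical, and the diagonal is set to 1.0 by this sketch's own convention.

```java
// Illustrative sketch only: a square double[][] stands in for the Matrix argument.
final class CovarianceToCorrelationSketch {

    /** Converts a square covariance matrix into a correlation matrix. */
    static double[][] fromCovariance(double[][] cov) {
        int n = cov.length;
        for (double[] row : cov) {
            if (row.length != n) {
                throw new IllegalArgumentException("covariance matrix must be square");
            }
        }

        // Standard deviations are the square roots of the diagonal (the variances).
        double[] sd = new double[n];
        for (int i = 0; i < n; i++) {
            sd[i] = Math.sqrt(cov[i][i]);
        }

        double[][] corr = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    corr[i][j] = 1.0; // assumption of this sketch
                } else if (sd[i] == 0.0 || sd[j] == 0.0) {
                    corr[i][j] = Double.NaN; // zero variance: correlation undefined
                } else {
                    corr[i][j] = cov[i][j] / (sd[i] * sd[j]);
                }
            }
        }
        return corr;
    }
}
```

For example, a covariance of 2 between variables with variances 4 and 9 gives 2 / (2 * 3), about 0.333.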
    • computeCorrelationWithMatrixImpl

      public static double computeCorrelationWithMatrixImpl(RDD<Object> x, RDD<Object> y)
    • org$apache$spark$internal$Logging$$log_

      public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
    • org$apache$spark$internal$Logging$$log__$eq

      public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)