public class Statistics
extends Object
| Constructor and Description | 
|---|
| Statistics() | 
| Modifier and Type | Method and Description | 
|---|---|
| static ChiSqTestResult[] | chiSqTest(JavaRDD<LabeledPoint> data) Java-friendly version of chiSqTest() | 
| static ChiSqTestResult | chiSqTest(Matrix observed) Conduct Pearson's independence test on the input contingency matrix, which cannot contain negative entries or columns or rows that sum up to 0. | 
| static ChiSqTestResult[] | chiSqTest(RDD<LabeledPoint> data) Conduct Pearson's independence test for every feature against the label across the input RDD. | 
| static ChiSqTestResult | chiSqTest(Vector observed) Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform distribution, with each category having an expected frequency of 1 / observed.size. | 
| static ChiSqTestResult | chiSqTest(Vector observed, Vector expected) Conduct Pearson's chi-squared goodness of fit test of the observed data against the expected distribution. | 
| static MultivariateStatisticalSummary | colStats(RDD<Vector> X) Computes column-wise summary statistics for the input RDD[Vector]. | 
| static double | corr(JavaRDD<Double> x, JavaRDD<Double> y) Java-friendly version of corr() | 
| static double | corr(JavaRDD<Double> x, JavaRDD<Double> y, String method) Java-friendly version of corr() | 
| static double | corr(RDD<Object> x, RDD<Object> y) Compute the Pearson correlation for the input RDDs. | 
| static double | corr(RDD<Object> x, RDD<Object> y, String method) Compute the correlation for the input RDDs using the specified method. | 
| static Matrix | corr(RDD<Vector> X) Compute the Pearson correlation matrix for the input RDD of Vectors. | 
| static Matrix | corr(RDD<Vector> X, String method) Compute the correlation matrix for the input RDD of Vectors using the specified method. | 
| static KolmogorovSmirnovTestResult | kolmogorovSmirnovTest(JavaDoubleRDD data, String distName, double... params) Java-friendly version of kolmogorovSmirnovTest() | 
| static KolmogorovSmirnovTestResult | kolmogorovSmirnovTest(JavaDoubleRDD data, String distName, scala.collection.Seq<Object> params) Java-friendly version of kolmogorovSmirnovTest() | 
| static KolmogorovSmirnovTestResult | kolmogorovSmirnovTest(RDD<Object> data, scala.Function1<Object,Object> cdf) Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a continuous distribution. | 
| static KolmogorovSmirnovTestResult | kolmogorovSmirnovTest(RDD<Object> data, String distName, double... params) Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality. | 
| static KolmogorovSmirnovTestResult | kolmogorovSmirnovTest(RDD<Object> data, String distName, scala.collection.Seq<Object> params) Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality. | 
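
The goodness-of-fit rows in the table above describe two pieces of arithmetic: a uniform expected distribution with frequency 1 / observed.size per category, and an expected vector that is rescaled when its sum differs from the observed sum. The sketch below works through that arithmetic in plain Java on arrays. It is an illustration only, not Spark's implementation: the class name `ChiSqGoodnessOfFitSketch` and its methods are hypothetical, it takes `double[]` rather than `Vector`, and it computes only the test statistic and degrees of freedom, not the p-value.

```java
import java.util.Arrays;

/** Illustrative only: plain-Java arithmetic for Pearson's goodness-of-fit statistic. */
final class ChiSqGoodnessOfFitSketch {

    /** Statistic against the uniform distribution: every category expects 1 / observed.length. */
    static double statistic(double[] observed) {
        double[] uniform = new double[observed.length];
        Arrays.fill(uniform, 1.0 / observed.length);
        return statistic(observed, uniform);
    }

    /** Statistic against an explicit expected distribution, rescaled to the observed total. */
    static double statistic(double[] observed, double[] expected) {
        if (observed.length != expected.length) {
            throw new IllegalArgumentException("observed and expected must have the same size");
        }
        double observedSum = 0.0;
        double expectedSum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] < 0.0) {
                throw new IllegalArgumentException("observed cannot contain negative values");
            }
            if (expected[i] <= 0.0) {
                throw new IllegalArgumentException("expected cannot contain nonpositive values");
            }
            observedSum += observed[i];
            expectedSum += expected[i];
        }
        // Rescale expected so that both vectors sum to the same total.
        double scale = observedSum / expectedSum;
        double stat = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double e = expected[i] * scale;
            double diff = observed[i] - e;
            stat += diff * diff / e; // sum of (O - E)^2 / E
        }
        return stat;
    }

    /** Degrees of freedom for a goodness-of-fit test: number of categories minus one. */
    static int degreesOfFreedom(double[] observed) {
        return observed.length - 1;
    }

    public static void main(String[] args) {
        double[] observed = {10.0, 20.0, 30.0};
        // Uniform case: each category expects about 20, so the statistic is about 10.
        System.out.println(statistic(observed));
        // Expected {1, 2, 3} rescales to {10, 20, 30}, which matches observed exactly.
        System.out.println(statistic(observed, new double[] {1.0, 2.0, 3.0}));
        System.out.println(degreesOfFreedom(observed));
    }
}
```

Because expected is rescaled before comparison, passing relative frequencies or raw counts for expected gives the same statistic in this sketch.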
public static KolmogorovSmirnovTestResult kolmogorovSmirnovTest(RDD<Object> data, String distName, double... params)
Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality.
Parameters:
data - an RDD[Double] containing the sample of data to test
distName - a String name for a theoretical distribution
params - Double* specifying the parameters to be used for the theoretical distribution
Returns:
KolmogorovSmirnovTestResult object containing test statistic, p-value, and null hypothesis.

public static KolmogorovSmirnovTestResult kolmogorovSmirnovTest(JavaDoubleRDD data, String distName, double... params)
Java-friendly version of kolmogorovSmirnovTest()
Parameters:
data - (undocumented)
distName - (undocumented)
params - (undocumented)

public static MultivariateStatisticalSummary colStats(RDD<Vector> X)
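Computes column-wise summary statistics for the input RDD[Vector].
Parameters:
X - an RDD[Vector] for which column-wise summary statistics are to be computed.
Returns:
MultivariateStatisticalSummary object containing column-wise summary statistics.

public static Matrix corr(RDD<Vector> X)
Compute the Pearson correlation matrix for the input RDD of Vectors.
Parameters:
X - an RDD[Vector] for which the correlation matrix is to be computed.

public static Matrix corr(RDD<Vector> X, String method)
Compute the correlation matrix for the input RDD of Vectors using the specified method. Supported methods: pearson (default), spearman.
Parameters:
X - an RDD[Vector] for which the correlation matrix is to be computed.
method - String specifying the method to use for computing correlation. Supported: pearson (default), spearman
Note: cache the input RDD before calling corr with method = "spearman" to avoid recomputing the common lineage.

public static double corr(RDD<Object> x, RDD<Object> y)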
Compute the Pearson correlation for the input RDDs.
Parameters:
x - RDD[Double] of the same cardinality as y.
y - RDD[Double] of the same cardinality as x.

public static double corr(JavaRDD<Double> x, JavaRDD<Double> y)
Java-friendly version of corr()
Parameters:
x - (undocumented)
y - (undocumented)

public static double corr(RDD<Object> x, RDD<Object> y, String method)
Compute the correlation for the input RDDs using the specified method. Supported methods: pearson (default), spearman.
Parameters:
x - RDD[Double] of the same cardinality as y.
y - RDD[Double] of the same cardinality as x.
method - String specifying the method to use for computing correlation. Supported: pearson (default), spearman

public static double corr(JavaRDD<Double> x, JavaRDD<Double> y, String method)
Java-friendly version of corr()
Parameters:
x - (undocumented)
y - (undocumented)
method - (undocumented)

public static ChiSqTestResult chiSqTest(Vector observed, Vector expected)
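Conduct Pearson's chi-squared goodness of fit test of the observed data against the expected distribution.
Parameters:
observed - Vector containing the observed categorical counts/relative frequencies.
expected - Vector containing the expected categorical counts/relative frequencies. expected is rescaled if the expected sum differs from the observed sum.
Note: observed cannot contain negative values. expected cannot contain nonpositive values.

public static ChiSqTestResult chiSqTest(Vector observed)
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform distribution, with each category having an expected frequency of 1 / observed.size.
Parameters:
observed - Vector containing the observed categorical counts/relative frequencies.
Note: observed cannot contain negative values.

public static ChiSqTestResult chiSqTest(Matrix observed)
Conduct Pearson's independence test on the input contingency matrix, which cannot contain negative entries or columns or rows that sum up to 0.
Parameters:
observed - The contingency matrix (containing either counts or relative frequencies).

public static ChiSqTestResult[] chiSqTest(RDD<LabeledPoint> data)
Conduct Pearson's independence test for every feature against the label across the input RDD.
Parameters:
data - an RDD[LabeledPoint] containing the labeled dataset with categorical features. Real-valued features will be treated as categorical for each distinct value.

public static ChiSqTestResult[] chiSqTest(JavaRDD<LabeledPoint> data)
Java-friendly version of chiSqTest()
Parameters:
data - (undocumented)

public static KolmogorovSmirnovTestResult kolmogorovSmirnovTest(RDD<Object> data, scala.Function1<Object,Object> cdf)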
Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a continuous distribution.
Parameters:
data - an RDD[Double] containing the sample of data to test
cdf - a Double => Double function to calculate the theoretical CDF at a given value
Returns:
KolmogorovSmirnovTestResult object containing test statistic, p-value, and null hypothesis.

public static KolmogorovSmirnovTestResult kolmogorovSmirnovTest(RDD<Object> data, String distName, scala.collection.Seq<Object> params)
Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality.
Parameters:
data - an RDD[Double] containing the sample of data to test
distName - a String name for a theoretical distribution
params - Double* specifying the parameters to be used for the theoretical distribution
Returns:
KolmogorovSmirnovTestResult object containing test statistic, p-value, and null hypothesis.

public static KolmogorovSmirnovTestResult kolmogorovSmirnovTest(JavaDoubleRDD data, String distName, scala.collection.Seq<Object> params)
Java-friendly version of kolmogorovSmirnovTest()
Parameters:
data - (undocumented)
distName - (undocumented)
params - (undocumented)
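
The cdf-based kolmogorovSmirnovTest overload compares a sample against a theoretical CDF supplied as a function. As a rough illustration of what a one-sample, two-sided KS statistic measures, the sketch below computes the largest gap between the empirical CDF of a small in-memory sample and a caller-supplied CDF. It assumes plain Java arrays and a `DoubleUnaryOperator` in place of `RDD[Double]` and `scala.Function1`; the class name `KolmogorovSmirnovSketch` is hypothetical, the p-value is not computed, and this is not how Spark evaluates the test over a distributed dataset.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Illustrative only: one-sample, two-sided KS statistic on an in-memory sample. */
final class KolmogorovSmirnovSketch {

    /** Largest gap between the empirical CDF of the sample and the theoretical CDF. */
    static double statistic(double[] sample, DoubleUnaryOperator cdf) {
        if (sample.length == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double theoretical = cdf.applyAsDouble(sorted[i]);
            // The empirical CDF steps from i / n to (i + 1) / n at sorted[i].
            double gapAbove = (i + 1.0) / n - theoretical;
            double gapBelow = theoretical - (double) i / n;
            d = Math.max(d, Math.max(gapAbove, gapBelow));
        }
        return d;
    }

    public static void main(String[] args) {
        // Theoretical CDF of the uniform distribution on [0, 1].
        DoubleUnaryOperator uniformCdf = x -> Math.min(1.0, Math.max(0.0, x));
        double[] sample = {0.7, 0.1, 0.4};
        // The largest gap is at 0.7, where the empirical CDF reaches 1.0: about 0.3.
        System.out.println(statistic(sample, uniformCdf));
    }
}
```

Checking both sides of each step matters because the empirical CDF is a step function: the theoretical curve can be furthest from it either just before or just after a sample point.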