public class MLUtils
extends Object
| Constructor and Description | 
|---|
| MLUtils() | 
| Modifier and Type | Method and Description |
|---|---|
| static Vector | appendBias(Vector vector): Returns a new vector with 1.0 (bias) appended to the input vector. |
| static Dataset<Row> | convertMatrixColumnsFromML(Dataset<?> dataset, scala.collection.Seq<String> cols) |
| static Dataset<Row> | convertMatrixColumnsFromML(Dataset<?> dataset, String... cols) |
| static Dataset<Row> | convertMatrixColumnsToML(Dataset<?> dataset, scala.collection.Seq<String> cols) |
| static Dataset<Row> | convertMatrixColumnsToML(Dataset<?> dataset, String... cols) |
| static Dataset<Row> | convertVectorColumnsFromML(Dataset<?> dataset, scala.collection.Seq<String> cols) |
| static Dataset<Row> | convertVectorColumnsFromML(Dataset<?> dataset, String... cols) |
| static Dataset<Row> | convertVectorColumnsToML(Dataset<?> dataset, scala.collection.Seq<String> cols) |
| static Dataset<Row> | convertVectorColumnsToML(Dataset<?> dataset, String... cols) |
| static scala.Tuple2<RDD<Row>,RDD<Row>>[] | kFold(Dataset<Row> df, int numFolds, String foldColName): Version of kFold() taking a fold column name. |
| static <T> scala.Tuple2<RDD<T>,RDD<T>>[] | kFold(RDD<T> rdd, int numFolds, int seed, scala.reflect.ClassTag<T> evidence$1): Returns a k-element array of pairs of RDDs; the first element of each pair contains the training data (the complement of the validation data) and the second element contains the validation data (a unique 1/kth of the data). |
| static <T> scala.Tuple2<RDD<T>,RDD<T>>[] | kFold(RDD<T> rdd, int numFolds, long seed, scala.reflect.ClassTag<T> evidence$2): Version of kFold() taking a Long seed. |
| static RDD<LabeledPoint> | loadLabeledPoints(SparkContext sc, String dir): Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile with the default number of partitions. |
| static RDD<LabeledPoint> | loadLabeledPoints(SparkContext sc, String path, int minPartitions): Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile. |
| static RDD<LabeledPoint> | loadLibSVMFile(SparkContext sc, String path): Loads binary labeled data in the LIBSVM format into an RDD[LabeledPoint], with the number of features determined automatically and the default number of partitions. |
| static RDD<LabeledPoint> | loadLibSVMFile(SparkContext sc, String path, int numFeatures): Loads labeled data in the LIBSVM format into an RDD[LabeledPoint], with the default number of partitions. |
| static RDD<LabeledPoint> | loadLibSVMFile(SparkContext sc, String path, int numFeatures, int minPartitions): Loads labeled data in the LIBSVM format into an RDD[LabeledPoint]. |
| static RDD<Vector> | loadVectors(SparkContext sc, String path): Loads vectors saved using RDD[Vector].saveAsTextFile with the default number of partitions. |
| static RDD<Vector> | loadVectors(SparkContext sc, String path, int minPartitions): Loads vectors saved using RDD[Vector].saveAsTextFile. |
| static void | org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) |
| static org.slf4j.Logger | org$apache$spark$internal$Logging$$log_() |
| static void | saveAsLibSVMFile(RDD<LabeledPoint> data, String dir): Save labeled data in LIBSVM format. |
public static Dataset<Row> convertVectorColumnsToML(Dataset<?> dataset, String... cols)

Converts vector columns in an input Dataset from the old Vector type (spark.mllib) to the new Vector type under the spark.ml package.

Parameters:
dataset - input dataset
cols - a list of vector columns to be converted. New vector columns will be ignored. If unspecified, all old vector columns will be converted except nested ones.
Returns:
DataFrame with old vector columns converted to the new vector type

public static Dataset<Row> convertVectorColumnsFromML(Dataset<?> dataset, String... cols)

Converts vector columns in an input Dataset to the old Vector type (spark.mllib) from the new Vector type under the spark.ml package.

Parameters:
dataset - input dataset
cols - a list of vector columns to be converted. Old vector columns will be ignored. If unspecified, all new vector columns will be converted except nested ones.
Returns:
DataFrame with new vector columns converted to the old vector type

public static Dataset<Row> convertMatrixColumnsToML(Dataset<?> dataset, String... cols)

Converts matrix columns in an input Dataset from the old Matrix type (spark.mllib) to the new Matrix type under the spark.ml package.

Parameters:
dataset - input dataset
cols - a list of matrix columns to be converted. New matrix columns will be ignored. If unspecified, all old matrix columns will be converted except nested ones.
Returns:
DataFrame with old matrix columns converted to the new matrix type

public static Dataset<Row> convertMatrixColumnsFromML(Dataset<?> dataset, String... cols)

Converts matrix columns in an input Dataset to the old Matrix type (spark.mllib) from the new Matrix type under the spark.ml package.

Parameters:
dataset - input dataset
cols - a list of matrix columns to be converted. Old matrix columns will be ignored. If unspecified, all new matrix columns will be converted except nested ones.
Returns:
DataFrame with new matrix columns converted to the old matrix type
public static RDD<LabeledPoint> loadLibSVMFile(SparkContext sc, String path, int numFeatures, int minPartitions)

Loads labeled data in the LIBSVM format into an RDD[LabeledPoint]. Each line represents a labeled sparse feature vector using the following format:

label index1:value1 index2:value2 ...

where the indices are one-based and in ascending order; they are converted to zero-based indices when parsed into a LabeledPoint.

Parameters:
sc - Spark context
path - file or directory path in any Hadoop-supported file system URI
numFeatures - number of features; if a nonpositive value is given, it is determined from the input data
minPartitions - min number of partitions

public static RDD<LabeledPoint> loadLibSVMFile(SparkContext sc, String path, int numFeatures)

Loads labeled data in the LIBSVM format into an RDD[LabeledPoint], with the default number of partitions.

Parameters:
sc - Spark context
path - file or directory path in any Hadoop-supported file system URI
numFeatures - number of features; if a nonpositive value is given, it is determined from the input data

public static RDD<LabeledPoint> loadLibSVMFile(SparkContext sc, String path)

Loads binary labeled data in the LIBSVM format into an RDD[LabeledPoint], with the number of features determined automatically and the default number of partitions.

Parameters:
sc - Spark context
path - file or directory path in any Hadoop-supported file system URI
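A minimal loading sketch; the local-mode setup and the file path are illustrative (Spark's source tree ships a sample at data/mllib/sample_libsvm_data.txt, but any LIBSVM-format file works).

```java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import org.apache.spark.rdd.RDD;

public class LoadLibSVMExample {
  public static void main(String[] args) {
    SparkContext sc = new SparkContext(
        new SparkConf().setMaster("local[*]").setAppName("libsvm-demo"));

    // Number of features is inferred from the data; default partitioning is used.
    RDD<LabeledPoint> data =
        MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt");

    System.out.println("Loaded " + data.count() + " labeled points");
    sc.stop();
  }
}
```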
public static void saveAsLibSVMFile(RDD<LabeledPoint> data, String dir)

Save labeled data in LIBSVM format.

Parameters:
data - an RDD of LabeledPoint to be saved
dir - directory to save the data
See Also:
org.apache.spark.mllib.util.MLUtils.loadLibSVMFile
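A short save sketch; it reuses the illustrative sample path from the previous example, and the output directory is hypothetical.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import org.apache.spark.rdd.RDD;

public class SaveLibSVMExample {
  public static void main(String[] args) {
    SparkContext sc = new SparkContext(
        new SparkConf().setMaster("local[*]").setAppName("save-libsvm-demo"));

    RDD<LabeledPoint> data =
        MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt");

    // Write the points back out in LIBSVM text format, one part file per partition.
    MLUtils.saveAsLibSVMFile(data, "target/libsvm-out");
    sc.stop();
  }
}
```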
public static RDD<Vector> loadVectors(SparkContext sc, String path, int minPartitions)

Loads vectors saved using RDD[Vector].saveAsTextFile.

Parameters:
sc - Spark context
path - file or directory path in any Hadoop-supported file system URI
minPartitions - min number of partitions

public static RDD<Vector> loadVectors(SparkContext sc, String path)

Loads vectors saved using RDD[Vector].saveAsTextFile with the default number of partitions.

Parameters:
sc - Spark context
path - file or directory path in any Hadoop-supported file system URI
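A sketch of the save/reload round trip this method expects; the output path is hypothetical.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.util.MLUtils;
import org.apache.spark.rdd.RDD;

public class LoadVectorsExample {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setMaster("local[*]").setAppName("vectors-demo"));

    RDD<Vector> vectors = jsc.parallelize(Arrays.asList(
        Vectors.dense(1.0, 2.0),
        Vectors.sparse(3, new int[]{1}, new double[]{4.0}))).rdd();

    // saveAsTextFile writes Vector.toString lines, which loadVectors parses back.
    vectors.saveAsTextFile("target/vectors-out");
    RDD<Vector> loaded = MLUtils.loadVectors(jsc.sc(), "target/vectors-out");

    System.out.println("Reloaded " + loaded.count() + " vectors");
    jsc.stop();
  }
}
```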
public static RDD<LabeledPoint> loadLabeledPoints(SparkContext sc, String path, int minPartitions)

Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile.

Parameters:
sc - Spark context
path - file or directory path in any Hadoop-supported file system URI
minPartitions - min number of partitions

public static RDD<LabeledPoint> loadLabeledPoints(SparkContext sc, String dir)

Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile with the default number of partitions.

Parameters:
sc - Spark context
dir - file or directory path in any Hadoop-supported file system URI
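The analogous round trip for labeled points, again with a hypothetical output path.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import org.apache.spark.rdd.RDD;

public class LoadLabeledPointsExample {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setMaster("local[*]").setAppName("points-demo"));

    RDD<LabeledPoint> points = jsc.parallelize(Arrays.asList(
        new LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
        new LabeledPoint(0.0, Vectors.dense(0.0, 1.0)))).rdd();

    // saveAsTextFile writes LabeledPoint.toString lines, which loadLabeledPoints parses back.
    points.saveAsTextFile("target/points-out");
    RDD<LabeledPoint> reloaded = MLUtils.loadLabeledPoints(jsc.sc(), "target/points-out");

    System.out.println("Reloaded " + reloaded.count() + " labeled points");
    jsc.stop();
  }
}
```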
public static <T> scala.Tuple2<RDD<T>,RDD<T>>[] kFold(RDD<T> rdd, int numFolds, int seed, scala.reflect.ClassTag<T> evidence$1)

Returns a k-element array of pairs of RDDs; the first element of each pair contains the training data (the complement of the validation data) and the second element contains the validation data (a unique 1/kth of the data), where k = numFolds.

Parameters:
rdd - input RDD to split into folds
numFolds - number of folds
seed - random seed
evidence$1 - ClassTag for T (the implicit evidence parameter, supplied automatically in Scala)

public static <T> scala.Tuple2<RDD<T>,RDD<T>>[] kFold(RDD<T> rdd, int numFolds, long seed, scala.reflect.ClassTag<T> evidence$2)

Version of kFold() taking a Long seed.

Parameters:
rdd - input RDD to split into folds
numFolds - number of folds
seed - random seed
evidence$2 - ClassTag for T (the implicit evidence parameter, supplied automatically in Scala)

public static scala.Tuple2<RDD<Row>,RDD<Row>>[] kFold(Dataset<Row> df, int numFolds, String foldColName)

Version of kFold() taking a fold column name.

Parameters:
df - input DataFrame
numFolds - number of folds
foldColName - name of the column containing the fold number of each row
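A sketch of 3-fold splitting over an RDD of integers; the data and seed are illustrative. When calling from Java, the ClassTag argument stands in for the implicit evidence parameter.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.util.MLUtils;
import org.apache.spark.rdd.RDD;

import scala.Tuple2;
import scala.reflect.ClassTag$;

public class KFoldExample {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setMaster("local[*]").setAppName("kfold-demo"));

    RDD<Integer> rdd = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6)).rdd();

    // Three (training, validation) pairs; each validation set holds a unique ~1/3 of the data.
    Tuple2<RDD<Integer>, RDD<Integer>>[] folds =
        MLUtils.kFold(rdd, 3, 42L, ClassTag$.MODULE$.apply(Integer.class));

    for (Tuple2<RDD<Integer>, RDD<Integer>> fold : folds) {
      System.out.println("train=" + fold._1().count() + " validation=" + fold._2().count());
    }
    jsc.stop();
  }
}
```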
public static Vector appendBias(Vector vector)

Returns a new vector with 1.0 (bias) appended to the input vector.

Parameters:
vector - input vector
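A two-line sketch; no Spark context is needed, since this is pure vector manipulation.

```java
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.util.MLUtils;

Vector v = Vectors.dense(0.5, -1.0);
Vector withBias = MLUtils.appendBias(v); // yields [0.5, -1.0, 1.0]
```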
public static Dataset<Row> convertVectorColumnsToML(Dataset<?> dataset, scala.collection.Seq<String> cols)

Converts vector columns in an input Dataset from the old Vector type (spark.mllib) to the new Vector type under the spark.ml package.

Parameters:
dataset - input dataset
cols - a list of vector columns to be converted. New vector columns will be ignored. If unspecified, all old vector columns will be converted except nested ones.
Returns:
DataFrame with old vector columns converted to the new vector type

public static Dataset<Row> convertVectorColumnsFromML(Dataset<?> dataset, scala.collection.Seq<String> cols)

Converts vector columns in an input Dataset to the old Vector type (spark.mllib) from the new Vector type under the spark.ml package.

Parameters:
dataset - input dataset
cols - a list of vector columns to be converted. Old vector columns will be ignored. If unspecified, all new vector columns will be converted except nested ones.
Returns:
DataFrame with new vector columns converted to the old vector type

public static Dataset<Row> convertMatrixColumnsToML(Dataset<?> dataset, scala.collection.Seq<String> cols)

Converts matrix columns in an input Dataset from the old Matrix type (spark.mllib) to the new Matrix type under the spark.ml package.

Parameters:
dataset - input dataset
cols - a list of matrix columns to be converted. New matrix columns will be ignored. If unspecified, all old matrix columns will be converted except nested ones.
Returns:
DataFrame with old matrix columns converted to the new matrix type

public static Dataset<Row> convertMatrixColumnsFromML(Dataset<?> dataset, scala.collection.Seq<String> cols)

Converts matrix columns in an input Dataset to the old Matrix type (spark.mllib) from the new Matrix type under the spark.ml package.

Parameters:
dataset - input dataset
cols - a list of matrix columns to be converted. Old matrix columns will be ignored. If unspecified, all new matrix columns will be converted except nested ones.
Returns:
DataFrame with new matrix columns converted to the old matrix type

public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()

public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)

These two members are internal accessors generated from the Logging trait and are not intended for use in user code.