org.apache.spark.mllib.util
Class KMeansDataGenerator
Object
org.apache.spark.mllib.util.KMeansDataGenerator
public class KMeansDataGenerator
extends Object
:: DeveloperApi ::
Generate test data for KMeans. This class first chooses k cluster centers
from a d-dimensional Gaussian distribution scaled by factor r and then creates a Gaussian
cluster with scale 1 around each center.
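The two-step procedure described above can be sketched in plain Java without Spark. This is a minimal illustration, not the library's implementation: the class name KMeansDataSketch, the helper methods generateCenters and generatePoints, the fixed seed, and the round-robin assignment of points to centers are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-alone sketch; none of these names come from Spark.
class KMeansDataSketch {

    // Step 1: draw k centers in d dimensions. Each coordinate is a
    // standard Gaussian sample multiplied by the scaling factor r.
    static double[][] generateCenters(Random rng, int k, int d, double r) {
        double[][] centers = new double[k][d];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < d; j++) {
                centers[c][j] = rng.nextGaussian() * r;
            }
        }
        return centers;
    }

    // Step 2: create numPoints points. Point i is placed around center
    // (i % k) (an assumed assignment rule) by adding Gaussian noise with
    // scale 1 to every coordinate of that center.
    static double[][] generatePoints(Random rng, double[][] centers, int numPoints) {
        int k = centers.length;
        int d = centers[0].length;
        double[][] points = new double[numPoints][d];
        for (int i = 0; i < numPoints; i++) {
            double[] center = centers[i % k];
            for (int j = 0; j < d; j++) {
                points[i][j] = center[j] + rng.nextGaussian();
            }
        }
        return points;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L); // fixed seed so repeated runs match
        double[][] centers = generateCenters(rng, 3, 2, 10.0);
        double[][] points = generatePoints(rng, centers, 9);
        for (double[] p : points) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

With a large r relative to the unit noise scale, the centers tend to lie far apart and the clusters are easy to separate; with r near 1 the clusters overlap heavily.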
Method Summary
static RDD<double[]> generateKMeansRDD(SparkContext sc, int numPoints, int k, int d, double r, int numPartitions)
    Generate an RDD containing test data for KMeans.
static void main(String[] args)
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
KMeansDataGenerator
public KMeansDataGenerator()
generateKMeansRDD
public static RDD<double[]> generateKMeansRDD(SparkContext sc,
int numPoints,
int k,
int d,
double r,
int numPartitions)
- Generate an RDD containing test data for KMeans.
- Parameters:
sc - SparkContext to use for creating the RDD
numPoints - Number of points that will be contained in the RDD
k - Number of clusters
d - Number of dimensions
r - Scaling factor for the distribution of the initial centers
numPartitions - Number of partitions of the generated RDD; default 2
- Returns:
- An RDD of numPoints points, each a double[] of length d
main
public static void main(String[] args)