public class LinearDataGenerator
extends Object
Generates sample data for linear models. Noise scaled by eps is added to the response variable Y.
Constructor and Description |
---|
LinearDataGenerator() |
Modifier and Type | Method and Description |
---|---|
static scala.collection.Seq<LabeledPoint> | generateLinearInput(double intercept, double[] weights, double[] xMean, double[] xVariance, int nPoints, int seed, double eps) |
static scala.collection.Seq<LabeledPoint> | generateLinearInput(double intercept, double[] weights, double[] xMean, double[] xVariance, int nPoints, int seed, double eps, double sparsity) |
static scala.collection.Seq<LabeledPoint> | generateLinearInput(double intercept, double[] weights, int nPoints, int seed, double eps)
For compatibility, when the mean and variance are not specified, the generated features have zero mean and a variance of 1.0/3.0. The original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution on [a, b] is (b - a)^2 / 12, which here is 1.0/3.0. |
static java.util.List<LabeledPoint> | generateLinearInputAsList(double intercept, double[] weights, int nPoints, int seed, double eps)
Return a Java List of synthetic data randomly generated according to a multi collinear model. |
static RDD<LabeledPoint> | generateLinearRDD(SparkContext sc, int nexamples, int nfeatures, double eps, int nparts, double intercept)
Generate an RDD containing sample data for Linear Regression models, including Ridge, Lasso, and unregularized variants. |
static void | main(String[] args) |
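
The zero-mean, variance-1.0/3.0 default described for the five-argument generateLinearInput can be checked numerically. The following is a minimal sketch that does not call Spark; the class `UniformVarianceSketch` and its methods are hypothetical names introduced only for this illustration. It compares the closed-form variance (b - a)^2 / 12 with an empirical estimate from uniform samples on [-1, 1).

```java
import java.util.Random;

// Hypothetical helper for illustration; it is not part of Spark.
final class UniformVarianceSketch {

    // Closed-form variance of a uniform distribution on [a, b].
    static double theoreticalVariance(double a, double b) {
        return (b - a) * (b - a) / 12.0;
    }

    // Empirical variance (mean of squares minus squared mean) of n uniform draws on [a, b).
    static double empiricalVariance(double a, double b, int n, long seed) {
        Random rnd = new Random(seed);
        double sum = 0.0;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double x = a + (b - a) * rnd.nextDouble();
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }

    public static void main(String[] args) {
        // Both values should be close to 1.0/3.0 for the range [-1, 1].
        System.out.println("theoretical = " + theoreticalVariance(-1.0, 1.0));
        System.out.println("empirical   = " + empiricalVariance(-1.0, 1.0, 1_000_000, 42L));
    }
}
```

The empirical value only approximates 1.0/3.0; the gap shrinks as the number of samples grows.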
public static java.util.List<LabeledPoint> generateLinearInputAsList(double intercept, double[] weights, int nPoints, int seed, double eps)
intercept
- Data intercept.
weights
- Weights to be applied.
nPoints
- Number of points in sample.
seed
- Random seed.
eps
- (undocumented)

public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept, double[] weights, int nPoints, int seed, double eps)
intercept
- Data intercept.
weights
- Weights to be applied.
nPoints
- Number of points in sample.
seed
- Random seed.
eps
- Epsilon scaling factor.

public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept, double[] weights, double[] xMean, double[] xVariance, int nPoints, int seed, double eps)
intercept
- Data intercept.
weights
- Weights to be applied.
xMean
- The mean of the generated features. Often, if the features are not properly standardized, a poorly implemented algorithm will have difficulty converging.
xVariance
- The variance of the generated features.
nPoints
- Number of points in sample.
seed
- Random seed.
eps
- Epsilon scaling factor.

public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept, double[] weights, double[] xMean, double[] xVariance, int nPoints, int seed, double eps, double sparsity)
intercept
- Data intercept.
weights
- Weights to be applied.
xMean
- The mean of the generated features. Often, if the features are not properly standardized, a poorly implemented algorithm will have difficulty converging.
xVariance
- The variance of the generated features.
nPoints
- Number of points in sample.
seed
- Random seed.
eps
- Epsilon scaling factor.
sparsity
- The ratio of zero elements. If it is 0.0, LabeledPoints with DenseVector are returned.

public static RDD<LabeledPoint> generateLinearRDD(SparkContext sc, int nexamples, int nfeatures, double eps, int nparts, double intercept)
sc
- SparkContext to be used for generating the RDD.
nexamples
- Number of examples that will be contained in the RDD.
nfeatures
- Number of features to generate for each example.
eps
- Epsilon factor by which examples are scaled.
nparts
- Number of partitions in the RDD. Default value is 2.
intercept
- (undocumented)

public static void main(String[] args)
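
To make the roles of intercept, weights, seed, and eps concrete, here is a rough sketch of how such data could be produced in plain Java. It is not the Spark implementation and does not call Spark. The class `LinearInputSketch` and its array layout are hypothetical, and it assumes features are uniform on [-1, 1) and the label is intercept + weights · x plus Gaussian noise scaled by eps; the parameter descriptions above only state that eps is a scaling factor.

```java
import java.util.Random;

// Hypothetical stand-in for illustration only; it does not use Spark or LabeledPoint.
final class LinearInputSketch {

    // Returns nPoints rows. Each row holds the features followed by the label in the last slot.
    static double[][] generate(double intercept, double[] weights, int nPoints, int seed, double eps) {
        Random rnd = new Random(seed);
        int d = weights.length;
        double[][] rows = new double[nPoints][d + 1];
        for (int i = 0; i < nPoints; i++) {
            double y = intercept;
            for (int j = 0; j < d; j++) {
                double x = 2.0 * rnd.nextDouble() - 1.0; // assumption: uniform on [-1, 1)
                rows[i][j] = x;
                y += weights[j] * x;
            }
            // Assumption: additive Gaussian noise scaled by eps.
            rows[i][d] = y + eps * rnd.nextGaussian();
        }
        return rows;
    }

    public static void main(String[] args) {
        double[][] rows = generate(0.5, new double[] {2.0, -1.0}, 3, 42, 0.1);
        for (double[] r : rows) {
            System.out.println("x = (" + r[0] + ", " + r[1] + "), y = " + r[2]);
        }
    }
}
```

With eps set to 0.0 the labels lie exactly on the linear model, and reusing the same seed reproduces the same points.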