public class BaggedPoint<Datum>
extends Object
implements scala.Serializable
This holds one instance, along with an array of weights that represent the (weighted) number of times this instance appears in each subsample. E.g., (datum, [1, 0, 4]) indicates that there are 3 subsamples of the dataset and that this datum has 1 copy, 0 copies, and 4 copies in the 3 subsamples, respectively.
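The (datum, [1, 0, 4]) example can be made concrete with a small stand-alone sketch. The class `BaggedPointSketch` and its helper methods `numSubsamples`, `weightIn`, and `totalWeight` are hypothetical names introduced only for illustration; they are not part of the Spark API and do not depend on Spark.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; this is NOT the Spark class.
final class BaggedPointSketch<D> {
    final D datum;
    final double[] subsampleWeights;

    BaggedPointSketch(D datum, double[] subsampleWeights) {
        this.datum = datum;
        // Defensive copy so later changes to the caller's array do not leak in.
        this.subsampleWeights = subsampleWeights.clone();
    }

    // One weight per subsample, so the array length is the number of subsamples.
    int numSubsamples() {
        return subsampleWeights.length;
    }

    // Weight (number of copies) of this datum in the given subsample.
    double weightIn(int subsample) {
        return subsampleWeights[subsample];
    }

    // Total weight of this datum across all subsamples.
    double totalWeight() {
        return Arrays.stream(subsampleWeights).sum();
    }

    public static void main(String[] args) {
        BaggedPointSketch<String> p =
            new BaggedPointSketch<>("datum", new double[] {1.0, 0.0, 4.0});
        System.out.println(p.numSubsamples()); // 3
        System.out.println(p.weightIn(2));     // 4.0
        System.out.println(p.totalWeight());   // 5.0
    }
}
```

A weight of 0 means the datum is absent from that subsample, so a consumer iterating over subsample 1 in this example would skip the point entirely.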
Constructor and Description |
---|
BaggedPoint(Datum datum, double[] subsampleWeights) |
Modifier and Type | Method and Description |
---|---|
static <Datum> RDD<BaggedPoint<Datum>> | convertToBaggedRDD(RDD<Datum> input, double subsamplingRate, int numSubsamples, boolean withReplacement, int seed) Convert an input dataset into its BaggedPoint representation, choosing subsample counts for each instance. |
Datum | datum() |
double[] | subsampleWeights() |
public BaggedPoint(Datum datum, double[] subsampleWeights)
public static <Datum> RDD<BaggedPoint<Datum>> convertToBaggedRDD(RDD<Datum> input, double subsamplingRate, int numSubsamples, boolean withReplacement, int seed)
input - Input dataset.
subsamplingRate - Fraction of the training data used for learning the decision tree.
numSubsamples - Number of subsamples of this RDD to take.
withReplacement - Sampling with/without replacement.
seed - Random seed.
public Datum datum()
public double[] subsampleWeights()
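To show how the parameters of convertToBaggedRDD could interact, here is a minimal Spark-free sketch that produces per-instance subsample weights. It rests on assumptions this page does not confirm: with replacement, each weight is drawn from a Poisson distribution with mean subsamplingRate; without replacement, each instance is kept in a subsample with probability subsamplingRate and gets weight 1 or 0. The class `BaggingSketch` and the methods `poisson` and `sampleWeights` are hypothetical names, and the sketch works on instance indices rather than an RDD.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical, Spark-free sketch; names and sampling scheme are assumptions.
final class BaggingSketch {

    // Draws a Poisson(mean) count using Knuth's multiplication method.
    static int poisson(double mean, Random rng) {
        double limit = Math.exp(-mean);
        int k = 0;
        double product = rng.nextDouble();
        while (product > limit) {
            k++;
            product *= rng.nextDouble();
        }
        return k;
    }

    // Returns weights[i][s]: the weight of instance i in subsample s.
    static double[][] sampleWeights(int numInstances, double subsamplingRate,
                                    int numSubsamples, boolean withReplacement,
                                    int seed) {
        Random rng = new Random(seed);
        double[][] weights = new double[numInstances][numSubsamples];
        for (int i = 0; i < numInstances; i++) {
            for (int s = 0; s < numSubsamples; s++) {
                if (withReplacement) {
                    // Assumed scheme: an instance may appear several times.
                    weights[i][s] = poisson(subsamplingRate, rng);
                } else {
                    // Assumed scheme: an instance appears at most once.
                    weights[i][s] = rng.nextDouble() < subsamplingRate ? 1.0 : 0.0;
                }
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        // Five instances, three subsamples, sampling with replacement.
        double[][] w = sampleWeights(5, 1.0, 3, true, 42);
        for (double[] row : w) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each row of the result plays the role of one subsampleWeights array. Fixing the seed makes the draw reproducible, which is the usual reason for exposing a seed parameter.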