Class BinaryClassificationMetrics
- All Implemented Interfaces:
org.apache.spark.internal.Logging
param: scoreAndLabels an RDD of (score, label) or (score, label, weight) tuples.
param: numBins if greater than 0, then the curves (ROC curve, PR curve) computed internally
will be down-sampled to this many "bins". If 0, no down-sampling will occur.
This is useful because the curve contains a point for each distinct score
in the input, and this could be as large as the input itself -- millions of
points or more, when thousands may be entirely sufficient to summarize
the curve. After down-sampling, the curves will instead be made of approximately
numBins points. Points are made from bins of equal numbers of
consecutive points. The size of each bin is
floor(scoreAndLabels.count() / numBins), which means the resulting number
of bins may not exactly equal numBins. The last bin in each partition may
be smaller as a result, meaning there may be an extra sample at
partition boundaries.
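The bin arithmetic above can be checked with a small, single-machine Java sketch. It is not Spark's implementation: BinCountSketch and countBins are hypothetical names, partitions are modeled as plain sizes, and the count in the formula is treated as the number of curve points (that is, it assumes every input score is distinct). Clamping the bin size to at least 1 is also an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch of the bin arithmetic described above; not Spark's implementation.
class BinCountSketch {

    // Counts the bins produced when each partition is cut into runs of binSize consecutive
    // curve points; the last run in a partition may be shorter than binSize.
    static long countBins(long[] partitionSizes, int numBins) {
        long total = Arrays.stream(partitionSizes).sum();
        if (numBins <= 0) {
            return total; // numBins = 0: no down-sampling, one point per curve point
        }
        // floor(total / numBins), clamped to 1 (an assumption) so it is never zero
        long binSize = Math.max(1, total / numBins);
        long bins = 0;
        for (long size : partitionSizes) {
            bins += (size + binSize - 1) / binSize; // ceiling division counts the short last bin
        }
        return bins;
    }

    public static void main(String[] args) {
        // Three partitions holding 1000 curve points in total, asking for 3 bins
        System.out.println(countBins(new long[] {400, 350, 250}, 3));
        // The same 1000 points in a single partition
        System.out.println(countBins(new long[] {1000}, 3));
    }
}
```

For three partitions of 400, 350 and 250 points with numBins = 3, the bin size is 333 and the sketch yields 5 bins (2 + 2 + 1). Even a single partition of 1000 points yields 4, because floor division leaves a final bin holding one point.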
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter

Constructor Summary
- BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels, int numBins)
- BinaryClassificationMetrics(RDD<scala.Tuple2<Object, Object>> scoreAndLabels): Defaults numBins to 0.

Method Summary
- double areaUnderPR(): Computes the area under the precision-recall curve.
- double areaUnderROC(): Computes the area under the receiver operating characteristic (ROC) curve.
- RDD<scala.Tuple2<Object, Object>> fMeasureByThreshold(): Returns the (threshold, F-Measure) curve with beta = 1.0.
- RDD<scala.Tuple2<Object, Object>> fMeasureByThreshold(double beta): Returns the (threshold, F-Measure) curve.
- int numBins()
- RDD<scala.Tuple2<Object, Object>> pr(): Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
- RDD<scala.Tuple2<Object, Object>> precisionByThreshold(): Returns the (threshold, precision) curve.
- RDD<scala.Tuple2<Object, Object>> recallByThreshold(): Returns the (threshold, recall) curve.
- RDD<scala.Tuple2<Object, Object>> roc(): Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- RDD<? extends scala.Product> scoreAndLabels()
- RDD<Object> thresholds(): Returns thresholds in descending order.
- void unpersist(): Unpersist intermediate RDDs used in the computation.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
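
Constructor Details
BinaryClassificationMetrics
public BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels, int numBins)

BinaryClassificationMetrics
public BinaryClassificationMetrics(RDD<scala.Tuple2<Object, Object>> scoreAndLabels)
Defaults numBins to 0.
- Parameters:
scoreAndLabels - an RDD of (score, label) pairs.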

Method Details
scoreAndLabels
public RDD<? extends scala.Product> scoreAndLabels()

numBins
public int numBins()

unpersist
public void unpersist()
Unpersist intermediate RDDs used in the computation.

thresholds
public RDD<Object> thresholds()
Returns thresholds in descending order.
- Returns:
- (undocumented)
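Without down-sampling (numBins = 0), the curve has one point per distinct score, so the thresholds can be pictured as the distinct scores sorted from highest to lowest. The minimal Java sketch below illustrates that reading on a plain array; ThresholdsSketch and thresholds(double[]) are hypothetical names, and no RDDs or bins are involved.

```java
import java.util.Arrays;

// Hypothetical sketch: thresholds as the distinct scores in descending order (numBins = 0, no RDDs).
class ThresholdsSketch {

    static double[] thresholds(double[] scores) {
        // distinct() removes repeated scores; sorted() orders them ascending
        double[] ascending = Arrays.stream(scores).distinct().sorted().toArray();
        double[] descending = new double[ascending.length];
        for (int i = 0; i < ascending.length; i++) {
            descending[i] = ascending[ascending.length - 1 - i]; // reverse into descending order
        }
        return descending;
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.9, 0.4, 0.9, 0.7};
        System.out.println(Arrays.toString(thresholds(scores)));
    }
}
```

The repeated score 0.9 yields a single threshold, so main prints [0.9, 0.7, 0.4, 0.2].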

roc
public RDD<scala.Tuple2<Object, Object>> roc()
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- Returns:
- (undocumented)
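To make the shape of the returned curve concrete, here is a minimal, single-machine Java sketch. It assumes labels are 1.0 (positive) or 0.0 (negative), that a point counts as a predicted positive when its score is greater than or equal to the threshold, that weights are absent, and that both classes are present. RocSketch and roc(double[][]) are hypothetical names; this is not Spark's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical single-machine sketch of an ROC curve; each distinct score acts as a threshold.
class RocSketch {

    // Returns (false positive rate, true positive rate) points with (0, 0) first and (1, 1) last.
    static List<double[]> roc(double[][] scoreAndLabels) {
        double[][] sorted = scoreAndLabels.clone();
        Arrays.sort(sorted, Comparator.comparingDouble((double[] p) -> p[0]).reversed());
        long positives = Arrays.stream(sorted).filter(p -> p[1] == 1.0).count();
        long negatives = sorted.length - positives;

        List<double[]> curve = new ArrayList<>();
        curve.add(new double[] {0.0, 0.0}); // prepended point
        long tp = 0;
        long fp = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i][1] == 1.0) {
                tp++;
            } else {
                fp++;
            }
            // Emit one point per distinct score, after every tied point has been counted
            boolean lastOfScore = i == sorted.length - 1 || sorted[i + 1][0] != sorted[i][0];
            if (lastOfScore) {
                curve.add(new double[] {(double) fp / negatives, (double) tp / positives});
            }
        }
        curve.add(new double[] {1.0, 1.0}); // appended point
        return curve;
    }

    public static void main(String[] args) {
        double[][] data = {{0.9, 1.0}, {0.8, 0.0}, {0.7, 1.0}, {0.3, 0.0}};
        for (double[] point : roc(data)) {
            System.out.println(Arrays.toString(point));
        }
    }
}
```

For the four points in main, the sketch produces six points: the prepended (0.0, 0.0), one point per distinct score ((0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)), and the appended (1.0, 1.0).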

areaUnderROC
public double areaUnderROC()
Computes the area under the receiver operating characteristic (ROC) curve.
- Returns:
- (undocumented)
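The description above does not say how the area is integrated. Assuming the trapezoidal rule over the curve points (x_0, y_0), ..., (x_n, y_n) taken in curve order, where x is the false positive rate and y the true positive rate for the ROC curve, the area is:

```latex
\mathrm{AUC} = \sum_{i=1}^{n} (x_i - x_{i-1}) \, \frac{y_{i-1} + y_i}{2}
```

For example, for the points (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), (1, 1), only the two horizontal segments contribute: 0.5 * 0.5 + 0.5 * 1 = 0.75.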

pr
public RDD<scala.Tuple2<Object, Object>> pr()
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
- Returns:
- (undocumented)
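The ordering and the prepended point are easy to misread, so the following minimal Java sketch builds the curve on a small in-memory sample. It assumes labels of 1.0 or 0.0, no weights, a non-empty input with at least one positive, and that a score greater than or equal to the threshold counts as a positive prediction. PrSketch and pr(double[][]) are hypothetical names, not Spark's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical single-machine sketch of a precision-recall curve in (recall, precision) order.
class PrSketch {

    static List<double[]> pr(double[][] scoreAndLabels) {
        double[][] sorted = scoreAndLabels.clone();
        Arrays.sort(sorted, Comparator.comparingDouble((double[] p) -> p[0]).reversed());
        long positives = Arrays.stream(sorted).filter(p -> p[1] == 1.0).count();

        List<double[]> curve = new ArrayList<>();
        long tp = 0;
        long predicted = 0; // points predicted positive at the current threshold
        for (int i = 0; i < sorted.length; i++) {
            predicted++;
            if (sorted[i][1] == 1.0) {
                tp++;
            }
            boolean lastOfScore = i == sorted.length - 1 || sorted[i + 1][0] != sorted[i][0];
            if (lastOfScore) {
                double recall = (double) tp / positives;
                double precision = (double) tp / predicted;
                curve.add(new double[] {recall, precision}); // (recall, precision), not the reverse
            }
        }
        // Prepend (0.0, p): recall only grows as the threshold falls, so the first point has the lowest recall
        curve.add(0, new double[] {0.0, curve.get(0)[1]});
        return curve;
    }

    public static void main(String[] args) {
        double[][] data = {{0.9, 1.0}, {0.8, 0.0}, {0.7, 1.0}, {0.3, 0.0}};
        for (double[] point : pr(data)) {
            System.out.println(Arrays.toString(point));
        }
    }
}
```

For the four points in main, the curve is (0.0, 1.0), (0.5, 1.0), (0.5, 0.5), (1.0, about 0.667), (1.0, 0.5); the first entry is the prepended point, which copies the precision of the lowest-recall point.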

areaUnderPR
public double areaUnderPR()
Computes the area under the precision-recall curve.
- Returns:
- (undocumented)
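Assuming the area is integrated with the trapezoidal rule (the description above does not specify the method), it can be sketched in Java as follows. TrapezoidSketch and area(double[][]) are hypothetical names, and the PR points in main are made-up sample data in (recall, precision) order with the (0.0, p) point already prepended.

```java
// Hypothetical sketch: trapezoidal-rule area under a curve given as (x, y) points in curve order.
class TrapezoidSketch {

    static double area(double[][] points) {
        double sum = 0.0;
        for (int i = 1; i < points.length; i++) {
            double dx = points[i][0] - points[i - 1][0];
            // Width times the average height of the two endpoints
            sum += dx * (points[i][1] + points[i - 1][1]) / 2.0;
        }
        return sum;
    }

    public static void main(String[] args) {
        // (recall, precision) sample points; the first one is the prepended (0.0, p)
        double[][] pr = {{0.0, 1.0}, {0.5, 1.0}, {0.5, 0.5}, {1.0, 2.0 / 3.0}, {1.0, 0.5}};
        System.out.println(area(pr));
    }
}
```

For these points the result is 19/24, about 0.792: only the segments where recall increases contribute, because vertical segments have zero width.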

fMeasureByThreshold
public RDD<scala.Tuple2<Object, Object>> fMeasureByThreshold(double beta)
Returns the (threshold, F-Measure) curve.
- Parameters:
beta - the beta factor in F-Measure computation.
- Returns:
- an RDD of (threshold, F-Measure) pairs.
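At each threshold the F-Measure combines that threshold's precision P and recall R. A minimal Java sketch of the standard F-beta formula, (1 + beta^2) * P * R / (beta^2 * P + R), follows. FMeasureSketch and fMeasure are hypothetical names, and returning 0.0 when the denominator is zero is an assumption about the degenerate case. With beta = 1.0 the formula reduces to the harmonic mean of precision and recall.

```java
// Hypothetical sketch of the standard F-beta score for one threshold; not Spark's code.
class FMeasureSketch {

    static double fMeasure(double precision, double recall, double beta) {
        double beta2 = beta * beta;
        double denominator = beta2 * precision + recall;
        // Assumption: report 0.0 when both precision and recall are zero
        return denominator == 0.0 ? 0.0 : (1.0 + beta2) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        System.out.println(fMeasure(0.5, 1.0, 1.0)); // beta = 1.0: harmonic mean
        System.out.println(fMeasure(0.5, 1.0, 2.0)); // beta = 2.0: recall weighted more heavily
    }
}
```

With P = 0.5 and R = 1.0, beta = 1.0 gives 2/3 and beta = 2.0 gives 5/6, showing how a larger beta pulls the score toward recall.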

fMeasureByThreshold
public RDD<scala.Tuple2<Object, Object>> fMeasureByThreshold()
Returns the (threshold, F-Measure) curve with beta = 1.0.
- Returns:
- (undocumented)

precisionByThreshold
public RDD<scala.Tuple2<Object, Object>> precisionByThreshold()
Returns the (threshold, precision) curve.
- Returns:
- (undocumented)

recallByThreshold
public RDD<scala.Tuple2<Object, Object>> recallByThreshold()
Returns the (threshold, recall) curve.
- Returns:
- (undocumented)
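The class accepts (score, label, weight) tuples as well as (score, label) pairs. Assuming weights simply replace unit counts (each positive adds its weight to the true-positive total, each negative to the false-positive total) and that a score greater than or equal to the threshold counts as a positive prediction, the (threshold, precision) and (threshold, recall) pairs can be sketched on a single machine as below. PrecisionRecallByThresholdSketch and byThreshold are hypothetical names; this is not Spark's code, and weights are assumed to be positive.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Spark's code: weighted (threshold, precision) and (threshold, recall)
// pairs from (score, label, weight) triples, with label 1.0 = positive and score >= threshold
// treated as a positive prediction.
class PrecisionRecallByThresholdSketch {

    // Maps each distinct score, in descending order, to {precision, recall}.
    static Map<Double, double[]> byThreshold(double[][] scoreLabelWeight) {
        double[][] sorted = scoreLabelWeight.clone();
        Arrays.sort(sorted, Comparator.comparingDouble((double[] t) -> t[0]).reversed());
        double totalPositiveWeight = Arrays.stream(sorted)
                .filter(t -> t[1] == 1.0)
                .mapToDouble(t -> t[2])
                .sum();

        Map<Double, double[]> result = new LinkedHashMap<>(); // insertion order = descending thresholds
        double tp = 0.0; // weighted true positives at the current threshold
        double fp = 0.0; // weighted false positives at the current threshold
        for (double[] t : sorted) {
            if (t[1] == 1.0) {
                tp += t[2];
            } else {
                fp += t[2];
            }
            // Tied scores overwrite the same key, so each threshold ends with its full cumulative counts
            result.put(t[0], new double[] {tp / (tp + fp), tp / totalPositiveWeight});
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{0.9, 1.0, 2.0}, {0.6, 0.0, 1.0}, {0.6, 1.0, 1.0}, {0.2, 0.0, 3.0}};
        byThreshold(data).forEach((threshold, pr) ->
                System.out.println(threshold + ": precision " + pr[0] + ", recall " + pr[1]));
    }
}
```

For the sample triples in main, the thresholds come out as 0.9, 0.6 and 0.2, with precision 1.0, 0.75 and 3/7 and recall 2/3, 1.0 and 1.0.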