Class BinaryClassificationMetrics

Object
org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
All Implemented Interfaces:
org.apache.spark.internal.Logging

public class BinaryClassificationMetrics extends Object implements org.apache.spark.internal.Logging
Evaluator for binary classification.

param: scoreAndLabels an RDD of (score, label) or (score, label, weight) tuples.
param: numBins if greater than 0, the curves computed internally (ROC curve, PR curve) are down-sampled to approximately this many "bins"; if 0, no down-sampling occurs. Down-sampling is useful because a curve contains one point for each distinct score in the input, which can be as many points as there are input records (millions or more), when a few thousand points may be entirely sufficient to summarize the curve. Points are grouped into bins containing equal numbers of consecutive points. The size of each bin is floor(scoreAndLabels.count() / numBins), so the resulting number of bins may not exactly equal numBins. The last bin in each partition may be smaller as a result, so there may be an extra sample at each partition boundary.
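The bin-size arithmetic described for numBins can be illustrated with a small, Spark-free Java sketch. It models a single partition only, and the class and method names (`BinningSketch`, `binSizes`) are hypothetical. Treating a computed bin size below 1 (numBins larger than the number of points) as "no down-sampling" is an assumption of this sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of grouping consecutive curve points into bins (one partition). */
final class BinningSketch {

    /** Returns the number of points that land in each bin. */
    static List<Long> binSizes(long numPoints, int numBins) {
        List<Long> sizes = new ArrayList<>();
        if (numPoints <= 0) {
            return sizes;
        }
        // Bin size is floor(count / numBins); numBins == 0 means no down-sampling.
        long binSize = numBins > 0 ? numPoints / numBins : 0;
        if (binSize < 1) {
            // Assumption for this sketch: keep every point as its own bin.
            for (long i = 0; i < numPoints; i++) {
                sizes.add(1L);
            }
            return sizes;
        }
        long remaining = numPoints;
        while (remaining > 0) {
            // The last bin takes whatever is left, so it may be smaller.
            long size = Math.min(binSize, remaining);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }
}
```

For example, 10 points with numBins = 3 gives a bin size of floor(10 / 3) = 3 and bins of sizes 3, 3, 3, 1: four bins rather than three, which is why the number of bins may not exactly equal numBins.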

  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels, int numBins)
    BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels)
        Defaults numBins to 0.
  • Method Summary

    Modifier and Type  Method
        Description
    double  areaUnderPR()
        Computes the area under the precision-recall curve.
    double  areaUnderROC()
        Computes the area under the receiver operating characteristic (ROC) curve.
    RDD<scala.Tuple2<Object,Object>>  fMeasureByThreshold()
        Returns the (threshold, F-Measure) curve with beta = 1.0.
    RDD<scala.Tuple2<Object,Object>>  fMeasureByThreshold(double beta)
        Returns the (threshold, F-Measure) curve.
    int  numBins()
    RDD<scala.Tuple2<Object,Object>>  pr()
        Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
    RDD<scala.Tuple2<Object,Object>>  precisionByThreshold()
        Returns the (threshold, precision) curve.
    RDD<scala.Tuple2<Object,Object>>  recallByThreshold()
        Returns the (threshold, recall) curve.
    RDD<scala.Tuple2<Object,Object>>  roc()
        Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
    RDD<? extends scala.Product>  scoreAndLabels()
    RDD<Object>  thresholds()
        Returns thresholds in descending order.
    void  unpersist()
        Unpersists intermediate RDDs used in the computation.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
  • Constructor Details

    • BinaryClassificationMetrics

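      public BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels, int numBins)
      Parameters:
      scoreAndLabels - an RDD of (score, label) or (score, label, weight) tuples.
      numBins - if greater than 0, the number of bins the ROC and PR curves are down-sampled to; 0 means no down-sampling.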
    • BinaryClassificationMetrics

      public BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels)
      Defaults numBins to 0.
      Parameters:
      scoreAndLabels - an RDD of (score, label) pairs.
  • Method Details

    • scoreAndLabels

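      public RDD<? extends scala.Product> scoreAndLabels()
      Returns the scoreAndLabels RDD passed to the constructor.
    • numBins

      public int numBins()
      Returns the numBins value passed to the constructor (0 when the single-argument constructor is used).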
    • unpersist

      public void unpersist()
      Unpersists intermediate RDDs used in the computation.
    • thresholds

      public RDD<Object> thresholds()
      Returns thresholds in descending order.
      Returns:
      an RDD of the thresholds, in descending order.
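Without down-sampling (numBins = 0), each distinct score contributes one curve point, so the thresholds are the distinct scores sorted from highest to lowest. The following plain-Java sketch illustrates that ordering on a local array; `ThresholdsSketch` is a hypothetical name, and the sketch does not model binning or RDD partitioning.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: distinct scores in descending order (no down-sampling). */
final class ThresholdsSketch {

    static List<Double> thresholds(double[] scores) {
        // A TreeSet with a reversed comparator drops duplicates and keeps descending order.
        TreeSet<Double> distinct = new TreeSet<>(Comparator.reverseOrder());
        for (double s : scores) {
            distinct.add(s);
        }
        return new ArrayList<>(distinct);
    }
}
```

For the scores 0.2, 0.9, 0.5, 0.9 this yields [0.9, 0.5, 0.2].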
    • roc

      public RDD<scala.Tuple2<Object,Object>> roc()
      Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
      Returns:
      an RDD of (false positive rate, true positive rate) pairs.
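To make the shape of the ROC curve concrete, here is a local, unweighted Java sketch. It assumes labels are encoded as 1.0 (positive) and 0.0 (negative), that an example counts as predicted positive when its score is at least the threshold, and that the input contains at least one positive and one negative. `RocSketch` is a hypothetical name; weights and down-sampling are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of (FPR, TPR) points, one per distinct score. */
final class RocSketch {

    static List<double[]> roc(double[] scores, double[] labels) {
        // Sort example indices by score, highest first.
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        // Assumption: label > 0.5 means positive.
        long totalPos = 0, totalNeg = 0;
        for (double l : labels) {
            if (l > 0.5) totalPos++; else totalNeg++;
        }

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0}); // prepended point
        long tp = 0, fp = 0;
        for (int k = 0; k < order.length; k++) {
            int i = order[k];
            if (labels[i] > 0.5) tp++; else fp++;
            // Emit one point after the last example that shares this score.
            boolean lastOfScore = k + 1 == order.length || scores[order[k + 1]] != scores[i];
            if (lastOfScore) {
                points.add(new double[] {(double) fp / totalNeg, (double) tp / totalPos});
            }
        }
        points.add(new double[] {1.0, 1.0}); // appended point
        return points;
    }
}
```

For scores 0.9, 0.8, 0.7, 0.6 with labels 1, 0, 1, 0 the sketch produces (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), (1, 1), (1, 1); the lowest threshold already reaches (1, 1), so the appended point repeats it.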
    • areaUnderROC

      public double areaUnderROC()
      Computes the area under the receiver operating characteristic (ROC) curve.
      Returns:
      the area under the ROC curve.
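One common way to compute the area under a piecewise-linear curve such as the ROC curve is the trapezoidal rule. The Java sketch below applies it to a list of (x, y) points ordered by x; treat it as an illustration of the idea rather than a description of Spark's internals. `TrapezoidSketch` is a hypothetical name.

```java
import java.util.List;

/** Hypothetical sketch: trapezoidal area under (x, y) points ordered by x. */
final class TrapezoidSketch {

    static double area(List<double[]> points) {
        double sum = 0.0;
        for (int i = 1; i < points.size(); i++) {
            double[] p = points.get(i - 1);
            double[] q = points.get(i);
            // Width times the average of the two heights.
            sum += (q[0] - p[0]) * (p[1] + q[1]) / 2.0;
        }
        return sum;
    }
}
```

The diagonal (0, 0), (1, 1), which corresponds to a random classifier, has area 0.5. The ROC points (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), (1, 1), (1, 1) give 0.25 + 0.5 = 0.75.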
    • pr

      public RDD<scala.Tuple2<Object,Object>> pr()
      Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
      Returns:
      an RDD of (recall, precision) pairs.
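The following local Java sketch shows how (recall, precision) points arise and where the prepended (0.0, p) point comes from: the first point has the highest threshold and therefore the lowest recall. It assumes labels 1.0/0.0, predicts positive when score >= threshold, requires at least one positive example, and ignores weights and binning. `PrSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of (recall, precision) points with (0.0, p) prepended. */
final class PrSketch {

    static List<double[]> pr(double[] scores, double[] labels) {
        // Sort example indices by score, highest first.
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        long totalPos = 0;
        for (double l : labels) {
            if (l > 0.5) totalPos++; // assumption: label > 0.5 means positive
        }

        List<double[]> points = new ArrayList<>();
        long tp = 0, fp = 0;
        for (int k = 0; k < order.length; k++) {
            int i = order[k];
            if (labels[i] > 0.5) tp++; else fp++;
            boolean lastOfScore = k + 1 == order.length || scores[order[k + 1]] != scores[i];
            if (lastOfScore) {
                double recall = (double) tp / totalPos;
                double precision = (double) tp / (tp + fp);
                points.add(new double[] {recall, precision});
            }
        }
        // Recall never decreases as the threshold drops, so points.get(0) has the lowest recall.
        points.add(0, new double[] {0.0, points.get(0)[1]});
        return points;
    }
}
```

Note that each pair is ordered (recall, precision), matching the documented return value.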
    • areaUnderPR

      public double areaUnderPR()
      Computes the area under the precision-recall curve.
      Returns:
      the area under the precision-recall curve.
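As a worked illustration, assuming the area is integrated over the (recall, precision) points with the trapezoidal rule, the area for points $(r_i, p_i)$ ordered by recall is:

```latex
\mathrm{AUPR} = \sum_{i=1}^{n-1} (r_{i+1} - r_i)\,\frac{p_i + p_{i+1}}{2}

% Example: points (0.0, 1.0), (0.5, 1.0), (1.0, 0.5)
\mathrm{AUPR} = (0.5 - 0.0)\,\frac{1.0 + 1.0}{2} + (1.0 - 0.5)\,\frac{1.0 + 0.5}{2} = 0.5 + 0.375 = 0.875
```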
    • fMeasureByThreshold

      public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold(double beta)
      Returns the (threshold, F-Measure) curve.
      Parameters:
      beta - the beta factor in F-Measure computation.
      Returns:
      an RDD of (threshold, F-Measure) pairs.
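The F-Measure at each threshold combines that threshold's precision and recall. This Java sketch uses the standard F-beta definition, in which recall is weighted beta times as much as precision. `FMeasureSketch` is a hypothetical name, and returning 0.0 when the denominator is 0 (for example, precision = recall = 0) is an assumption made here to avoid 0/0.

```java
/** Hypothetical sketch of the F-beta score for one (precision, recall) pair. */
final class FMeasureSketch {

    static double fMeasure(double precision, double recall, double beta) {
        double beta2 = beta * beta;
        double denom = beta2 * precision + recall;
        if (denom == 0.0) {
            return 0.0; // assumption: define F as 0 when it would be 0/0
        }
        return (1.0 + beta2) * precision * recall / denom;
    }
}
```

With precision 0.5 and recall 1.0, beta = 1.0 gives 1.0 / 1.5, about 0.667, while beta = 2.0 gives 2.5 / 3.0, about 0.833, reflecting the heavier weight on recall.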
    • fMeasureByThreshold

      public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold()
      Returns the (threshold, F-Measure) curve with beta = 1.0.
      Returns:
      an RDD of (threshold, F-Measure) pairs, with F-Measure computed using beta = 1.0.
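With beta = 1.0, the standard F-beta formula reduces to the harmonic mean of precision $P$ and recall $R$ (the F1 score). A short worked example:

```latex
F_1 = \frac{(1 + 1^2)\,P R}{1^2 \cdot P + R} = \frac{2 P R}{P + R}

% Example: P = 0.5, R = 1.0
F_1 = \frac{2 \cdot 0.5 \cdot 1.0}{0.5 + 1.0} = \frac{1.0}{1.5} \approx 0.667
```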
    • precisionByThreshold

      public RDD<scala.Tuple2<Object,Object>> precisionByThreshold()
      Returns the (threshold, precision) curve.
      Returns:
      an RDD of (threshold, precision) pairs.
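The following brute-force Java sketch spells out what a (threshold, precision) point means: for each distinct score used as a threshold, precision is the fraction of examples with score >= threshold that are positive. It rescans the data for every threshold (O(n × distinct scores)), so it is for illustration only. It assumes labels 1.0/0.0 and ignores weights and binning; `PrecisionByThresholdSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical brute-force sketch of the (threshold, precision) curve. */
final class PrecisionByThresholdSketch {

    /** Precision when every example with score >= threshold is predicted positive. */
    static double precisionAt(double[] scores, double[] labels, double threshold) {
        long tp = 0, predictedPos = 0;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] >= threshold) {
                predictedPos++;
                if (labels[i] > 0.5) tp++; // assumption: label > 0.5 means positive
            }
        }
        return predictedPos == 0 ? 0.0 : (double) tp / predictedPos;
    }

    /** (threshold, precision) pairs, thresholds being distinct scores in descending order. */
    static List<double[]> precisionByThreshold(double[] scores, double[] labels) {
        TreeSet<Double> thresholds = new TreeSet<>(Comparator.reverseOrder());
        for (double s : scores) {
            thresholds.add(s);
        }
        List<double[]> curve = new ArrayList<>();
        for (double t : thresholds) {
            curve.add(new double[] {t, precisionAt(scores, labels, t)});
        }
        return curve;
    }
}
```

For scores 0.9, 0.8, 0.7, 0.6 with labels 1, 0, 1, 0 this gives precisions 1.0, 0.5, 2/3 and 0.5 at thresholds 0.9, 0.8, 0.7 and 0.6.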
    • recallByThreshold

      public RDD<scala.Tuple2<Object,Object>> recallByThreshold()
      Returns the (threshold, recall) curve.
      Returns:
      an RDD of (threshold, recall) pairs.
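Recall at a threshold $t$ is the fraction of all positive examples whose score is at least $t$ (assuming, as in the sketches above, that a score at or above the threshold counts as a positive prediction). A worked example with the (score, label) pairs (0.9, 1), (0.8, 0), (0.7, 1), (0.6, 0), where the number of positives is $P = 2$:

```latex
R(t) = \frac{TP(t)}{P}

R(0.9) = \tfrac{1}{2}, \quad R(0.8) = \tfrac{1}{2}, \quad R(0.7) = \tfrac{2}{2} = 1, \quad R(0.6) = 1
```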