Object
org.apache.spark.mllib.optimization.LBFGS
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Optimizer, scala.Serializable

public class LBFGS extends Object implements Optimizer, org.apache.spark.internal.Logging
Class used to solve an optimization problem using Limited-memory BFGS. Reference: Wikipedia on Limited-memory BFGS.
param: gradient Gradient function to be used.
param: updater Updater to be used to update weights after every iteration.
  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    Constructor
    Description
    LBFGS(Gradient gradient, Updater updater)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    Vector
    optimize(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights)
    Solve the provided convex optimization problem.
    scala.Tuple2<Vector,double[]>
    optimizeWithLossReturned(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights)
     
    static org.slf4j.Logger
    org$apache$spark$internal$Logging$$log_()
     
    static void
    org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
     
    static scala.Tuple2<Vector,double[]>
    runLBFGS(RDD<scala.Tuple2<Object,Vector>> data, Gradient gradient, Updater updater, int numCorrections, double convergenceTol, int maxNumIterations, double regParam, Vector initialWeights)
    Run Limited-memory BFGS (L-BFGS) in parallel.
    LBFGS
    setConvergenceTol(double tolerance)
    Set the convergence tolerance of iterations for L-BFGS.
    LBFGS
    setGradient(Gradient gradient)
    Set the gradient function (of the loss function of one single data example) to be used for L-BFGS.
    LBFGS
    setNumCorrections(int corrections)
    Set the number of corrections used in the L-BFGS update.
    LBFGS
    setNumIterations(int iters)
    Set the maximal number of iterations for L-BFGS.
    LBFGS
    setRegParam(double regParam)
    Set the regularization parameter.
    LBFGS
    setUpdater(Updater updater)
    Set the updater function to actually perform a gradient step in a given direction.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
  • Constructor Details

    • LBFGS

      public LBFGS(Gradient gradient, Updater updater)

  • Method Details

    • runLBFGS

      public static scala.Tuple2<Vector,double[]> runLBFGS(RDD<scala.Tuple2<Object,Vector>> data, Gradient gradient, Updater updater, int numCorrections, double convergenceTol, int maxNumIterations, double regParam, Vector initialWeights)
      Run Limited-memory BFGS (L-BFGS) in parallel. Each iteration averages the subgradients across partitions using a single standard Spark map-reduce.

      Parameters:
      data - Input data for L-BFGS. RDD of the set of data examples, each of the form (label, [feature values]).
      gradient - Gradient object (used to compute the gradient of the loss function of one single data example).
      updater - Updater function to actually perform a gradient step in a given direction.
      numCorrections - The number of corrections used in the L-BFGS update.
      convergenceTol - The convergence tolerance of iterations for L-BFGS, which must be nonnegative. Lower values are less tolerant and therefore generally cause more iterations to be run.
      maxNumIterations - Maximal number of iterations that L-BFGS can be run.
      regParam - Regularization parameter.
      initialWeights - (undocumented)
      Returns:
      A tuple containing two elements. The first element is a column matrix containing weights for every feature, and the second element is an array containing the loss computed for every iteration.
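      The per-iteration averaging described above can be sketched in plain Java, without Spark. This is a minimal illustration under stated assumptions, not Spark's implementation: the class GradientAveraging, its Partial holder, and the nested-list stand-in for partitions are all hypothetical names introduced here. Each partition is mapped to a partial sum and a count, the partials are reduced pairwise, and the total is divided by the number of examples.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: average per-example gradients with one map step and one reduce step. */
final class GradientAveraging {

    /** Partial result for one partition: summed gradient and example count. */
    static final class Partial {
        final double[] sum;
        final long count;

        Partial(double[] sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        /** Reduce step: combine two partial results into a new one (inputs are not mutated). */
        Partial merge(Partial other) {
            double[] merged = sum.clone();
            for (int i = 0; i < merged.length; i++) {
                merged[i] += other.sum[i];
            }
            return new Partial(merged, count + other.count);
        }
    }

    /** Map step: sum the gradients of all examples held by one partition. */
    static Partial mapPartition(List<double[]> exampleGradients, int dim) {
        double[] sum = new double[dim];
        for (double[] g : exampleGradients) {
            for (int i = 0; i < dim; i++) {
                sum[i] += g[i];
            }
        }
        return new Partial(sum, exampleGradients.size());
    }

    /** Map every partition, reduce the partials, then divide by the total example count. */
    static double[] average(List<List<double[]>> partitions, int dim) {
        Partial total = partitions.stream()
                .map(p -> mapPartition(p, dim))
                .reduce(new Partial(new double[dim], 0), Partial::merge);
        if (total.count == 0) {
            throw new IllegalArgumentException("no examples to average");
        }
        double[] avg = new double[dim];
        for (int i = 0; i < dim; i++) {
            avg[i] = total.sum[i] / total.count;
        }
        return avg;
    }

    public static void main(String[] args) {
        // Two "partitions" holding three per-example gradients in total.
        List<List<double[]>> partitions = List.of(
                List.of(new double[] {1.0, 2.0}, new double[] {3.0, 4.0}),
                List.of(new double[] {5.0, 6.0}));
        System.out.println(Arrays.toString(average(partitions, 2))); // prints [3.0, 4.0]
    }
}
```

      Carrying the count alongside the sum lets partitions of unequal size be combined correctly, since the division happens only once at the end.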
    • org$apache$spark$internal$Logging$$log_

      public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
    • org$apache$spark$internal$Logging$$log__$eq

      public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
    • setNumCorrections

      public LBFGS setNumCorrections(int corrections)
      Set the number of corrections used in the L-BFGS update. Default 10. numCorrections must be positive. Values less than 3 are not recommended, and large values result in excessive computing time; values from 4 to 9 are generally recommended.
      Parameters:
      corrections - (undocumented)
      Returns:
      (undocumented)
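      To show what the number of corrections controls, here is a self-contained sketch of the textbook L-BFGS two-loop recursion with a bounded history. It is not Spark's code: the class LbfgsHistory and its methods are hypothetical, and the sketch assumes every stored pair satisfies the curvature condition s·y > 0. Only the most recent numCorrections pairs are kept, so memory and per-iteration cost grow with that setting.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical sketch of the L-BFGS two-loop recursion with a bounded correction history. */
final class LbfgsHistory {
    private final int numCorrections;
    // Newest pair first. Each entry is {s, y}: the step taken and the resulting gradient change.
    private final Deque<double[][]> pairs = new ArrayDeque<>();

    LbfgsHistory(int numCorrections) {
        if (numCorrections <= 0) {
            throw new IllegalArgumentException("numCorrections must be positive");
        }
        this.numCorrections = numCorrections;
    }

    /** Store a correction pair, discarding the oldest once the limit is exceeded. */
    void add(double[] s, double[] y) {
        pairs.addFirst(new double[][] {s.clone(), y.clone()});
        if (pairs.size() > numCorrections) {
            pairs.removeLast();
        }
    }

    int size() {
        return pairs.size();
    }

    /** Return -H * gradient, where H is the implicit inverse-Hessian estimate. */
    double[] direction(double[] gradient) {
        double[] q = gradient.clone();
        double[] alpha = new double[pairs.size()];
        int k = 0;
        for (double[][] p : pairs) {                  // first loop: newest to oldest
            double rho = 1.0 / dot(p[1], p[0]);       // assumes s.y > 0
            alpha[k] = rho * dot(p[0], q);
            axpy(-alpha[k], p[1], q);
            k++;
        }
        if (!pairs.isEmpty()) {                       // initial scaling gamma = s.y / y.y
            double[][] newest = pairs.peekFirst();
            double gamma = dot(newest[0], newest[1]) / dot(newest[1], newest[1]);
            for (int i = 0; i < q.length; i++) {
                q[i] *= gamma;
            }
        }
        Iterator<double[][]> oldestFirst = pairs.descendingIterator();
        while (oldestFirst.hasNext()) {               // second loop: oldest to newest
            double[][] p = oldestFirst.next();
            k--;
            double rho = 1.0 / dot(p[1], p[0]);
            double beta = rho * dot(p[1], q);
            axpy(alpha[k] - beta, p[0], q);
        }
        for (int i = 0; i < q.length; i++) {
            q[i] = -q[i];                             // descent direction
        }
        return q;
    }

    private static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** In place: target += scale * x. */
    private static void axpy(double scale, double[] x, double[] target) {
        for (int i = 0; i < target.length; i++) {
            target[i] += scale * x[i];
        }
    }

    public static void main(String[] args) {
        LbfgsHistory history = new LbfgsHistory(2);
        history.add(new double[] {1.0, 0.0}, new double[] {2.0, 0.0});
        System.out.println(Arrays.toString(history.direction(new double[] {4.0, 6.0})));
    }
}
```

      With an empty history the sketch falls back to plain steepest descent (the negated gradient); each stored pair refines the curvature estimate.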
    • setConvergenceTol

      public LBFGS setConvergenceTol(double tolerance)
      Set the convergence tolerance of iterations for L-BFGS. Default 1E-6. This value must be nonnegative. Smaller values are less tolerant: they generally give higher accuracy at the cost of more iterations.
      Parameters:
      tolerance - (undocumented)
      Returns:
      (undocumented)
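      The trade-off between tolerance and iteration count can be illustrated with a small stand-alone sketch. The stopping rule used here (relative change in loss below the tolerance) is an assumption chosen for illustration and is not claimed to be the exact criterion Spark applies; the class ConvergenceCheck and the synthetic halving loss sequence are likewise hypothetical. The sketch also caps the loop at a maximum iteration count, in the spirit of setNumIterations.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: how a convergence tolerance and an iteration cap interact. */
final class ConvergenceCheck {

    /** Assumed stopping rule: relative change in loss falls below the tolerance. */
    static boolean converged(double previousLoss, double currentLoss, double tolerance) {
        double scale = Math.max(1.0, Math.max(Math.abs(previousLoss), Math.abs(currentLoss)));
        return Math.abs(previousLoss - currentLoss) / scale < tolerance;
    }

    /** Count iterations until the rule fires or maxIterations is reached. */
    static int iterationsUntilStop(IntToDoubleFunction lossAt, double tolerance, int maxIterations) {
        if (tolerance < 0.0) {
            throw new IllegalArgumentException("tolerance must be nonnegative");
        }
        double previous = lossAt.applyAsDouble(0);
        for (int k = 1; k <= maxIterations; k++) {
            double current = lossAt.applyAsDouble(k);
            if (converged(previous, current, tolerance)) {
                return k;
            }
            previous = current;
        }
        return maxIterations;
    }

    public static void main(String[] args) {
        // Synthetic loss that halves on every iteration: 1, 0.5, 0.25, ...
        IntToDoubleFunction halving = k -> Math.pow(0.5, k);
        System.out.println(iterationsUntilStop(halving, 1e-2, 100)); // prints 7
        System.out.println(iterationsUntilStop(halving, 1e-6, 100)); // prints 20
        System.out.println(iterationsUntilStop(halving, 0.0, 100));  // prints 100 (cap reached)
    }
}
```

      A tolerance of zero never satisfies the strict inequality in this sketch, so the iteration cap becomes the only stopping condition.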
    • setNumIterations

      public LBFGS setNumIterations(int iters)
      Set the maximal number of iterations for L-BFGS. Default 100.
      Parameters:
      iters - (undocumented)
      Returns:
      (undocumented)
    • setRegParam

      public LBFGS setRegParam(double regParam)
      Set the regularization parameter. Default 0.0.
      Parameters:
      regParam - (undocumented)
      Returns:
      (undocumented)
    • setGradient

      public LBFGS setGradient(Gradient gradient)
      Set the gradient function (of the loss function of one single data example) to be used for L-BFGS.
      Parameters:
      gradient - (undocumented)
      Returns:
      (undocumented)
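      As an illustration of "the loss function of one single data example", the following stand-alone sketch computes a squared-error loss and its gradient for one (label, features) pair. The class SquaredErrorExample is hypothetical and independent of Spark's Gradient classes; squared error is assumed here only because it is easy to verify by hand.

```java
import java.util.Arrays;

/** Hypothetical sketch: loss and gradient of squared error for a single data example. */
final class SquaredErrorExample {

    /** Prediction error w.x - label for one example. */
    static double error(double[] features, double label, double[] weights) {
        double prediction = 0.0;
        for (int i = 0; i < weights.length; i++) {
            prediction += weights[i] * features[i];
        }
        return prediction - label;
    }

    /** Loss 0.5 * (w.x - label)^2. */
    static double loss(double[] features, double label, double[] weights) {
        double diff = error(features, label, weights);
        return 0.5 * diff * diff;
    }

    /** Gradient of the loss with respect to the weights: (w.x - label) * x. */
    static double[] gradient(double[] features, double label, double[] weights) {
        double diff = error(features, label, weights);
        double[] grad = new double[weights.length];
        for (int i = 0; i < grad.length; i++) {
            grad[i] = diff * features[i];
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 2.0};
        double[] features = {3.0, 4.0};
        double label = 10.0;
        System.out.println(loss(features, label, weights));                     // prints 0.5
        System.out.println(Arrays.toString(gradient(features, label, weights))); // prints [3.0, 4.0]
    }
}
```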
    • setUpdater

      public LBFGS setUpdater(Updater updater)
      Set the updater function to actually perform a gradient step in a given direction. The updater is also responsible for performing the update from the regularization term, and therefore determines what kind of regularization is used, if any.
      Parameters:
      updater - (undocumented)
      Returns:
      (undocumented)
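      The way a choice of updater fixes the kind of regularization can be sketched as follows. This is a plain-Java illustration, not Spark's Updater API: the Regularizer interface, the NONE and SQUARED_L2 constants, and the helper methods are hypothetical. The squared-L2 form used here, 0.5 * regParam * ||w||^2, is an assumption made for the example; with no regularization, regParam has no effect.

```java
import java.util.Arrays;

/** Hypothetical sketch: the regularizer decides how regParam enters the objective. */
final class RegularizationSketch {

    interface Regularizer {
        double penalty(double[] weights, double regParam);
        double[] penaltyGradient(double[] weights, double regParam);
    }

    /** No regularization: regParam is ignored. */
    static final Regularizer NONE = new Regularizer() {
        public double penalty(double[] weights, double regParam) {
            return 0.0;
        }
        public double[] penaltyGradient(double[] weights, double regParam) {
            return new double[weights.length];
        }
    };

    /** Squared L2 penalty: 0.5 * regParam * ||w||^2, with gradient regParam * w. */
    static final Regularizer SQUARED_L2 = new Regularizer() {
        public double penalty(double[] weights, double regParam) {
            double sumSquares = 0.0;
            for (double w : weights) {
                sumSquares += w * w;
            }
            return 0.5 * regParam * sumSquares;
        }
        public double[] penaltyGradient(double[] weights, double regParam) {
            double[] grad = new double[weights.length];
            for (int i = 0; i < grad.length; i++) {
                grad[i] = regParam * weights[i];
            }
            return grad;
        }
    };

    /** Regularized objective: data loss plus penalty. */
    static double objective(double loss, double[] weights, double regParam, Regularizer r) {
        return loss + r.penalty(weights, regParam);
    }

    /** Gradient of the regularized objective: data-loss gradient plus penalty gradient. */
    static double[] objectiveGradient(double[] lossGradient, double[] weights,
                                      double regParam, Regularizer r) {
        double[] penaltyGrad = r.penaltyGradient(weights, regParam);
        double[] total = lossGradient.clone();
        for (int i = 0; i < total.length; i++) {
            total[i] += penaltyGrad[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] weights = {3.0, 4.0};
        double[] lossGradient = {1.0, 1.0};
        System.out.println(objective(1.0, weights, 0.5, NONE));       // prints 1.0
        System.out.println(objective(1.0, weights, 0.5, SQUARED_L2)); // prints 7.25
        System.out.println(Arrays.toString(
                objectiveGradient(lossGradient, weights, 0.5, SQUARED_L2))); // prints [2.5, 3.0]
    }
}
```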
    • optimize

      public Vector optimize(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights)
      Description copied from interface: Optimizer
      Solve the provided convex optimization problem.
      Specified by:
      optimize in interface Optimizer
      Parameters:
      data - (undocumented)
      initialWeights - (undocumented)
      Returns:
      (undocumented)
    • optimizeWithLossReturned

      public scala.Tuple2<Vector,double[]> optimizeWithLossReturned(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights)