Class EMLDAOptimizer

Object
org.apache.spark.mllib.clustering.EMLDAOptimizer
All Implemented Interfaces:
LDAOptimizer

public final class EMLDAOptimizer extends Object implements LDAOptimizer
Optimizer for the EM algorithm. It stores the data and parameter graph, plus the algorithm parameters.

Currently, the underlying implementation uses Expectation-Maximization (EM), implemented according to the Asuncion et al. (2009) paper referenced below.

References:
- Original LDA paper (journal version): Blei, Ng, and Jordan. "Latent Dirichlet Allocation." JMLR, 2003.
  - This class implements their "smoothed" LDA model.
- Paper which clearly explains several algorithms, including EM: Asuncion, Welling, Smyth, and Teh. "On Smoothing and Inference for Topic Models." UAI, 2009.
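The EM procedure described above can be illustrated with a small single-machine sketch. It assumes the MAP update from Asuncion et al. (2009), gamma_wjk proportional to (N_wk + eta - 1)(N_kj + alpha - 1) / (N_k + W*eta - W), where N_wk, N_kj and N_k are the expected term-topic, document-topic and per-topic total counts, and W is the vocabulary size. This is not Spark's distributed, graph-based implementation: the class name EmLdaSketch, the method run, the random soft initialization and the toy corpus are hypothetical, chosen only for illustration. The sketch also assumes alpha > 1 and eta > 1, so every factor in the update stays positive.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical single-machine sketch of MAP-EM for LDA in the style of
 * Asuncion et al. (2009). Not Spark's graph-based implementation.
 */
public class EmLdaSketch {

    /**
     * @param counts counts[d][w] = occurrences of term w in document d
     * @param k      number of topics
     * @param alpha  document concentration (assumed > 1)
     * @param eta    topic concentration (assumed > 1)
     * @return termTopic[w][t] = expected count of term w assigned to topic t
     */
    public static double[][] run(int[][] counts, int k, double alpha, double eta, int iters, long seed) {
        int numDocs = counts.length;
        int vocab = counts[0].length;
        Random rng = new Random(seed);

        // Expected counts: N_kj (doc-topic), N_wk (term-topic), N_k (topic totals).
        double[][] docTopic = new double[numDocs][k];
        double[][] termTopic = new double[vocab][k];
        double[] topicTotal = new double[k];

        // Initialization (an assumption): random soft topic assignments per (doc, term) pair.
        for (int d = 0; d < numDocs; d++) {
            for (int w = 0; w < vocab; w++) {
                if (counts[d][w] == 0) continue;
                double[] g = new double[k];
                double s = 0;
                for (int t = 0; t < k; t++) { g[t] = rng.nextDouble() + 1e-3; s += g[t]; }
                for (int t = 0; t < k; t++) {
                    double c = counts[d][w] * g[t] / s;
                    docTopic[d][t] += c; termTopic[w][t] += c; topicTotal[t] += c;
                }
            }
        }

        for (int it = 0; it < iters; it++) {
            double[][] newDoc = new double[numDocs][k];
            double[][] newTerm = new double[vocab][k];
            double[] newTotal = new double[k];
            for (int d = 0; d < numDocs; d++) {
                for (int w = 0; w < vocab; w++) {
                    if (counts[d][w] == 0) continue;
                    // E-step: unnormalized responsibilities from the previous iteration's counts.
                    double[] g = new double[k];
                    double s = 0;
                    for (int t = 0; t < k; t++) {
                        g[t] = (termTopic[w][t] + eta - 1) * (docTopic[d][t] + alpha - 1)
                                / (topicTotal[t] + vocab * (eta - 1));
                        s += g[t];
                    }
                    // M-step: accumulate this pair's token count, weighted by the normalized gamma.
                    for (int t = 0; t < k; t++) {
                        double c = counts[d][w] * g[t] / s;
                        newDoc[d][t] += c; newTerm[w][t] += c; newTotal[t] += c;
                    }
                }
            }
            docTopic = newDoc; termTopic = newTerm; topicTotal = newTotal;
        }
        return termTopic;
    }

    public static void main(String[] args) {
        // Toy corpus: 4 documents over a 4-term vocabulary (hypothetical data).
        int[][] corpus = {
            {3, 2, 0, 0},
            {2, 3, 0, 1},
            {0, 0, 4, 2},
            {0, 1, 3, 3},
        };
        double[][] termTopic = run(corpus, 2, 1.1, 1.1, 50, 42L);
        for (double[] row : termTopic) System.out.println(Arrays.toString(row));
    }
}
```

Because each (document, term) pair redistributes its token count with weights that sum to 1, the expected counts always add up to the number of tokens in the corpus (24 for the toy corpus), which makes a convenient sanity check.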

  • Constructor Details

    • EMLDAOptimizer

      public EMLDAOptimizer()
  • Method Details

    • getKeepLastCheckpoint

      public boolean getKeepLastCheckpoint()
      If using checkpointing, this indicates whether to keep the last checkpoint (vs. clean it up).
      Returns:
      true if the last checkpoint is kept; false if it is cleaned up
    • setKeepLastCheckpoint

      public EMLDAOptimizer setKeepLastCheckpoint(boolean keepLastCheckpoint)
      If using checkpointing, this indicates whether to keep the last checkpoint (vs. clean it up). Deleting the checkpoint can cause failures if a data partition is lost, so set this flag with care.

      Default: true

      Parameters:
      keepLastCheckpoint - true to keep the last checkpoint; false to delete it
      Returns:
      this EMLDAOptimizer instance, for method chaining
      Note:
      Checkpoints will be cleaned up via reference counting, regardless.
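A minimal Java sketch of turning this flag off while configuring LDA, assuming spark-mllib is on the classpath. The class name KeepLastCheckpointExample and the values for k and the checkpoint interval are illustrative only. Configuring the optimizer does not need a SparkContext; checkpoints are only written while LDA.run executes, and only if a checkpoint directory has been set on the SparkContext (for example via sc.setCheckpointDir(...)).

```java
import org.apache.spark.mllib.clustering.EMLDAOptimizer;
import org.apache.spark.mllib.clustering.LDA;

public class KeepLastCheckpointExample {
    public static void main(String[] args) {
        // Ask the EM optimizer to delete its final checkpoint instead of keeping it.
        EMLDAOptimizer optimizer = new EMLDAOptimizer().setKeepLastCheckpoint(false);

        // Illustrative LDA configuration; checkpointing also requires a checkpoint
        // directory on the SparkContext before LDA.run is called.
        LDA lda = new LDA()
                .setK(10)
                .setOptimizer(optimizer)
                .setCheckpointInterval(10);

        System.out.println(optimizer.getKeepLastCheckpoint()); // prints false
        System.out.println(lda.getOptimizer() == optimizer);   // prints true
    }
}
```

Leaving the default (true) keeps one checkpoint on disk, which guards against the lost-partition failures mentioned above; passing false trades that safety for disk space.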