Class OnlineLDAOptimizer
Object
org.apache.spark.mllib.clustering.OnlineLDAOptimizer
- All Implemented Interfaces:
org.apache.spark.internal.Logging, LDAOptimizer
public final class OnlineLDAOptimizer
extends Object
implements LDAOptimizer, org.apache.spark.internal.Logging
An online optimizer for LDA. The Optimizer implements the Online variational Bayes LDA
algorithm, which processes a subset of the corpus on each iteration, and updates the term-topic
distribution adaptively.
Original Online LDA paper: Hoffman, Blei and Bach, "Online Learning for Latent Dirichlet Allocation." NIPS, 2010.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
OnlineLDAOptimizer()
-
Method Summary
Modifier and Type | Method | Description
double | getKappa() | Learning rate: exponential decay rate.
double | getMiniBatchFraction() | Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration.
boolean | getOptimizeDocConcentration() | Indicates whether docConcentration (the Dirichlet parameter for the document-topic distribution) will be optimized during training.
double | getTau0() | A (positive) learning parameter that downweights early iterations.
OnlineLDAOptimizer | setKappa(double kappa) | Learning rate: exponential decay rate. Should be in (0.5, 1.0] to guarantee asymptotic convergence.
OnlineLDAOptimizer | setMiniBatchFraction(double miniBatchFraction) | Mini-batch fraction in (0, 1], which sets the fraction of documents sampled and used in each iteration.
OnlineLDAOptimizer | setOptimizeDocConcentration(boolean optimizeDocConcentration) | Sets whether to optimize the docConcentration parameter during training.
OnlineLDAOptimizer | setTau0(double tau0) | A (positive) learning parameter that downweights early iterations.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
-
Constructor Details
-
OnlineLDAOptimizer
public OnlineLDAOptimizer()
-
-
Method Details
-
getTau0
public double getTau0()
A (positive) learning parameter that downweights early iterations. Larger values make early iterations count less.
- Returns:
- (undocumented)
-
setTau0
public OnlineLDAOptimizer setTau0(double tau0)
A (positive) learning parameter that downweights early iterations. Larger values make early iterations count less. Default: 1024, following the original Online LDA paper.
- Parameters:
tau0 - (undocumented)
- Returns:
- (undocumented)
-
getKappa
public double getKappa()
Learning rate: exponential decay rate.
- Returns:
- (undocumented)
-
setKappa
public OnlineLDAOptimizer setKappa(double kappa)
Learning rate: exponential decay rate. Should be in (0.5, 1.0] to guarantee asymptotic convergence. Default: 0.51, based on the original Online LDA paper.
- Parameters:
kappa - (undocumented)
- Returns:
- (undocumented)
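How tau0 and kappa interact is easier to see with numbers. The sketch below uses the step-size schedule from the Hoffman, Blei and Bach paper, rho_t = (tau0 + t)^(-kappa), where rho_t is the weight given to the mini-batch at iteration t. The class and method names (LearningRateSketch, rho) are hypothetical and not part of the Spark API. The iteration indexing (t starting at 0) and the strict range check on kappa are assumptions of this sketch, so Spark's internal counter and validation may differ.

```java
/** Illustrative sketch of the Online LDA step-size schedule; not Spark code. */
final class LearningRateSketch {
    private LearningRateSketch() {}

    /**
     * Weight rho_t = (tau0 + t)^(-kappa) applied to the mini-batch at iteration t.
     * Assumption: t counts iterations from 0.
     */
    static double rho(double tau0, double kappa, long t) {
        if (tau0 <= 0.0) {
            throw new IllegalArgumentException("tau0 must be positive");
        }
        // This sketch enforces the documented convergence range for kappa.
        if (kappa <= 0.5 || kappa > 1.0) {
            throw new IllegalArgumentException("kappa must be in (0.5, 1.0]");
        }
        if (t < 0) {
            throw new IllegalArgumentException("t must be non-negative");
        }
        return Math.pow(tau0 + t, -kappa);
    }

    public static void main(String[] args) {
        // Compare a small tau0 with the documented default of 1024 at kappa = 0.51.
        for (long t : new long[] {0, 10, 100, 1000}) {
            System.out.printf("t=%d  tau0=1 -> %.4f  tau0=1024 -> %.4f%n",
                    t, rho(1.0, 0.51, t), rho(1024.0, 0.51, t));
        }
    }
}
```

With a larger tau0 the weight starts small and changes slowly, so early mini-batches move the model less. A larger kappa makes the weight decay faster as iterations accumulate.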
-
getMiniBatchFraction
public double getMiniBatchFraction()
Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration.
- Returns:
- (undocumented)
-
setMiniBatchFraction
public OnlineLDAOptimizer setMiniBatchFraction(double miniBatchFraction)
Mini-batch fraction in (0, 1], which sets the fraction of documents sampled and used in each iteration. Default: 0.05, i.e., 5% of total documents.
- Parameters:
miniBatchFraction - (undocumented)
- Returns:
- (undocumented)
- Note:
- This should be adjusted in sync with LDA.setMaxIterations() so the entire corpus is used. Specifically, set both so that maxIterations * miniBatchFraction is at least 1.
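The rule in the note is simple arithmetic, and a small helper can check a configuration before training. The sketch below is illustrative only. The class and method names (MiniBatchCoverageSketch, coversCorpus, minIterations) are hypothetical and not part of the Spark API. Because each mini-batch is sampled, satisfying the rule means the number of documents processed adds up to at least one corpus-worth; it is not a guarantee that every document is visited.

```java
/** Illustrative check of the maxIterations * miniBatchFraction >= 1 rule; not Spark code. */
final class MiniBatchCoverageSketch {
    private MiniBatchCoverageSketch() {}

    /** True when the sampled mini-batches add up to at least one corpus-worth of documents. */
    static boolean coversCorpus(int maxIterations, double miniBatchFraction) {
        checkFraction(miniBatchFraction);
        return maxIterations * miniBatchFraction >= 1.0;
    }

    /** Smallest iteration count that satisfies the rule for the given fraction. */
    static int minIterations(double miniBatchFraction) {
        checkFraction(miniBatchFraction);
        return (int) Math.ceil(1.0 / miniBatchFraction);
    }

    private static void checkFraction(double miniBatchFraction) {
        if (miniBatchFraction <= 0.0 || miniBatchFraction > 1.0) {
            throw new IllegalArgumentException("miniBatchFraction must be in (0, 1]");
        }
    }

    public static void main(String[] args) {
        // With the documented default fraction of 0.05, this reports the minimum iteration count.
        System.out.println(minIterations(0.05));
        System.out.println(coversCorpus(19, 0.05));
    }
}
```

In a real job these two values would be passed to LDA.setMaxIterations() and setMiniBatchFraction() respectively.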
-
getOptimizeDocConcentration
public boolean getOptimizeDocConcentration()
Indicates whether docConcentration (the Dirichlet parameter for the document-topic distribution) will be optimized during training.
- Returns:
- (undocumented)
-
setOptimizeDocConcentration
public OnlineLDAOptimizer setOptimizeDocConcentration(boolean optimizeDocConcentration)
Sets whether to optimize the docConcentration parameter during training. Default: false.
- Parameters:
optimizeDocConcentration - (undocumented)
- Returns:
- (undocumented)
-