Modifier and Type | Method and Description |
---|---|
scala.Tuple2<int[],double[]>[] | describeTopics(int maxTermsPerTopic) Return the topics described by weighted terms. |
Vector | docConcentration() Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). |
JavaRDD<scala.Tuple3<Long,int[],int[]>> | javaTopicAssignments() |
JavaPairRDD<Long,Vector> | javaTopicDistributions() Java-friendly version of topicDistributions. |
JavaRDD<scala.Tuple3<Long,int[],double[]>> | javaTopTopicsPerDocument(int k) Java-friendly version of topTopicsPerDocument. |
int | k() Number of topics. |
static DistributedLDAModel | load(SparkContext sc, String path) |
double | logLikelihood() |
double | logPrior() |
void | save(SparkContext sc, String path) Save this model to the given path. |
LocalLDAModel | toLocal() Convert model to a local model. |
scala.Tuple2<long[],double[]>[] | topDocumentsPerTopic(int maxDocumentsPerTopic) Return the top documents for each topic. |
RDD<scala.Tuple3<Object,int[],int[]>> | topicAssignments() |
double | topicConcentration() Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. |
RDD<scala.Tuple2<Object,Vector>> | topicDistributions() For each document in the training set, return the distribution over topics for that document ("theta_doc"). |
Matrix | topicsMatrix() Inferred topics, where each topic is represented by a distribution over terms. |
RDD<scala.Tuple3<Object,int[],double[]>> | topTopicsPerDocument(int k) For each document, return the top k weighted topics for that document and their weights. |
int | vocabSize() Vocabulary size (number of terms in the vocabulary). |
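The return type of describeTopics, scala.Tuple2<int[],double[]>[], suggests one pair of parallel arrays per topic: term indices and their weights. The sketch below shows one way such a pair could be turned into readable "term:weight" strings. It uses plain Java arrays rather than Spark types, and TopicFormatter, format, and the sample vocabulary are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Spark: formats one topic's
// (term indices, term weights) pair against a vocabulary array.
final class TopicFormatter {

    /** Maps each term index to its vocabulary entry and appends the weight. */
    static List<String> format(int[] termIndices, double[] weights, String[] vocabulary) {
        if (termIndices.length != weights.length) {
            throw new IllegalArgumentException("indices and weights must have equal length");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < termIndices.length; i++) {
            String term = vocabulary[termIndices[i]];
            out.add(String.format(Locale.ROOT, "%s:%.2f", term, weights[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample data; a real vocabulary comes from the training pipeline.
        String[] vocabulary = {"spark", "topic", "model", "rdd"};
        int[] termIndices = {2, 0};
        double[] weights = {0.6, 0.25};
        System.out.println(format(termIndices, weights, vocabulary)); // [model:0.60, spark:0.25]
    }
}
```

With a real model, the two arrays would come from the `_1()` and `_2()` accessors of each returned Tuple2.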
Methods inherited from class LDAModel: describeTopics
public static DistributedLDAModel load(SparkContext sc, String path)
public int k()
Description copied from class: LDAModel
Number of topics.
public int vocabSize()
Description copied from class: LDAModel
Vocabulary size (number of terms in the vocabulary).
public Vector docConcentration()
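Description copied from class: LDAModel
Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). This is the parameter to a Dirichlet distribution.
Specified by: docConcentration in class LDAModel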
public double topicConcentration()
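Description copied from class: LDAModel
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. This is the parameter to a symmetric Dirichlet distribution.
Specified by: topicConcentration in class LDAModel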
public LocalLDAModel toLocal()
public Matrix topicsMatrix()
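Description copied from class: LDAModel
Inferred topics, where each topic is represented by a distribution over terms.
Specified by: topicsMatrix in class LDAModel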
public scala.Tuple2<int[],double[]>[] describeTopics(int maxTermsPerTopic)
Description copied from class: LDAModel
Return the topics described by weighted terms.
Specified by: describeTopics in class LDAModel
maxTermsPerTopic - Maximum number of terms to collect for each topic.
public scala.Tuple2<long[],double[]>[] topDocumentsPerTopic(int maxDocumentsPerTopic)
maxDocumentsPerTopic - Maximum number of documents to collect for each topic.
public RDD<scala.Tuple3<Object,int[],int[]>> topicAssignments()
public JavaRDD<scala.Tuple3<Long,int[],int[]>> javaTopicAssignments()
public double logLikelihood()
public double logPrior()
public RDD<scala.Tuple2<Object,Vector>> topicDistributions()
public JavaPairRDD<Long,Vector> javaTopicDistributions()
Java-friendly version of topicDistributions.
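The per-document distributions from topicDistributions and the top-k view from topTopicsPerDocument are closely related: the latter keeps only the k heaviest topics of each document. The following sketch illustrates that selection on a plain double[] standing in for one document's "theta_doc". It is not Spark's implementation. TopTopics, topIndices, and weightsAt are hypothetical names, and ordering the result by decreasing weight is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration, not Spark code: picks the k heaviest topics
// from one document's topic distribution.
final class TopTopics {

    /** Returns the indices of the k largest entries of theta, heaviest first. */
    static int[] topIndices(double[] theta, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        Integer[] order = new Integer[theta.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Stable sort by descending weight; ties keep the lower index first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -theta[i]));
        int n = Math.min(k, order.length);
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = order[i];
        }
        return top;
    }

    /** Returns the weights of theta at the given indices, in the same order. */
    static double[] weightsAt(double[] theta, int[] indices) {
        double[] w = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            w[i] = theta[indices[i]];
        }
        return w;
    }

    public static void main(String[] args) {
        double[] theta = {0.1, 0.6, 0.3}; // assumed sample distribution over 3 topics
        int[] top = topIndices(theta, 2);
        System.out.println(Arrays.toString(top));                   // [1, 2]
        System.out.println(Arrays.toString(weightsAt(theta, top))); // [0.6, 0.3]
    }
}
```

The two arrays produced here have the same shape as the int[] and double[] components of each Tuple3 returned by topTopicsPerDocument.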
public RDD<scala.Tuple3<Object,int[],double[]>> topTopicsPerDocument(int k)
k - (undocumented)
public JavaRDD<scala.Tuple3<Long,int[],double[]>> javaTopTopicsPerDocument(int k)
Java-friendly version of topTopicsPerDocument.
k - (undocumented)
public void save(SparkContext sc, String path)
Description copied from interface: Saveable
Save this model to the given path.
This saves:
- human-readable (JSON) model metadata to path/metadata/
- Parquet formatted data to path/data/
The model may be loaded using Loader.load.
sc - Spark context used to save model data.
path - Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
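Because save throws when the target directory already exists, a caller may want to check the path first. The sketch below does this for the simple case of a local filesystem path and also derives the metadata/ and data/ subdirectories described above. ModelPathCheck and its methods are hypothetical helpers, not part of Spark. A path on HDFS or another Hadoop-compatible store would need that store's own API instead of java.nio.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-save check, assuming the model path is on the local filesystem.
final class ModelPathCheck {

    /** Location where the JSON model metadata is written: path/metadata/. */
    static Path metadataDir(String path) {
        return Paths.get(path, "metadata");
    }

    /** Location where the Parquet data is written: path/data/. */
    static Path dataDir(String path) {
        return Paths.get(path, "data");
    }

    /** True when nothing exists at the path yet, so save should not hit an existing directory. */
    static boolean canSaveTo(String path) {
        return !Files.exists(Paths.get(path));
    }

    public static void main(String[] args) {
        String path = "models/lda-example"; // assumed example path
        System.out.println("metadata: " + metadataDir(path));
        System.out.println("data:     " + dataDir(path));
        System.out.println("can save: " + canSaveTo(path));
    }
}
```

The check is advisory only: another process could create the directory between the check and the call to save, so the exception still needs handling.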