Object org.apache.spark.mllib.feature.IDF
public class IDF
:: Experimental ::
Inverse document frequency (IDF).
The standard formulation is used: idf = log((m + 1) / (d(t) + 1)), where m is the total number of documents and d(t) is the number of documents that contain term t.
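As a rough illustration of the formula above, the following plain-Java sketch evaluates it for a few document counts. The class `IdfFormulaSketch` and its method `idf` are hypothetical names used only for this example and are not part of the Spark API; the sketch also assumes that "log" means the natural logarithm, as computed by `Math.log`.

```java
/** Hypothetical helper (not part of Spark): evaluates idf = log((m + 1) / (d(t) + 1)). */
final class IdfFormulaSketch {

    /**
     * @param numDocs m, the total number of documents
     * @param docFreq d(t), the number of documents that contain term t
     * @return the smoothed inverse document frequency (natural log assumed)
     */
    static double idf(long numDocs, long docFreq) {
        // The +1 in numerator and denominator avoids division by zero when d(t) = 0.
        return Math.log((numDocs + 1.0) / (docFreq + 1.0));
    }

    public static void main(String[] args) {
        long m = 3; // illustrative corpus size
        for (long d = 0; d <= m; d++) {
            System.out.println("d(t) = " + d + " -> idf = " + idf(m, d));
        }
    }
}
```

Under these assumptions, a term that occurs in every document (d(t) = m) gets an IDF of exactly 0, and rarer terms get larger values.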
This implementation supports filtering out terms that do not appear in a minimum number of documents (controlled by the variable minDocFreq). For terms that appear in fewer than minDocFreq documents, the IDF is set to 0, so their TF-IDF values are also 0.
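The filtering rule can be sketched in plain Java without Spark. This is a minimal illustration under stated assumptions, not the library's implementation: the class `MinDocFreqSketch` and its methods are hypothetical, documents are represented as dense `double[]` term-frequency arrays, a term counts as present in a document when its frequency is positive, and the natural logarithm is assumed.

```java
/** Hypothetical sketch (not Spark code) of IDF with a minimum document frequency. */
final class MinDocFreqSketch {

    /** Counts, for each term index, the documents in which the term frequency is positive. */
    static long[] documentFrequencies(double[][] termFreqs, int numTerms) {
        long[] df = new long[numTerms];
        for (double[] doc : termFreqs) {
            for (int t = 0; t < numTerms; t++) {
                if (doc[t] > 0.0) {
                    df[t]++;
                }
            }
        }
        return df;
    }

    /** Returns the IDF per term; terms in fewer than minDocFreq documents get 0. */
    static double[] idf(double[][] termFreqs, int numTerms, int minDocFreq) {
        long m = termFreqs.length; // total number of documents
        long[] df = documentFrequencies(termFreqs, numTerms);
        double[] result = new double[numTerms];
        for (int t = 0; t < numTerms; t++) {
            result[t] = df[t] >= minDocFreq
                    ? Math.log((m + 1.0) / (df[t] + 1.0))
                    : 0.0; // filtered term
        }
        return result;
    }

    /** Scales one term-frequency vector by the IDF vector, element by element. */
    static double[] tfIdf(double[] tf, double[] idf) {
        double[] scaled = new double[tf.length];
        for (int t = 0; t < tf.length; t++) {
            scaled[t] = tf[t] * idf[t];
        }
        return scaled;
    }
}
```

For example, with three documents `{1, 0, 2}`, `{0, 0, 1}`, `{3, 1, 0}` and minDocFreq = 2, the middle term appears in only one document, so its IDF and every TF-IDF value derived from it are 0.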
param: minDocFreq - minimum number of documents in which a term must appear; terms below this threshold are filtered out
Nested Class Summary | |
---|---|
static class | IDF.DocumentFrequencyAggregator: Document frequency aggregator. |
Constructor Summary | |
---|---|
IDF() | |
IDF(int minDocFreq) | |
Method Summary | |
---|---|
IDFModel | fit(JavaRDD<Vector> dataset): Computes the inverse document frequency. |
IDFModel | fit(RDD<Vector> dataset): Computes the inverse document frequency. |
int | minDocFreq() |
Methods inherited from class Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public IDF(int minDocFreq)
public IDF()
Method Detail |
---|
public int minDocFreq()
public IDFModel fit(RDD<Vector> dataset)
Parameters: dataset - an RDD of term frequency vectors
public IDFModel fit(JavaRDD<Vector> dataset)
Parameters: dataset - a JavaRDD of term frequency vectors