Package org.apache.spark.mllib.feature
Class IDFModel
Object
org.apache.spark.mllib.feature.IDFModel
- All Implemented Interfaces:
Serializable, scala.Serializable
Represents an IDF model that can transform term frequency vectors.
Method Details
-
idf
-
docFreq
public long[] docFreq()
-
numDocs
public long numDocs()
-
transform
Transforms term frequency (TF) vectors to TF-IDF vectors. If minDocFreq was set for the IDF calculation, the terms which occur in fewer than minDocFreq documents will have an entry of 0.
- Parameters:
dataset - an RDD of term frequency vectors
- Returns:
an RDD of TF-IDF vectors
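The minDocFreq rule above can be illustrated without a Spark cluster. The following is a minimal plain-Java sketch, not Spark's implementation: the class and method names (IdfWeightsSketch, idfWeights, toTfIdf) are hypothetical, and the smoothed formula log((numDocs + 1) / (docFreq + 1)) is assumed from the description of the companion IDF class.

```java
import java.util.Arrays;

public class IdfWeightsSketch {

    // Hypothetical helper (not part of Spark's API): derives one IDF weight per term
    // from document frequencies, assuming idf = log((numDocs + 1) / (docFreq + 1)).
    public static double[] idfWeights(long[] docFreq, long numDocs, int minDocFreq) {
        double[] idf = new double[docFreq.length];
        for (int j = 0; j < docFreq.length; j++) {
            if (docFreq[j] >= minDocFreq) {
                idf[j] = Math.log((numDocs + 1.0) / (docFreq[j] + 1.0));
            }
            // Terms seen in fewer than minDocFreq documents keep the default weight 0.0.
        }
        return idf;
    }

    // Hypothetical helper: element-wise product of a dense TF vector and the IDF weights.
    public static double[] toTfIdf(double[] tf, double[] idf) {
        double[] out = new double[tf.length];
        for (int j = 0; j < tf.length; j++) {
            out[j] = tf[j] * idf[j];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] docFreq = {4, 1, 2};   // documents containing each term
        long numDocs = 4;             // corpus size
        double[] idf = idfWeights(docFreq, numDocs, 2);
        double[] tfIdf = toTfIdf(new double[] {3.0, 5.0, 1.0}, idf);
        System.out.println(Arrays.toString(tfIdf));
    }
}
```

In this sketch the second term appears in only one document, so with a minDocFreq of 2 its TF-IDF entry is 0 regardless of its term frequency.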
-
transform
Transforms a term frequency (TF) vector to a TF-IDF vector.
- Parameters:
v - a term frequency vector
- Returns:
a TF-IDF vector
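For a single vector, the transformation amounts to scaling each term frequency by the IDF weight of the same term. The sketch below shows this for a sparse vector stored as parallel index and value arrays. It is a plain-Java illustration under that storage assumption, and SparseTfIdfSketch and scaleSparse are hypothetical names rather than Spark APIs.

```java
import java.util.Arrays;

public class SparseTfIdfSketch {

    // Hypothetical helper (not part of Spark's API): scales the stored values of a
    // sparse TF vector by the IDF weight at each stored index. The index array is
    // unchanged, so the result has the same sparsity pattern as the input.
    public static double[] scaleSparse(int[] indices, double[] tfValues, double[] idf) {
        double[] out = new double[tfValues.length];
        for (int k = 0; k < indices.length; k++) {
            out[k] = tfValues[k] * idf[indices[k]];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] idf = {0.5, 9.0, 2.0};   // illustrative weights, one per term
        int[] indices = {0, 2};           // terms present in this document
        double[] tfValues = {2.0, 3.0};   // their term frequencies
        System.out.println(Arrays.toString(scaleSparse(indices, tfValues, idf)));
    }
}
```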
-
transform
Transforms term frequency (TF) vectors to TF-IDF vectors (Java version).
- Parameters:
dataset - a JavaRDD of term frequency vectors
- Returns:
a JavaRDD of TF-IDF vectors
-