Class IDFModel

Object
org.apache.spark.mllib.feature.IDFModel
All Implemented Interfaces:
Serializable, scala.Serializable

public class IDFModel extends Object implements scala.Serializable
Represents an IDF model that can transform term frequency vectors.
  • Method Details

    • idf

      public Vector idf()
    • docFreq

      public long[] docFreq()
    • numDocs

      public long numDocs()
    • transform

      public RDD<Vector> transform(RDD<Vector> dataset)
      Transforms term frequency (TF) vectors to TF-IDF vectors.

      If minDocFreq was set when the IDF was computed, terms that occur in fewer than minDocFreq documents have an entry of 0 in the result.

      Parameters:
      dataset - an RDD of term frequency vectors
      Returns:
      an RDD of TF-IDF vectors
    • transform

      public Vector transform(Vector v)
      Transforms a term frequency (TF) vector to a TF-IDF vector.

      Parameters:
      v - a term frequency vector
      Returns:
      a TF-IDF vector
    • transform

      public JavaRDD<Vector> transform(JavaRDD<Vector> dataset)
      Transforms term frequency (TF) vectors to TF-IDF vectors (Java version).

      Parameters:
      dataset - a JavaRDD of term frequency vectors
      Returns:
      a JavaRDD of TF-IDF vectors
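
To make the arithmetic behind the transform methods concrete, here is a minimal standalone sketch in plain Java. It does not use Spark. It assumes the smoothed formulation described in Spark's IDF documentation, idf = log((numDocs + 1) / (docFreq + 1)), and it treats vectors as dense double arrays. The class name TfIdfSketch and both method signatures are hypothetical and exist only for illustration; they are not part of the IDFModel API.

```java
import java.util.Arrays;

/** Hypothetical standalone illustration; not part of Spark. */
final class TfIdfSketch {

    /**
     * Computes one IDF weight per term, assuming the smoothed formula
     * log((numDocs + 1) / (docFreq + 1)). Terms that occur in fewer than
     * minDocFreq documents keep a weight of 0.
     */
    static double[] idf(long[] docFreq, long numDocs, int minDocFreq) {
        double[] weights = new double[docFreq.length];
        for (int j = 0; j < docFreq.length; j++) {
            if (docFreq[j] >= minDocFreq) {
                weights[j] = Math.log((numDocs + 1.0) / (docFreq[j] + 1.0));
            }
            // Otherwise the weight stays 0.0, so the TF-IDF entry is 0.
        }
        return weights;
    }

    /** Scales each term frequency by the IDF weight of the same term. */
    static double[] transform(double[] idf, double[] tf) {
        if (idf.length != tf.length) {
            throw new IllegalArgumentException(
                "Vector sizes differ: " + idf.length + " vs " + tf.length);
        }
        double[] tfidf = new double[tf.length];
        for (int j = 0; j < tf.length; j++) {
            tfidf[j] = tf[j] * idf[j];
        }
        return tfidf;
    }

    public static void main(String[] args) {
        long[] docFreq = {4, 1, 2};   // documents containing each term
        long numDocs = 4;             // corpus size
        double[] weights = idf(docFreq, numDocs, 2);
        double[] tfidf = transform(weights, new double[] {3.0, 5.0, 1.0});
        System.out.println(Arrays.toString(tfidf));
    }
}
```

In this example the first term appears in all four documents, so its weight is log(5/5) = 0. The second term appears in only one document, below the minDocFreq of 2, so its weight is forced to 0. Only the third term keeps a nonzero TF-IDF entry.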