Class Word2VecModel

Object
org.apache.spark.mllib.feature.Word2VecModel
All Implemented Interfaces:
Serializable, Saveable, scala.Serializable

public class Word2VecModel extends Object implements scala.Serializable, Saveable
Word2Vec model.
Parameters:
wordIndex - maps each word to an index, which can be used to retrieve the corresponding vector from wordVectors
wordVectors - array of length numWords * vectorSize; the vector corresponding to the word mapped to index i can be retrieved by the slice (i * vectorSize, i * vectorSize + vectorSize)
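The flattened layout described for wordVectors can be illustrated with a small, Spark-free sketch. The class name `FlatWordVectors`, the helper `vectorAt`, and the sample values are hypothetical and are not part of the Spark API. The sketch assumes the slice bounds are start-inclusive and end-exclusive.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: shows how one word's vector
// can be read out of a flattened numWords * vectorSize array.
final class FlatWordVectors {

    // Returns a copy of the slice [index * vectorSize, index * vectorSize + vectorSize).
    static float[] vectorAt(float[] wordVectors, int vectorSize, int index) {
        int from = index * vectorSize;
        return Arrays.copyOfRange(wordVectors, from, from + vectorSize);
    }

    public static void main(String[] args) {
        // Made-up data: 3 words, vectorSize = 2.
        float[] wordVectors = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f};
        // Vector of the word mapped to index 1.
        System.out.println(Arrays.toString(vectorAt(wordVectors, 2, 1)));
    }
}
```

Storing all vectors in one contiguous array avoids one array object per word, at the cost of computing the offset on each lookup.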
  • Constructor Details

    • Word2VecModel

      public Word2VecModel(scala.collection.immutable.Map<String,float[]> model)
  • Method Details

    • load

      public static Word2VecModel load(SparkContext sc, String path)
    • save

      public void save(SparkContext sc, String path)
      Description copied from interface: Saveable
      Save this model to the given path.

      This saves:
        - human-readable (JSON) model metadata to path/metadata/
        - Parquet-formatted data to path/data/

      The model may be loaded using Loader.load.

      Specified by:
      save in interface Saveable
      Parameters:
      sc - Spark context used to save model data.
      path - Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
    • transform

      public Vector transform(String word)
      Transforms a word to its vector representation.
      Parameters:
      word - a word
      Returns:
      vector representation of word
    • findSynonyms

      public scala.Tuple2<String,Object>[] findSynonyms(String word, int num)
      Find synonyms of a word; do not include the word itself in results.
      Parameters:
      word - a word
      num - number of synonyms to find
      Returns:
      array of (word, cosineSimilarity)
    • findSynonyms

      public scala.Tuple2<String,Object>[] findSynonyms(Vector vector, int num)
      Find synonyms of the vector representation of a word, possibly including any words in the model vocabulary whose vector representation is the supplied vector.
      Parameters:
      vector - vector representation of a word
      num - number of synonyms to find
      Returns:
      array of (word, cosineSimilarity)
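The difference between the two findSynonyms overloads can be sketched in plain Java without Spark. This is an illustration of ranking by cosine similarity under stated assumptions, not Spark's actual implementation: the class `SynonymSketch`, its methods, and the sample vocabulary are all hypothetical, and a zero-norm vector is assumed to score 0.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Spark code: ranks vocabulary words by cosine similarity.
final class SynonymSketch {

    // Cosine similarity of two equal-length vectors; 0.0 if either has zero norm (assumption).
    static double cosine(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every word except `exclude` (may be null) and keeps the top `num`.
    private static List<Map.Entry<String, Double>> rank(
            Map<String, float[]> vectors, float[] query, String exclude, int num) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            if (!e.getKey().equals(exclude)) {
                scored.add(new AbstractMap.SimpleImmutableEntry<>(
                        e.getKey(), Double.valueOf(cosine(query, e.getValue()))));
            }
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return new ArrayList<>(scored.subList(0, Math.min(num, scored.size())));
    }

    // Word variant: the query word itself is left out of the results.
    static List<Map.Entry<String, Double>> findSynonyms(
            Map<String, float[]> vectors, String word, int num) {
        float[] query = vectors.get(word);
        if (query == null) {
            throw new IllegalArgumentException("word not in vocabulary: " + word);
        }
        return rank(vectors, query, word, num);
    }

    // Vector variant: a word whose vector equals the query may appear in the results.
    static List<Map.Entry<String, Double>> findSynonyms(
            Map<String, float[]> vectors, float[] vector, int num) {
        return rank(vectors, vector, null, num);
    }

    public static void main(String[] args) {
        // Made-up two-dimensional vocabulary.
        Map<String, float[]> vectors = new LinkedHashMap<>();
        vectors.put("king", new float[]{1f, 0f});
        vectors.put("queen", new float[]{0.9f, 0.1f});
        vectors.put("apple", new float[]{0f, 1f});
        System.out.println(findSynonyms(vectors, "king", 2));
        System.out.println(findSynonyms(vectors, new float[]{1f, 0f}, 2));
    }
}
```

In this sketch the word variant never returns "king" for the query "king", while the vector variant ranks "king" first because its vector matches the query exactly.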
    • getVectors

      public scala.collection.immutable.Map<String,float[]> getVectors()
      Returns a map of words to their vector representations.
      Returns:
      a map from each word in the vocabulary to its vector representation