Package org.apache.spark.mllib.feature
Class Word2VecModel
Object
org.apache.spark.mllib.feature.Word2VecModel
- All Implemented Interfaces:
Serializable, Saveable, scala.Serializable
Word2Vec model.
param: wordIndex maps each word to an index, which is used to retrieve the corresponding
vector from wordVectors
param: wordVectors array of length numWords * vectorSize; the vector for the word
mapped to index i is the slice from i * vectorSize (inclusive) to
i * vectorSize + vectorSize (exclusive)
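The flat layout described above can be illustrated with a minimal, Spark-free sketch. The class name FlatVectorLookup, the helper vectorAt, and the sample data are hypothetical and only mirror the wordIndex / wordVectors parameters; this is not the library's own code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the flat wordVectors layout; not part of Spark.
final class FlatVectorLookup {

    // Copies the half-open slice [i * vectorSize, i * vectorSize + vectorSize).
    static float[] vectorAt(float[] wordVectors, int vectorSize, int i) {
        int from = i * vectorSize;
        return Arrays.copyOfRange(wordVectors, from, from + vectorSize);
    }

    public static void main(String[] args) {
        // Assumed sample data: two words, vectorSize = 3.
        Map<String, Integer> wordIndex = new HashMap<>();
        wordIndex.put("spark", 0);
        wordIndex.put("scala", 1);
        int vectorSize = 3;
        float[] wordVectors = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f};

        float[] v = vectorAt(wordVectors, vectorSize, wordIndex.get("scala"));
        System.out.println(Arrays.toString(v)); // prints [0.4, 0.5, 0.6]
    }
}
```

Storing all vectors in one array keeps the model compact; the index from wordIndex is the only thing needed to locate a word's vector.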
-
Constructor Summary
-
Method Summary
Modifier and Type / Method / Description
findSynonyms(String word, int num): Find synonyms of a word; do not include the word itself in results.
findSynonyms(Vector vector, int num): Find synonyms of the vector representation of a word, possibly including any words in the model vocabulary whose vector representation is the supplied vector.
scala.collection.immutable.Map<String, float[]> getVectors(): Returns a map of words to their vector representations.
static Word2VecModel load(SparkContext sc, String path)
void save(SparkContext sc, String path): Save this model to the given path.
transform(String word): Transforms a word to its vector representation.
-
Constructor Details
-
Word2VecModel
-
-
Method Details
-
load
-
save
Description copied from interface: Saveable
Save this model to the given path.
This saves:
- human-readable (JSON) model metadata to path/metadata/
- Parquet formatted data to path/data/
The model may be loaded using Loader.load.
-
transform
Transforms a word to its vector representation.
- Parameters:
word - a word
- Returns:
vector representation of word
-
findSynonyms
Find synonyms of a word; do not include the word itself in results.
- Parameters:
word - a word
num - number of synonyms to find
- Returns:
array of (word, cosineSimilarity)
-
findSynonyms
Find synonyms of the vector representation of a word, possibly including any words in the model vocabulary whose vector representation is the supplied vector.
- Parameters:
vector - vector representation of a word
num - number of synonyms to find
- Returns:
array of (word, cosineSimilarity)
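The difference between the two findSynonyms overloads can be sketched without Spark. The example below is an illustrative assumption, not the library's implementation: the class SynonymSketch, the methods cosine and nearest, and the toy vocabulary are all hypothetical. It ranks words by cosine similarity and shows that a lookup by word skips the query word, while a lookup by vector may return a word whose vector equals the query.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of cosine-similarity ranking; not Spark's implementation.
final class SynonymSketch {

    // Cosine similarity of two equal-length, non-zero vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns up to num (word, cosineSimilarity) pairs, most similar first.
    // If exclude is non-null, that word is left out of the results.
    static List<Map.Entry<String, Double>> nearest(
            Map<String, float[]> vectors, float[] query, int num, String exclude) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            if (e.getKey().equals(exclude)) {
                continue;
            }
            scored.add(new AbstractMap.SimpleEntry<String, Double>(
                    e.getKey(), cosine(query, e.getValue())));
        }
        scored.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        return new ArrayList<>(scored.subList(0, Math.min(num, scored.size())));
    }

    public static void main(String[] args) {
        // Assumed toy vocabulary with 2-dimensional vectors.
        Map<String, float[]> vectors = new LinkedHashMap<>();
        vectors.put("king", new float[]{1.0f, 0.0f});
        vectors.put("queen", new float[]{0.9f, 0.1f});
        vectors.put("apple", new float[]{0.0f, 1.0f});

        // Word-style lookup: the query word itself is excluded.
        System.out.println(nearest(vectors, vectors.get("king"), 1, "king").get(0).getKey()); // prints queen

        // Vector-style lookup: a word with the same vector may be returned.
        System.out.println(nearest(vectors, vectors.get("king"), 1, null).get(0).getKey()); // prints king
    }
}
```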
-
getVectors
Returns a map of words to their vector representations.
- Returns:
(undocumented)
-