Package org.apache.spark.mllib.feature
Class HashingTF
Object
org.apache.spark.mllib.feature.HashingTF
- All Implemented Interfaces:
Serializable
Maps a sequence of terms to their term frequencies using the hashing trick.
param: numFeatures number of features (default: 2^20)
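The hashing trick described above can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not Spark's implementation: the class `HashingTrickSketch` and its methods are hypothetical names, and `String.hashCode()` stands in for the hash function, so the indices it produces will generally differ from those of `HashingTF`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not part of org.apache.spark.mllib.feature.
final class HashingTrickSketch {

    // Maps a term to a bucket in [0, numFeatures).
    // Assumption: String.hashCode() stands in for the real hash function.
    static int bucketOf(String term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    // Counts how many terms of the document fall into each bucket.
    // Only non-empty buckets are stored, which mimics a sparse vector.
    static Map<Integer, Double> termFrequencies(List<String> document, int numFeatures) {
        Map<Integer, Double> counts = new TreeMap<>();
        for (String term : document) {
            counts.merge(bucketOf(term, numFeatures), 1.0, Double::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> document = Arrays.asList("a", "b", "a");
        // "a" hashes to 97 and "b" to 98, so with 16 buckets they land in 1 and 2.
        System.out.println(termFrequencies(document, 16)); // prints {1=2.0, 2=1.0}
    }
}
```

No vocabulary is built in this sketch: the bucket index is computed directly from the term. Distinct terms can collide in the same bucket, and a larger `numFeatures` makes that less likely.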
Constructor Summary
Constructors
HashingTF()
HashingTF(int numFeatures)
-
Method Summary
Modifier and Type | Method | Description
int | indexOf(term) | Returns the index of the input term.
int | numFeatures() |
 | setBinary(boolean value) | If true, term frequency vector will be binary such that non-zero term counts will be set to 1 (default: false)
 | setHashAlgorithm(String value) | Set the hash algorithm used when mapping term to integer.
 | transform(document) | Transforms the input document into a sparse term frequency vector (Java version).
 | transform(dataset) | Transforms the input document to term frequency vectors (Java version).
 | transform(dataset) | Transforms the input document to term frequency vectors.
 | transform(document) | Transforms the input document into a sparse term frequency vector.
-
Constructor Details
-
HashingTF
public HashingTF(int numFeatures)
-
HashingTF
public HashingTF()
-
-
Method Details
-
numFeatures
public int numFeatures()
-
setBinary
If true, the term frequency vector is binary: every non-zero term count is set to 1 (default: false).
- Parameters:
value - (undocumented)
- Returns:
(undocumented)
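The effect of the binary flag can be illustrated with a small self-contained sketch. The class `BinaryTfSketch` and its `binary` parameter are hypothetical, and `String.hashCode()` is assumed as the hash function, so this shows the idea rather than Spark's code path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not part of org.apache.spark.mllib.feature.
final class BinaryTfSketch {

    // Builds a sparse term frequency vector as a map from bucket index to value.
    // With binary == true, a bucket holds 1.0 no matter how often it was hit.
    static Map<Integer, Double> termFrequencies(List<String> document,
                                                int numFeatures,
                                                boolean binary) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String term : document) {
            // Assumption: String.hashCode() stands in for the real hash function.
            int index = Math.floorMod(term.hashCode(), numFeatures);
            if (binary) {
                vector.put(index, 1.0);                // presence only
            } else {
                vector.merge(index, 1.0, Double::sum); // running count
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> document = Arrays.asList("a", "b", "a");
        System.out.println(termFrequencies(document, 16, false)); // prints {1=2.0, 2=1.0}
        System.out.println(termFrequencies(document, 16, true));  // prints {1=1.0, 2=1.0}
    }
}
```

Binary vectors record only whether a term occurs, which suits models that work with presence or absence rather than counts.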
-
setHashAlgorithm
Sets the hash algorithm used when mapping a term to an integer (default: murmur3).
- Parameters:
value - (undocumented)
- Returns:
(undocumented)
-
indexOf
Returns the index of the input term.
- Parameters:
term - (undocumented)
- Returns:
(undocumented)
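A term index has to fall in the range from 0 up to, but not including, the number of features, even when the term's hash is negative. The sketch below shows one way to guarantee that in Java. `TermIndexSketch` is a hypothetical class, and using `Object.hashCode()` is an assumption made for illustration only, not the hash that `HashingTF` applies.

```java
// Illustrative sketch only; not part of org.apache.spark.mllib.feature.
final class TermIndexSketch {

    // Reduces a possibly negative hash to an index in [0, numFeatures).
    // Math.floorMod never returns a negative result for a positive modulus,
    // unlike the % operator, which keeps the sign of the dividend.
    static int indexOf(Object term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    public static void main(String[] args) {
        Object term = Integer.valueOf(-7);    // Integer.hashCode() is the value itself
        System.out.println(-7 % 4);           // prints -3, unusable as an index
        System.out.println(indexOf(term, 4)); // prints 1
    }
}
```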
-
transform
Transforms the input document into a sparse term frequency vector.
- Parameters:
document - (undocumented)
- Returns:
(undocumented)
-
transform
Transforms the input document into a sparse term frequency vector (Java version).
- Parameters:
document - (undocumented)
- Returns:
(undocumented)
-
transform
Transforms the input document to term frequency vectors.
- Parameters:
dataset - (undocumented)
- Returns:
(undocumented)
-
transform
Transforms the input document to term frequency vectors (Java version).
- Parameters:
dataset - (undocumented)
- Returns:
(undocumented)
-