Package org.apache.spark.mllib.feature
Class HashingTF
Object
org.apache.spark.mllib.feature.HashingTF
- All Implemented Interfaces:
Serializable, scala.Serializable
Maps a sequence of terms to their term frequencies using the hashing trick.
param: numFeatures number of features (default: 2^20)
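The hashing trick described above can be sketched in plain Java as follows. This is an illustration only, not Spark's implementation: the class `HashingTrickSketch` and its methods `bucket` and `termFrequencies` are hypothetical names, and the sketch assumes `Object.hashCode()` as the hash function, which is not the murmur3 default named on this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch of the hashing trick; hypothetical names, not Spark's code. */
final class HashingTrickSketch {

    /** Maps a term to a bucket in [0, numFeatures). Assumption: hashCode(), not murmur3. */
    static int bucket(Object term, int numFeatures) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    /** Counts how many terms fall into each bucket; buckets that are absent have count 0. */
    static Map<Integer, Double> termFrequencies(List<?> document, int numFeatures) {
        Map<Integer, Double> tf = new TreeMap<>();
        for (Object term : document) {
            tf.merge(bucket(term, numFeatures), 1.0, Double::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        // "a" has hash code 97 and "b" has 98, so with 16 buckets they land in 1 and 2.
        System.out.println(termFrequencies(Arrays.asList("a", "b", "a"), 16));
    }
}
```

No vocabulary is built or stored: the index of a term depends only on its hash and on numFeatures, so distinct terms can share a bucket when numFeatures is small.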
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
- int indexOf(term): Returns the index of the input term.
- int numFeatures()
- setBinary(boolean value): If true, term frequency vector will be binary such that non-zero term counts will be set to 1 (default: false)
- setHashAlgorithm(String value): Set the hash algorithm used when mapping term to integer.
- transform(document): Transforms the input document into a sparse term frequency vector (Java version).
- transform(dataset): Transforms the input document to term frequency vectors (Java version).
- transform(dataset): Transforms the input document to term frequency vectors.
- transform(scala.collection.Iterable<?> document): Transforms the input document into a sparse term frequency vector.
-
Constructor Details
-
HashingTF
public HashingTF(int numFeatures)
-
HashingTF
public HashingTF()
-
-
Method Details
-
numFeatures
public int numFeatures()
-
setBinary
If true, the term frequency vector will be binary, so that all non-zero term counts are set to 1 (default: false).
- Parameters:
value - (undocumented)
- Returns:
(undocumented)
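The effect of the binary flag can be sketched in plain Java as follows. This is a hedged illustration, not Spark's code: `BinaryTfSketch` and `termFrequencies` are hypothetical names, and `Object.hashCode()` is assumed as the hash function for simplicity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch of binary versus counted term frequencies; hypothetical names. */
final class BinaryTfSketch {

    static Map<Integer, Double> termFrequencies(List<?> document, int numFeatures, boolean binary) {
        Map<Integer, Double> tf = new TreeMap<>();
        for (Object term : document) {
            // Assumption: hashCode() stands in for the configured hash algorithm.
            int index = Math.floorMod(term.hashCode(), numFeatures);
            if (binary) {
                tf.put(index, 1.0);                // presence only: any non-zero count becomes 1
            } else {
                tf.merge(index, 1.0, Double::sum); // accumulate the raw count
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a", "a", "b");
        System.out.println(termFrequencies(doc, 16, false)); // counts per bucket
        System.out.println(termFrequencies(doc, 16, true));  // 1.0 for every occupied bucket
    }
}
```

Binary vectors are useful for models that expect presence or absence of a term rather than its count.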
-
setHashAlgorithm
Set the hash algorithm used when mapping term to integer. (default: murmur3)
- Parameters:
value - (undocumented)
- Returns:
(undocumented)
-
indexOf
Returns the index of the input term.
- Parameters:
term - (undocumented)
- Returns:
(undocumented)
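As a rough sketch of how a term index can be derived, the plain-Java example below reduces a hash code to a non-negative value below numFeatures. It is an assumption-laden illustration rather than Spark's implementation: `IndexOfSketch` is a hypothetical class, and it hashes with `Object.hashCode()` instead of murmur3.

```java
/** Illustrative sketch of term-to-index mapping; hypothetical names, not Spark's code. */
final class IndexOfSketch {

    /** Returns an index in [0, numFeatures) for the given term. */
    static int indexOf(Object term, int numFeatures) {
        // A plain % could return a negative value for a negative hash code;
        // floorMod always returns a value in [0, numFeatures).
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    public static void main(String[] args) {
        // Integer.valueOf(-7) has hash code -7, which exercises the negative case.
        System.out.println(indexOf(-7, 5));
        // "a" (hash 97) and "q" (hash 113) collide when there are only 16 buckets.
        System.out.println(indexOf("a", 16) == indexOf("q", 16));
    }
}
```

Collisions like the one between "a" and "q" become less likely as numFeatures grows, which is one reason the default is large.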
-
transform
Transforms the input document into a sparse term frequency vector.
- Parameters:
document - (undocumented)
- Returns:
(undocumented)
-
transform
Transforms the input document into a sparse term frequency vector (Java version).
- Parameters:
document - (undocumented)
- Returns:
(undocumented)
-
transform
Transforms the input document to term frequency vectors.
- Parameters:
dataset - (undocumented)
- Returns:
(undocumented)
-
transform
Transforms the input document to term frequency vectors (Java version).
- Parameters:
dataset - (undocumented)
- Returns:
(undocumented)
-