Class Tokenizer

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, HasInputCol, HasOutputCol, DefaultParamsWritable, Identifiable, MLWritable, scala.Serializable

public class Tokenizer extends UnaryTransformer<String,scala.collection.Seq<String>,Tokenizer> implements DefaultParamsWritable
A tokenizer that converts the input string to lowercase and then splits it on whitespace.
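The behaviour described above can be approximated in plain Java without Spark. This is a minimal sketch only: the class `TokenizerSketch` and its `tokenize` method are hypothetical names, and the choices of `Locale.ROOT` and the regex `\s` (one whitespace character per separator) are assumptions, not a statement of how the library implements it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone class; it is not part of Spark.
final class TokenizerSketch {
    // Lowercase the input, then split it on whitespace characters.
    // Assumption: each single whitespace character is a separator, so a run
    // of several spaces yields empty tokens between them.
    static List<String> tokenize(String input) {
        return Arrays.asList(input.toLowerCase(Locale.ROOT).split("\\s"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello Spark ML")); // prints [hello, spark, ml]
    }
}
```

Under these assumptions, an input containing consecutive spaces produces empty strings in the result, which a caller may want to filter out.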

  • Constructor Details

    • Tokenizer

      public Tokenizer(String uid)
    • Tokenizer

      public Tokenizer()
  • Method Details

    • load

      public static Tokenizer load(String path)
    • read

      public static MLReader<Tokenizer> read()
    • uid

      public String uid()
      Description copied from interface: Identifiable
      An immutable unique ID for the object and its derivatives.
      uid in interface Identifiable
    • copy

      public Tokenizer copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      copy in interface Params
      copy in class UnaryTransformer<String,scala.collection.Seq<String>,Tokenizer>
      extra - extra parameter values to apply to the copy
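The contract described for copy (same UID, extra params applied, return type narrowed by the subclass) can be illustrated with Java's covariant return types. The classes `StageSketch` and `TokenizerStageSketch` below are hypothetical stand-ins that use a plain `Map` in place of `ParamMap`; the rule that extra values override existing ones is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base type standing in for a parameterised pipeline stage.
abstract class StageSketch {
    final String uid;
    final Map<String, String> params = new HashMap<>();

    StageSketch(String uid) {
        this.uid = uid;
    }

    // Subclasses override this and narrow the return type.
    abstract StageSketch copy(Map<String, String> extra);
}

// Hypothetical subclass; it is not Spark's Tokenizer.
final class TokenizerStageSketch extends StageSketch {
    TokenizerStageSketch(String uid) {
        super(uid);
    }

    // Covariant return: callers get a TokenizerStageSketch without a cast.
    @Override
    TokenizerStageSketch copy(Map<String, String> extra) {
        TokenizerStageSketch that = new TokenizerStageSketch(uid); // same UID
        that.params.putAll(params); // carry over the current params
        that.params.putAll(extra);  // assumption: extra values win on conflict
        return that;
    }
}
```

Calling `copy` on a `TokenizerStageSketch` with a map containing `outputCol` returns a new instance with the original UID and both the old and new entries, while the original instance is left unchanged.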