org.apache.spark.ml

Pipeline

class Pipeline extends Estimator[PipelineModel]

:: AlphaComponent :: A simple pipeline, which acts as an estimator. A Pipeline consists of a sequence of stages, each of which is either an Estimator or a Transformer. When Pipeline#fit is called, the stages are executed in order. If a stage is an Estimator, its Estimator#fit method is called on the input dataset to fit a model; that model, which is a Transformer, is then used to transform the dataset into the input for the next stage. If a stage is a Transformer, its Transformer#transform method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of the fitted models and transformers corresponding to the pipeline stages. If there are no stages, the pipeline acts as an identity transformer.
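
The stage-by-stage procedure described above can be sketched in plain Scala. This is a toy model under stated assumptions, not Spark's implementation: a "dataset" is just a Vector[Double], and the names PipelineSketch, AddConstant, MeanCenterer, and FittedPipeline are hypothetical stand-ins for real stages and for PipelineModel.

```scala
// Toy model of the fit procedure; NOT Spark's implementation.
object PipelineSketch {
  // Assumption: a dataset is a plain vector of numbers instead of a DataFrame.
  type Dataset = Vector[Double]

  sealed trait Stage
  trait Transformer extends Stage { def transform(data: Dataset): Dataset }
  trait Estimator extends Stage { def fit(data: Dataset): Transformer }

  // Hypothetical transformer: adds a constant to every value.
  final case class AddConstant(c: Double) extends Transformer {
    def transform(data: Dataset): Dataset = data.map(_ + c)
  }

  // Hypothetical estimator: learns the mean and returns a centering transformer.
  case object MeanCenterer extends Estimator {
    def fit(data: Dataset): Transformer = AddConstant(-data.sum / data.size)
  }

  // Stand-in for PipelineModel: one transformer per original stage, applied in order.
  final case class FittedPipeline(stages: Vector[Transformer]) extends Transformer {
    def transform(data: Dataset): Dataset =
      stages.foldLeft(data)((current, stage) => stage.transform(current))
  }

  // Walks the stages in order. Estimators are fitted on the current dataset,
  // transformers are kept as they are, and each resulting transformer produces
  // the dataset seen by the next stage.
  def fit(stages: Seq[Stage], data: Dataset): FittedPipeline = {
    val (_, fitted) = stages.foldLeft((data, Vector.empty[Transformer])) {
      case ((current, acc), stage) =>
        val transformer = stage match {
          case e: Estimator   => e.fit(current)
          case t: Transformer => t
        }
        (transformer.transform(current), acc :+ transformer)
    }
    FittedPipeline(fitted)
  }
}
```

In this sketch, PipelineSketch.fit(Seq(AddConstant(1.0), MeanCenterer), data) fits the centering step on the already-shifted data, and an empty stage list yields a FittedPipeline that returns its input unchanged. With the real API, the corresponding steps are setStages followed by one of the fit overloads listed under Value Members.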

Annotations
@AlphaComponent()
Linear Supertypes
Estimator[PipelineModel], Params, Identifiable, PipelineStage, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new Pipeline()

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def addOutputColumn(schema: StructType, colName: String, dataType: DataType): StructType

    Attributes
    protected
    Definition Classes
    Params
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def checkInputColumn(schema: StructType, colName: String, dataType: DataType): Unit

    Check whether the given schema contains an input column.

    colName

    Parameter name for the input column.

    dataType

    SQL DataType of the input column.

    Attributes
    protected
    Definition Classes
    Params
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def explainParams(): String

    Returns the documentation of all params.

    Definition Classes
    Params
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. def fit(dataset: DataFrame, paramMap: ParamMap): PipelineModel

    Fits the pipeline to the input dataset with additional parameters. If a stage is an Estimator, its Estimator#fit method is called on the input dataset to fit a model; that model, which is a Transformer, is then used to transform the dataset into the input for the next stage. If a stage is a Transformer, its Transformer#transform method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of the fitted models and transformers corresponding to the pipeline stages. If there are no stages, the output model acts as an identity transformer.

    dataset

    input dataset

    paramMap

    parameter map

    returns

    fitted pipeline

    Definition Classes
    Pipeline → Estimator
  15. def fit(dataset: DataFrame, paramMaps: Array[ParamMap]): Seq[PipelineModel]

    Fits multiple models to the input data with multiple sets of parameters. The default implementation uses a for loop on each parameter map. Subclasses can override this to optimize multi-model training.

    dataset

    input dataset

    paramMaps

    An array of parameter maps. These values override any specified in this Estimator's embedded ParamMap.

    returns

    fitted models, matching the input parameter maps

    Definition Classes
    Estimator
  16. def fit(dataset: DataFrame, paramPairs: ParamPair[_]*): PipelineModel

    Fits a single model to the input data with optional parameters.

    dataset

    input dataset

    paramPairs

    Optional list of param pairs. These values override any specified in this Estimator's embedded ParamMap.

    returns

    fitted model

    Definition Classes
    Estimator
    Annotations
    @varargs()
  17. def get[T](param: Param[T]): T

    Gets the value of a parameter in the embedded param map.

    Attributes
    protected
    Definition Classes
    Params
  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. def getStages: Array[PipelineStage]

  20. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. def isSet(param: Param[_]): Boolean

    Checks whether a param is explicitly set.

    Definition Classes
    Params
  23. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  24. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  25. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  31. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  32. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  36. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  37. final def notify(): Unit

    Definition Classes
    AnyRef
  38. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  39. val paramMap: ParamMap

    Internal param map.

    Attributes
    protected
    Definition Classes
    Params
  40. def params: Array[Param[_]]

    Returns all params.

    Definition Classes
    Params
  41. def set[T](param: Param[T], value: T): Pipeline.this.type

    Sets a parameter in the embedded param map.

    Attributes
    protected
    Definition Classes
    Params
  42. def setStages(value: Array[PipelineStage]): Pipeline.this.type

  43. val stages: Param[Array[PipelineStage]]

    param for pipeline stages

  44. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  45. def toString(): String

    Definition Classes
    AnyRef → Any
  46. def transformSchema(schema: StructType, paramMap: ParamMap): StructType

    :: DeveloperApi ::

    Derives the output schema from the input schema and parameters. The schema describes the columns and types of the data.

    schema

    Input schema to this stage

    paramMap

    Parameters passed to this stage

    returns

    Output schema from this stage

    Definition Classes
    Pipeline → PipelineStage
  47. def transformSchema(schema: StructType, paramMap: ParamMap, logging: Boolean): StructType

    Derives the output schema from the input schema and parameters, optionally with logging.

    Attributes
    protected
    Definition Classes
    PipelineStage
  48. def validate(): Unit

    Validates parameter values stored internally. Raises an exception if any parameter value is invalid.

    Definition Classes
    Params
  49. def validate(paramMap: ParamMap): Unit

    Validates parameter values stored internally plus the input parameter map. Raises an exception if any parameter is invalid.

    Definition Classes
    Params
  50. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  51. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  52. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
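
The fit overloads inherited from Estimator state that values passed at call time override those in the estimator's embedded ParamMap, and that the multi-model variant returns one fitted model per input parameter map. A minimal plain-Scala sketch of those two rules follows. It is an illustration under assumptions, not the library's code: parameters are modeled as a Map[String, Double] rather than a ParamMap, and ToyEstimator and Model are hypothetical names.

```scala
// Toy model of parameter overriding in fit; NOT Spark's implementation.
object ParamOverrideSketch {
  // Assumption: parameter name -> value stands in for ParamMap.
  type Params = Map[String, Double]

  // Hypothetical fitted model that only records which parameters were in effect.
  final case class Model(usedParams: Params)

  class ToyEstimator(embedded: Params) {
    // Single fit: values in `extra` take precedence over the embedded ones.
    def fit(data: Seq[Double], extra: Params): Model =
      Model(embedded ++ extra)

    // Multi-model fit: one model per parameter map, in input order.
    // The source describes a for loop; `map` is used here to the same effect.
    def fit(data: Seq[Double], extras: Array[Params]): Seq[Model] =
      extras.toSeq.map(extra => fit(data, extra))
  }
}
```

Parameters that a call-time map does not mention keep their embedded values, so each returned model reflects the embedded settings plus exactly one map's overrides.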
