Object → org.apache.spark.ml.PipelineStage → org.apache.spark.ml.Estimator<PipelineModel> → org.apache.spark.ml.Pipeline
public class Pipeline
:: Experimental ::
A simple pipeline, which acts as an estimator. A Pipeline consists of a sequence of stages, each of which is either an Estimator or a Transformer. When fit(org.apache.spark.sql.DataFrame) is called, the stages are executed in order. If a stage is an Estimator, its Estimator.fit(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamPair<?>, org.apache.spark.ml.param.ParamPair<?>...) method is called on the input dataset to fit a model. The model, which is a Transformer, is then used to transform the dataset into the input for the next stage. If a stage is a Transformer, its Transformer.transform(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamPair<?>, org.apache.spark.ml.param.ParamPair<?>...) method is called to produce the dataset for the next stage.
The fitted model from a Pipeline is a PipelineModel, which consists of the fitted models and transformers corresponding to the pipeline stages. If there are no stages, the pipeline acts as an identity transformer.
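The stage-by-stage procedure described above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not Spark code: `Stage`, `Estimator`, `Transformer`, `MeanCenterer`, and `Doubler` are hypothetical stand-ins defined here, and a "dataset" is just a `List<Double>`. For simplicity the sketch transforms the data after every stage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model of the fitting procedure. All types are hypothetical stand-ins,
// NOT Spark's classes; a "dataset" here is simply a List<Double>.
final class PipelineFitSketch {
    interface Stage {}

    // A Transformer maps a dataset to a new dataset.
    interface Transformer extends Stage {
        List<Double> transform(List<Double> data);
    }

    // An Estimator learns from a dataset and returns a fitted Transformer.
    interface Estimator extends Stage {
        Transformer fit(List<Double> data);
    }

    // Hypothetical estimator: learns the mean, returns a centering transformer.
    static final class MeanCenterer implements Estimator {
        public Transformer fit(List<Double> data) {
            double mean = data.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            return d -> map(d, x -> x - mean);
        }
    }

    // Hypothetical transformer: doubles every value.
    static final class Doubler implements Transformer {
        public List<Double> transform(List<Double> data) {
            return map(data, x -> 2 * x);
        }
    }

    static List<Double> map(List<Double> data, UnaryOperator<Double> f) {
        List<Double> out = new ArrayList<>();
        for (Double x : data) out.add(f.apply(x));
        return out;
    }

    // Run the stages in order; each stage sees the output of the previous one.
    static Transformer fit(List<Stage> stages, List<Double> data) {
        List<Transformer> fitted = new ArrayList<>();
        List<Double> current = data;
        for (Stage s : stages) {
            // Estimators are fitted first; transformers are used as they are.
            Transformer t = (s instanceof Estimator) ? ((Estimator) s).fit(current) : (Transformer) s;
            current = t.transform(current);
            fitted.add(t);
        }
        // The fitted "pipeline model" applies the fitted transformers in order.
        // With no stages the list is empty, so the model is the identity.
        return d -> {
            List<Double> out = d;
            for (Transformer t : fitted) out = t.transform(out);
            return out;
        };
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(1.0, 2.0, 3.0);
        List<Stage> stages = Arrays.asList(new Doubler(), new MeanCenterer());
        Transformer model = fit(stages, data);
        System.out.println(model.transform(data)); // prints [-2.0, 0.0, 2.0]
    }
}
```

In the example, the doubled data `[2.0, 4.0, 6.0]` has mean 4.0, so the fitted model maps any value `x` to `2 * x - 4.0`. Fitting an empty stage list returns a model that hands its input back unchanged.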
Constructor Summary
Constructor | Description |
---|---|
Pipeline() | |
Pipeline(String uid) | |
Method Summary
Modifier and Type | Method | Description |
---|---|---|
Pipeline | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params. |
PipelineModel | fit(DataFrame dataset) | Fits the pipeline to the input dataset. |
PipelineStage[] | getStages() | |
Pipeline | setStages(PipelineStage[] value) | |
Param<PipelineStage[]> | stages() | Param for pipeline stages. |
StructType | transformSchema(StructType schema) | :: DeveloperApi :: Derives the output schema from the input schema. |
String | uid() | |
void | validateParams() | Validates parameter values stored internally. |
Methods inherited from class org.apache.spark.ml.Estimator |
---|
fit, fit, fit, fit |
Methods inherited from class Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.spark.ml.param.Params |
---|
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn |
Methods inherited from interface org.apache.spark.Logging |
---|
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning |
Constructor Detail |
---|
public Pipeline(String uid)
public Pipeline()
Method Detail |
---|
public String uid()
public Param<PipelineStage[]> stages()
public Pipeline setStages(PipelineStage[] value)
public PipelineStage[] getStages()
public void validateParams()
Description copied from interface: Params
Validates parameter values stored internally. This method only needs to check for interactions between parameters. Parameter value checks that do not depend on other parameters are handled by Param.validate(). This method does not handle input/output column parameters; those are checked during schema validation.
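The split between per-parameter checks and cross-parameter checks can be illustrated with a small sketch. Everything here is an assumption made for illustration: `ValidateParamsSketch` and its `lower` and `upper` parameters are hypothetical and are not part of Spark.

```java
// Hypothetical parameter holder, NOT a Spark class. It separates checks on a
// single value from checks that involve more than one parameter.
final class ValidateParamsSketch {
    final double lower;
    final double upper;

    ValidateParamsSketch(double lower, double upper) {
        // Per-parameter checks: each value is validated on its own, which is
        // the role the text above assigns to Param.validate().
        checkFinite("lower", lower);
        checkFinite("upper", upper);
        this.lower = lower;
        this.upper = upper;
    }

    private static void checkFinite(String name, double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException(name + " must be finite, got " + value);
        }
    }

    // Cross-parameter check: only the interaction between values is tested.
    void validateParams() {
        if (lower > upper) {
            throw new IllegalArgumentException(
                "lower (" + lower + ") must not exceed upper (" + upper + ")");
        }
    }

    public static void main(String[] args) {
        new ValidateParamsSketch(0.0, 1.0).validateParams(); // passes silently
        try {
            new ValidateParamsSketch(2.0, 1.0).validateParams();
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Both `2.0` and `1.0` are valid on their own, so only the interaction check can reject the second instance.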
public PipelineModel fit(DataFrame dataset)
Fits the pipeline to the input dataset. If a stage is an Estimator, its Estimator.fit(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamPair<?>, org.apache.spark.ml.param.ParamPair<?>...) method is called on the input dataset to fit a model. The model, which is a Transformer, is then used to transform the dataset into the input for the next stage. If a stage is a Transformer, its Transformer.transform(org.apache.spark.sql.DataFrame, org.apache.spark.ml.param.ParamPair<?>, org.apache.spark.ml.param.ParamPair<?>...) method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of the fitted models and transformers corresponding to the pipeline stages. If there are no stages, the output model acts as an identity transformer.
Specified by: fit in class Estimator<PipelineModel>
Parameters: dataset - input dataset
public Pipeline copy(ParamMap extra)
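Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params.
Specified by: copy in interface Params
Specified by: copy in class Estimator<PipelineModel>
Parameters: extra - (undocumented)
See Also: defaultCopy()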
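The copy contract (same UID, existing params plus some extra ones) can be sketched as follows. This is an illustrative model only: `CopySketch` is a hypothetical class, the param names are invented, and a plain `Map<String, Object>` stands in for `ParamMap`. It assumes that extra values take precedence over values already set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Params instance, NOT a Spark class.
// A plain Map plays the role of ParamMap.
final class CopySketch {
    final String uid;
    final Map<String, Object> params;

    CopySketch(String uid, Map<String, Object> params) {
        this.uid = uid;
        this.params = new HashMap<>(params); // defensive copy of the input map
    }

    // Returns a new instance with the same uid. Extra params are layered on
    // top of the current ones (assumption: extra values win on conflict).
    CopySketch copy(Map<String, Object> extra) {
        Map<String, Object> merged = new HashMap<>(params);
        merged.putAll(extra);
        return new CopySketch(uid, merged);
    }

    public static void main(String[] args) {
        Map<String, Object> base = new HashMap<>();
        base.put("maxIter", 10); // hypothetical param name
        CopySketch original = new CopySketch("pipeline_01", base);

        Map<String, Object> extra = new HashMap<>();
        extra.put("maxIter", 20);
        CopySketch copied = original.copy(extra);

        System.out.println(copied.uid.equals(original.uid)); // prints true
        System.out.println(original.params.get("maxIter"));  // prints 10
        System.out.println(copied.params.get("maxIter"));    // prints 20
    }
}
```

The original instance is left untouched, so a copy can be used to try different parameter values without affecting the source object.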
public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
:: DeveloperApi :: Derives the output schema from the input schema.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
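One way to picture schema derivation for a multi-stage pipeline is to pass the schema through each stage in order, so that every stage sees the columns produced before it. The sketch below assumes that behavior; it is not taken from the text above. `SchemaSketch` and `ColumnStage` are hypothetical, and a schema is reduced to a list of column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy schema derivation. NOT Spark code: a schema is a list of column names
// and every stage reads one column and appends one column.
final class SchemaSketch {
    static final class ColumnStage {
        final String inputCol;
        final String outputCol;

        ColumnStage(String inputCol, String outputCol) {
            this.inputCol = inputCol;
            this.outputCol = outputCol;
        }

        // Derives this stage's output schema and validates its columns.
        List<String> transformSchema(List<String> schema) {
            if (!schema.contains(inputCol)) {
                throw new IllegalArgumentException("missing input column: " + inputCol);
            }
            if (schema.contains(outputCol)) {
                throw new IllegalArgumentException("output column already exists: " + outputCol);
            }
            List<String> out = new ArrayList<>(schema);
            out.add(outputCol);
            return out;
        }
    }

    // Assumed pipeline behavior: thread the schema through the stages in order.
    static List<String> transformSchema(List<ColumnStage> stages, List<String> schema) {
        List<String> current = schema;
        for (ColumnStage stage : stages) {
            current = stage.transformSchema(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<ColumnStage> stages = Arrays.asList(
            new ColumnStage("text", "words"),      // hypothetical tokenizer-like stage
            new ColumnStage("words", "features")); // hypothetical featurizer-like stage
        System.out.println(transformSchema(stages, Arrays.asList("text")));
        // prints [text, words, features]
    }
}
```

Because no data is touched, a check of this kind can reject a misconfigured stage list, such as a stage that reads a column no earlier stage produces, before any fitting starts.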