org.apache.spark.sql.pipelines.graph (Spark 4.1.0-preview1 JavaDoc)

package org.apache.spark.sql.pipelines.graph

Related Packages

Package

Description

org.apache.spark.sql.pipelines

org.apache.spark.sql.pipelines.common

org.apache.spark.sql.pipelines.logging

org.apache.spark.sql.pipelines.util

Class

Description

AllFlows

Used in full graph updates to select all flows.

AllTables

Used in full graph updates to select all tables.

AppendOnceFlow

A Flow that reads its source(s) completely and appends the data to the target exactly once.

BatchTableWrite

A `FlowExecution` that writes a batch `DataFrame` to a `Table`.

CircularDependencyException

Raised when there's a circular dependency in the current pipeline.

CompleteFlow

A Flow that declares exactly what data should be in the target table.

CoreDataflowNodeProcessor

Processor that is responsible for analyzing each flow and sorting the nodes in topological order.

DataflowGraph

DataflowGraph represents the core graph structure for Spark declarative pipelines.

DataflowGraphTransformer

Resolves the DataflowGraph by processing each node in the graph.

DataflowGraphTransformer.TransformNodeFailedException

Exception thrown when transforming a node in the graph fails with a non-retryable error.

DataflowGraphTransformer.TransformNodeFailedException$

DataflowGraphTransformer.TransformNodeRetryableException

Exception thrown when transforming a node in the graph fails because at least one of its dependencies has not yet been transformed.

DataflowGraphTransformer.TransformNodeRetryableException$

DatasetManager

DatasetManager is responsible for materializing tables in the catalog based on the given graph.

DatasetManager.TableMaterializationException

Wraps table materialization exceptions.

DatasetManager.TableMaterializationException$

ExecutionResult

A flow's execution may complete for one of two reasons.

ExecutionResult.FINISHED$

ExecutionResult.STOPPED$

FailureStoppingFlow

Indicates that there was a failure while stopping the flow.

FailureStoppingOperation

Abstract class used to identify failures to stop an operation, including timeouts.

Flow

A Flow is a node of data transformation in a dataflow graph.

FlowAnalysis

FlowExecution

A `FlowExecution` specifies how to execute a flow and manages its execution.

FlowFilter

Specifies how we should filter Flows.

FlowFunction

A wrapper for the lambda function that defines a Flow.

FlowFunctionResult

Holds the DataFrame returned by a FlowFunction along with the inputs used to construct it.

FlowNode

Parameter `identifier`: the identifier of the flow.

FlowPlanner

Plans execution of Flows in a DataflowGraph by converting Flows into `FlowExecution`s.

FlowResolver

FlowsForTables

Used in partial graph updates to select the flows whose target is in "selectedTables".

GraphElement

An element in a DataflowGraph.

GraphElementTypeUtils

GraphErrors

Collection of errors that can be thrown during graph resolution / analysis.

GraphExecution

GraphExecution.FlowExecutionAction

GraphExecution.FlowExecutionStopReason

Represents the reason why a flow execution should be stopped.

GraphExecution.RetryFlowExecution$

Indicates that the flow execution should be retried.

GraphExecution.StopFlowExecution

Indicates that the flow execution should be stopped with a specific reason.

GraphExecution.StopFlowExecution$

GraphFilter<E>

Specifies how we should filter Graph elements.

GraphIdentifierManager

Responsible for properly qualifying the identifiers of datasets inside or referenced by the dataflow graph.

GraphIdentifierManager.DatasetIdentifier

Represents the identifier for a dataset that is defined or referenced in a pipeline.

GraphIdentifierManager.ExternalDatasetIdentifier

Represents the identifier for a dataset that is external to the current pipeline.

GraphIdentifierManager.ExternalDatasetIdentifier$

GraphIdentifierManager.InternalDatasetIdentifier

Represents the identifier for a dataset that is defined by the current pipeline.

GraphIdentifierManager.InternalDatasetIdentifier$

GraphOperations

GraphRegistrationContext

A mutable context for registering tables, views, and flows in a dataflow graph.

GraphRegistrationContext.DatasetType

GraphValidations

Validations performed on a `DataflowGraph`.

IdentifierHelper

Input

Specifies an input that can be referenced by another Dataset's query.

LoadTableException

Exception raised when a flow fails to read from a table defined within the pipeline.

NoFlows

Used to specify that no flows should be refreshed.

NoTables

Used to select no tables.

Output

Represents a node in a DataflowGraph that can be written to by a Flow.

PartitionHelper

PersistedView

Represents a persisted View in a DataflowGraph.

PipelineExecution

Executes a DataflowGraph by resolving the graph, materializing datasets, and running the flows.

PipelinesErrors

PipelinesTableProperties

Interface for validating and accessing Pipeline-specific table properties.

PipelineTableProperty<T>

PipelineUpdateContext

PipelineUpdateContextImpl

An implementation of the PipelineUpdateContext trait used in production.

QueryContext

Contains the catalog and database context information for query execution.

QueryExecutionFailure

Indicates that a run has failed due to a query execution failure.

QueryOrigin

Records information used to track the provenance of a given query to user code.

QueryOrigin.ExceptionHelpers

ResolutionCompletedFlow

A Flow whose flow function has been invoked, meaning either its output schema and dependencies are known, or it failed to resolve (see ResolvedFlow and ResolutionFailedFlow).

ResolutionFailedFlow

A Flow whose flow function has failed to resolve.

ResolvedFlow

A Flow whose flow function has successfully resolved.

ResolvedInput

A wrapper for a resolved internal input that includes the alias provided by the user.

RunCompletion

Indicates that a triggered run has successfully completed execution.

RunFailure

Indicates that a run entered the failed state.

RunTerminationException

Helper exception class that indicates that a run has to be terminated and tracks the associated termination reason.

RunTerminationReason

SomeTables

Used in partial graph updates to select "selectedTables".

SqlGraphElementRegistrationException

SqlGraphRegistrationContext

SQL statement processor context.

SqlGraphRegistrationContext.SqlQueryPlanWithOrigin

Class that holds the logical plan and query origin parsed from a SQL statement.

SqlGraphRegistrationContext.SqlQueryPlanWithOrigin$

SqlGraphRegistrationContextState

Data class for all state that is accumulated while processing a particular SqlGraphRegistrationContext.

StreamingFlow

A Flow that represents stateful movement of data to some target.

StreamingFlowExecution

A `FlowExecution` that processes data statefully using Structured Streaming.

StreamingTableWrite

A `StreamingFlowExecution` that writes a streaming `DataFrame` to a `Table`.

Table

A table representing a materialized dataset in a DataflowGraph.

TableFilter

Specifies how we should filter Tables.

TableInput

A type of Input where data is loaded from a table.

TemporaryView

Represents a temporary View in a DataflowGraph.

TriggeredFailureInfo

TriggeredGraphExecution

Executes all of the flows in the given graph in topological order.

TriggeredGraphExecution.StreamState

TriggeredGraphExecution.StreamState$

TriggeredGraphExecution.StreamState$.CANCELED$

TriggeredGraphExecution.StreamState$.EXCLUDED$

TriggeredGraphExecution.StreamState$.IDLE$

TriggeredGraphExecution.StreamState$.QUEUED$

TriggeredGraphExecution.StreamState$.RUNNING$

TriggeredGraphExecution.StreamState$.SKIPPED$

TriggeredGraphExecution.StreamState$.SUCCESSFUL$

TriggeredGraphExecution.StreamState$.TERMINATED_WITH_ERROR$

UncaughtExceptionHandler

Uncaught exception handler which first calls the delegate and then calls the OnFailure function with the uncaught exception.

UnexpectedRunFailure

Run could not be associated with a proper root cause.

UnionFlowFilter

Returns a flow filter that is the union of two flow filters.

UnresolvedDatasetException

Exception raised when a flow tries to read from a dataset that exists but is unresolved.

UnresolvedFlow

A Flow whose output schema and dependencies aren't known.

UnresolvedPipelineException

Exception raised when a pipeline has one or more flows that cannot be resolved.

View

Represents a view in the DataflowGraph.

ViewHelpers

VirtualTableInput

A type of TableInput that returns data from a specified schema or from the inferred Flows that write to the table.
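
Several entries above refer to ordering nodes topologically (CoreDataflowNodeProcessor, TriggeredGraphExecution) and to rejecting circular dependencies (CircularDependencyException), but this index does not say how the ordering is computed. The following is a minimal sketch of the general idea, assuming Kahn's algorithm. The class `FlowOrdering`, the method `topologicalOrder`, and the use of `IllegalStateException` are hypothetical names chosen for illustration; they are not part of the Spark API and may not match Spark's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this class is not part of the Spark API.
final class FlowOrdering {

    // `dependencies` maps each node name to the names of the nodes it reads from.
    // Returns the nodes ordered so that every node comes after its dependencies.
    static List<String> topologicalOrder(Map<String, List<String>> dependencies) {
        // Number of not-yet-emitted dependencies per node (sorted keys for a stable scan).
        Map<String, Integer> remaining = new TreeMap<>();
        // Reverse edges: for each node, the nodes that read from it.
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : dependencies.entrySet()) {
            remaining.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                remaining.putIfAbsent(dep, 0);
                remaining.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with the nodes that have no dependencies at all.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            // Emitting a node satisfies one dependency of each of its dependents.
            for (String next : dependents.getOrDefault(node, List.of())) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        // Nodes that never became ready are part of, or downstream of, a cycle.
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("circular dependency detected");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
            "cleaned", List.of("raw"),
            "report", List.of("cleaned"));
        System.out.println(topologicalOrder(deps)); // prints [raw, cleaned, report]
    }
}
```

In this sketch, a graph in which `a` reads from `b` and `b` reads from `a` never yields a ready node, so the size check at the end reports the cycle instead of returning a partial order.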
