Interface GraphOperations
- All Known Implementing Classes:
- DataflowGraph
public interface GraphOperations
Method Summary
- scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> dfsInternal(org.apache.spark.sql.catalyst.TableIdentifier startDestination, boolean downstream, boolean stopAtMaterializationPoints)
  Performs a DFS starting from `startDestination` and returns the set of nodes (datasets) reached.
- scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> downstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
  Returns the set of flows reachable from `flowIdentifier` via output (child) edges.
- scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode> flowNodes()
  A map from flow identifier to `FlowNode`, which contains the input/output nodes.
- scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> reachabilitySet(org.apache.spark.sql.catalyst.TableIdentifier destinationIdentifier, boolean downstream)
  Returns all datasets that can be reached from `destinationIdentifier`.
- scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> reachabilitySet(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers, boolean downstream)
  An implementation of DFS that takes a sequence of start nodes and returns the "reachability set" of nodes from those start nodes.
- scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamDatasets(org.apache.spark.sql.catalyst.TableIdentifier datasetIdentifier)
  Returns the set of datasets reachable from `datasetIdentifier` via input (parent) edges.
- scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> upstreamDatasets(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers)
  Traverses the graph upstream starting from the specified `datasetIdentifiers` and returns the reachable nodes.
- scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
  Returns the set of flows reachable from `flowIdentifier` via input (parent) edges.

Method Details
dfsInternal
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> dfsInternal(org.apache.spark.sql.catalyst.TableIdentifier startDestination, boolean downstream, boolean stopAtMaterializationPoints)
Performs a DFS starting from `startDestination` and returns the set of nodes (datasets) reached.
- Parameters:
- startDestination - The identifier of the node to start from.
- downstream - If true, traverse output edges (search downstream); if false, traverse input edges (search upstream).
- stopAtMaterializationPoints - If true, stop when a materialization point (table) is reached; if false, keep going until the end of the graph.
- Returns:
- The set of nodes (datasets) reached.
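To make the `downstream` and `stopAtMaterializationPoints` flags concrete, here is a minimal Java sketch of such a DFS over a standalone toy graph. It is not the Spark implementation: the class `DfsSketch`, its `edge`/`table` helpers, and the use of `String` names with `java.util` sets in place of `TableIdentifier` and Scala collections are all hypothetical. It also assumes that the result includes the start node and that a table reached during the search is included but not expanded further.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy dataset graph; edges point from a dataset to the datasets it feeds. */
final class DfsSketch {
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();
    private final Set<String> tables = new HashSet<>(); // materialization points

    DfsSketch edge(String from, String to) {
        children.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        parents.computeIfAbsent(to, k -> new HashSet<>()).add(from);
        return this;
    }

    DfsSketch table(String name) {
        tables.add(name);
        return this;
    }

    /** Iterative DFS; the result includes the start node itself (an assumption). */
    Set<String> dfsInternal(String start, boolean downstream, boolean stopAtMaterializationPoints) {
        // Downstream follows output edges (children); upstream follows input edges (parents).
        Map<String, Set<String>> edges = downstream ? children : parents;
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) {
                continue; // already expanded
            }
            // Assumption: a table other than the start is kept in the result but not expanded.
            if (stopAtMaterializationPoints && tables.contains(node) && !node.equals(start)) {
                continue;
            }
            for (String next : edges.getOrDefault(node, Set.of())) {
                if (!visited.contains(next)) {
                    stack.push(next);
                }
            }
        }
        return visited;
    }
}
```

With edges a -> b -> c -> d and b marked as a table, a downstream search from a returns {a, b} when the flag is set and {a, b, c, d} when it is not.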
 
downstreamFlows
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> downstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
Returns the set of flows reachable from `flowIdentifier` via output (child) edges.

flowNodes
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode> flowNodes()
A map from flow identifier to `FlowNode`, which contains the input/output nodes.
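The relationship between `flowNodes()` and `downstreamFlows` can be sketched as follows, assuming (as a hypothetical model, not Spark's code) that a flow's child edges lead to every flow that reads the dataset it writes. `SketchFlowNode` and `FlowGraphSketch` are illustrative names, identifiers are plain `String`s, and the start flow is excluded from the result unless a cycle leads back to it.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical stand-in for FlowNode: the datasets a flow reads and the one it writes. */
final class SketchFlowNode {
    final String identifier;
    final Set<String> inputs;
    final String output;

    SketchFlowNode(String identifier, Set<String> inputs, String output) {
        this.identifier = identifier;
        this.inputs = inputs;
        this.output = output;
    }
}

final class FlowGraphSketch {
    /** Flow identifier -> node, mirroring the shape of flowNodes(). */
    final Map<String, SketchFlowNode> flowNodes = new LinkedHashMap<>();

    FlowGraphSketch flow(String id, Set<String> inputs, String output) {
        flowNodes.put(id, new SketchFlowNode(id, inputs, output));
        return this;
    }

    /** BFS over child edges: flow g is a child of f if g reads the dataset f writes. */
    Set<String> downstreamFlows(String flowIdentifier) {
        Set<String> reached = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add(flowIdentifier);
        while (!queue.isEmpty()) {
            String written = flowNodes.get(queue.remove()).output;
            // Scan all flows for readers of the dataset just written.
            for (SketchFlowNode candidate : flowNodes.values()) {
                if (candidate.inputs.contains(written) && reached.add(candidate.identifier)) {
                    queue.add(candidate.identifier);
                }
            }
        }
        return reached;
    }
}
```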

reachabilitySet
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> reachabilitySet(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers, boolean downstream)
An implementation of DFS that takes a sequence of start nodes and returns the "reachability set" of nodes from those start nodes.
- Parameters:
- downstream - If true, walks the graph downstream, as in the example below; if false, walks it upstream.
- datasetIdentifiers - The start nodes of the traversal.
- Returns:
- A map from each visited node to its origin(s) in `datasetIdentifiers`. For example, let graph = a -> b, c -> d (a partitioned graph):
  reachabilitySet(Seq("a", "c"), downstream = true) -> ["a" -> ["a"], "b" -> ["a"], "c" -> ["c"], "d" -> ["c"]]
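A minimal Java sketch of this multi-source reachability, assuming one DFS per start node that records the start as an origin of every node it visits. `ReachabilitySketch` and its `edge` helper are hypothetical, and `String`/`java.util` types stand in for `TableIdentifier` and the Scala collections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy graph; edges point from a node to the nodes downstream of it. */
final class ReachabilitySketch {
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();

    ReachabilitySketch edge(String from, String to) {
        children.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        parents.computeIfAbsent(to, k -> new HashSet<>()).add(from);
        return this;
    }

    /** Maps every node reached from any start to the set of starts that reached it. */
    Map<String, Set<String>> reachabilitySet(List<String> starts, boolean downstream) {
        Map<String, Set<String>> edges = downstream ? children : parents;
        Map<String, Set<String>> origins = new HashMap<>();
        for (String start : starts) {
            // Fresh visited set per start, so shared nodes collect every origin.
            Set<String> visited = new HashSet<>();
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                String node = stack.pop();
                if (!visited.add(node)) {
                    continue;
                }
                origins.computeIfAbsent(node, k -> new HashSet<>()).add(start);
                for (String next : edges.getOrDefault(node, Set.of())) {
                    stack.push(next);
                }
            }
        }
        return origins;
    }
}
```

For the partitioned graph a -> b, c -> d, `reachabilitySet(List.of("a", "c"), true)` returns a map equal to {a=[a], b=[a], c=[c], d=[c]}, matching the documented example. Running one DFS per start is simple and keeps every origin, at a cost proportional to the number of starts times the graph size.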
 
reachabilitySet
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> reachabilitySet(org.apache.spark.sql.catalyst.TableIdentifier destinationIdentifier, boolean downstream)
Returns all datasets that can be reached from `destinationIdentifier`.
- Parameters:
- destinationIdentifier - The identifier of the dataset to start from.
- downstream - (undocumented)
- Returns:
- All datasets that can be reached from `destinationIdentifier`.
 
upstreamDatasets
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamDatasets(org.apache.spark.sql.catalyst.TableIdentifier datasetIdentifier)
Returns the set of datasets reachable from `datasetIdentifier` via input (parent) edges.
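Reading "input (parent) edges" as the inputs of the flows that write a dataset, a hypothetical sketch of `upstreamDatasets` might look like this. `UpstreamSketch` and its `flow` helper are illustrative only, `String` names stand in for `TableIdentifier`, and excluding the start dataset (unless a cycle leads back to it) is an assumption rather than documented behavior.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical flow records: each flow reads some datasets and writes one dataset. */
final class UpstreamSketch {
    private final Map<String, Set<String>> flowInputs = new HashMap<>();
    private final Map<String, String> flowOutput = new HashMap<>();

    UpstreamSketch flow(String id, Set<String> inputs, String output) {
        flowInputs.put(id, inputs);
        flowOutput.put(id, output);
        return this;
    }

    /** DFS over parent edges; the start dataset is included only if a cycle leads back to it. */
    Set<String> upstreamDatasets(String datasetIdentifier) {
        Set<String> reached = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(datasetIdentifier);
        while (!stack.isEmpty()) {
            String dataset = stack.pop();
            // The parents of a dataset are the inputs of every flow that writes it.
            for (Map.Entry<String, String> flow : flowOutput.entrySet()) {
                if (flow.getValue().equals(dataset)) {
                    for (String parent : flowInputs.get(flow.getKey())) {
                        if (reached.add(parent)) {
                            stack.push(parent);
                        }
                    }
                }
            }
        }
        return reached;
    }
}
```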

upstreamDatasets
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> upstreamDatasets(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers)
Traverses the graph upstream starting from the specified `datasetIdentifiers` and returns the reachable nodes. The returned map's key set consists of all datasets reachable from `datasetIdentifiers`. For each entry in the map, the value identifies which of the `datasetIdentifiers` was able to reach the key. If more than one of the `datasetIdentifiers` could reach that key, one is picked arbitrarily.
- Parameters:
- datasetIdentifiers - The datasets to start the upstream traversal from.
- Returns:
- A map whose keys are the reachable datasets and whose values identify which of `datasetIdentifiers` reached each key.
 
upstreamFlows
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
Returns the set of flows reachable from `flowIdentifier` via input (parent) edges.
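`upstreamFlows` is the mirror image of `downstreamFlows`. A hypothetical sketch, assuming a flow's parents are the flows that write any dataset it reads: `UpstreamFlowsSketch` and its `flow` helper are illustrative names with `String` identifiers, and the start flow is excluded unless a cycle leads back to it.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class UpstreamFlowsSketch {
    private final Map<String, Set<String>> inputsByFlow = new HashMap<>();
    // Index from dataset to the flows that write it, built once as flows are added.
    private final Map<String, Set<String>> producersByDataset = new HashMap<>();

    UpstreamFlowsSketch flow(String id, Set<String> inputs, String output) {
        inputsByFlow.put(id, inputs);
        producersByDataset.computeIfAbsent(output, k -> new HashSet<>()).add(id);
        return this;
    }

    /** BFS over parent edges: flow p is a parent of f if p writes a dataset that f reads. */
    Set<String> upstreamFlows(String flowIdentifier) {
        Set<String> reached = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add(flowIdentifier);
        while (!queue.isEmpty()) {
            for (String input : inputsByFlow.getOrDefault(queue.remove(), Set.of())) {
                for (String parent : producersByDataset.getOrDefault(input, Set.of())) {
                    if (reached.add(parent)) {
                        queue.add(parent);
                    }
                }
            }
        }
        return reached;
    }
}
```

The dataset-to-producers index turns each step into a map lookup instead of a scan over every flow.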
 