Class ShuffleDependency<K,V,C>

Object
org.apache.spark.Dependency<scala.Product2<K,V>>
org.apache.spark.ShuffleDependency<K,V,C>
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, scala.Serializable

public class ShuffleDependency<K,V,C> extends Dependency<scala.Product2<K,V>> implements org.apache.spark.internal.Logging
:: DeveloperApi :: Represents a dependency on the output of a shuffle stage. Note that in the case of shuffle, the RDD is transient since we don't need it on the executor side.

param: _rdd the parent RDD param: partitioner partitioner used to partition the shuffle output param: serializer Serializer to use. If not set explicitly then the default serializer, as specified by spark.serializer config option, will be used. param: keyOrdering key ordering for RDD's shuffles param: aggregator map/reduce-side aggregator for RDD's shuffle param: mapSideCombine whether to perform partial aggregation (also known as map-side combine) param: shuffleWriterProcessor the processor to control the write behavior in ShuffleMapTask

See Also:
  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    Constructor
    Description
    ShuffleDependency(RDD<? extends scala.Product2<K,V>> _rdd, Partitioner partitioner, Serializer serializer, scala.Option<scala.math.Ordering<K>> keyOrdering, scala.Option<Aggregator<K,V,C>> aggregator, boolean mapSideCombine, org.apache.spark.shuffle.ShuffleWriteProcessor shuffleWriterProcessor, scala.reflect.ClassTag<K> evidence$1, scala.reflect.ClassTag<V> evidence$2, scala.reflect.ClassTag<C> evidence$3)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    scala.Option<Aggregator<K,V,C>>
     
    scala.collection.Seq<BlockManagerId>
     
    scala.Option<scala.math.Ordering<K>>
     
    boolean
     
    void
     
     
    RDD<scala.Product2<K,V>>
    rdd()
     
     
    void
    setMergerLocs(scala.collection.Seq<BlockManagerId> mergerLocs)
     
    org.apache.spark.shuffle.ShuffleHandle
     
    int
     
    boolean
     
    boolean
     
    boolean
    Returns true if push-based shuffle is disabled or if the shuffle merge for this shuffle is finalized.
    int
    shuffleMergeId is used to uniquely identify merging process of shuffle by an indeterminate stage attempt.
    org.apache.spark.shuffle.ShuffleWriteProcessor
     

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
  • Constructor Details

    • ShuffleDependency

      public ShuffleDependency(RDD<? extends scala.Product2<K,V>> _rdd, Partitioner partitioner, Serializer serializer, scala.Option<scala.math.Ordering<K>> keyOrdering, scala.Option<Aggregator<K,V,C>> aggregator, boolean mapSideCombine, org.apache.spark.shuffle.ShuffleWriteProcessor shuffleWriterProcessor, scala.reflect.ClassTag<K> evidence$1, scala.reflect.ClassTag<V> evidence$2, scala.reflect.ClassTag<C> evidence$3)
  • Method Details

    • partitioner

      public Partitioner partitioner()
    • serializer

      public Serializer serializer()
    • keyOrdering

      public scala.Option<scala.math.Ordering<K>> keyOrdering()
    • aggregator

      public scala.Option<Aggregator<K,V,C>> aggregator()
    • mapSideCombine

      public boolean mapSideCombine()
    • shuffleWriterProcessor

      public org.apache.spark.shuffle.ShuffleWriteProcessor shuffleWriterProcessor()
    • rdd

      public RDD<scala.Product2<K,V>> rdd()
      Specified by:
      rdd in class Dependency<scala.Product2<K,V>>
    • shuffleId

      public int shuffleId()
    • shuffleHandle

      public org.apache.spark.shuffle.ShuffleHandle shuffleHandle()
    • shuffleMergeEnabled

      public boolean shuffleMergeEnabled()
    • shuffleMergeAllowed

      public boolean shuffleMergeAllowed()
    • shuffleMergeId

      public int shuffleMergeId()
      shuffleMergeId is used to uniquely identify merging process of shuffle by an indeterminate stage attempt.
      Returns:
      (undocumented)
    • setMergerLocs

      public void setMergerLocs(scala.collection.Seq<BlockManagerId> mergerLocs)
    • getMergerLocs

      public scala.collection.Seq<BlockManagerId> getMergerLocs()
    • shuffleMergeFinalized

      public boolean shuffleMergeFinalized()
      Returns true if push-based shuffle is disabled or if the shuffle merge for this shuffle is finalized.
      Returns:
      (undocumented)
    • newShuffleMergeState

      public void newShuffleMergeState()