Interface ShuffleMapOutputWriter


@Private public interface ShuffleMapOutputWriter
:: Private :: A top-level writer that returns child writers for persisting the output of a map task, and then commits all of the writes as one atomic operation.
Since:
3.0.0
  • Method Details

    • getPartitionWriter

      ShufflePartitionWriter getPartitionWriter(int reducePartitionId) throws IOException
      Creates a writer that can open an output stream to persist bytes targeted for a given reduce partition id.

      The chunk corresponds to bytes in the given reduce partition. This method is not called twice for the same partition within any given map task. The partition identifier is in the range 0 (inclusive) to numPartitions (exclusive), where numPartitions was provided when this map output writer was created via ShuffleExecutorComponents.createMapOutputWriter(int, long, int).

      Calls to this method are made with strictly increasing reducePartitionIds: each call receives a reducePartitionId that is strictly greater than the ids passed to all previous calls. This method is not guaranteed to be called for every partition id in the range described above. In particular, no guarantee is made as to whether this method is called for empty partitions.
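
      The calling contract above (ids within the range, strictly increasing, gaps allowed) can be sketched as a small guard that an implementation might run at the start of getPartitionWriter(int). This is an illustration only: PartitionOrderGuard and its methods are hypothetical names, not part of Spark, and the choice of unchecked exceptions is an assumption made to keep the sketch short.

```java
// Hypothetical helper, not a Spark class: checks the documented calling
// contract of getPartitionWriter(int) for a single map task.
final class PartitionOrderGuard {
    private final int numPartitions;
    private int lastPartitionId = -1; // no partition has been requested yet

    PartitionOrderGuard(int numPartitions) {
        if (numPartitions < 0) {
            throw new IllegalArgumentException("numPartitions must be >= 0");
        }
        this.numPartitions = numPartitions;
    }

    /** Validates one requested id and remembers it for the next call. */
    void check(int reducePartitionId) {
        // The id must lie in [0, numPartitions).
        if (reducePartitionId < 0 || reducePartitionId >= numPartitions) {
            throw new IllegalArgumentException(
                "Partition id " + reducePartitionId + " is outside [0, " + numPartitions + ")");
        }
        // Ids must be strictly increasing, which also rules out repeats.
        if (reducePartitionId <= lastPartitionId) {
            throw new IllegalStateException(
                "Partition id " + reducePartitionId + " is not greater than " + lastPartitionId);
        }
        lastPartitionId = reducePartitionId;
    }

    int lastPartitionId() {
        return lastPartitionId;
    }
}
```

      Because skipped ids are legal, the guard only compares against the last id seen and never requires that every partition be visited.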

      Throws:
      IOException
    • commitAllPartitions

      MapOutputCommitMessage commitAllPartitions(long[] checksums) throws IOException
      Commits the writes done by all partition writers returned by all calls to this object's getPartitionWriter(int), and returns a commit message containing the number of bytes written for each partition.

      This should ensure that the writes conducted by this module's partition writers are available to downstream reduce tasks. If this method throws any exception, this module's abort(Throwable) method will be invoked before propagating the exception.

      Shuffle extensions that care about the cause of shuffle data corruption should store the checksums. When corruption happens, Spark provides the checksum of the fetched partition to the shuffle extension to help diagnose the cause of the corruption.
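
      As a rough sketch of what storing the checksums could look like, the class below keeps a copy of the array passed at commit time and later compares one entry against a checksum supplied for a fetched partition. ChecksumStore, Diagnosis, and their methods are hypothetical names; how Spark actually hands the fetched checksum to an extension is not described on this page, so the compare method is an assumption.

```java
// Hypothetical helper, not a Spark class: remembers commit-time checksums
// so they can be compared with a checksum reported for a fetched partition.
final class ChecksumStore {
    enum Diagnosis { MATCH, MISMATCH, UNKNOWN }

    private long[] stored = new long[0];

    /** Keeps a defensive copy; an empty array means checksums were not provided. */
    void store(long[] checksums) {
        stored = checksums.clone();
    }

    /** Compares the stored value for a partition with the fetched one. */
    Diagnosis compare(int partitionId, long fetchedChecksum) {
        // Nothing stored for this id (for example, checksums disabled).
        if (partitionId < 0 || partitionId >= stored.length) {
            return Diagnosis.UNKNOWN;
        }
        return stored[partitionId] == fetchedChecksum
            ? Diagnosis.MATCH
            : Diagnosis.MISMATCH;
    }
}
```

      A MATCH would suggest the bytes were already in this state when they were committed, while a MISMATCH would suggest they changed afterwards; that interpretation is an illustration rather than documented Spark behavior.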

      This can also close any resources and clean up temporary state if necessary.

      The returned commit message is a structure with two components:

      1) An array of longs, which should contain, for each partition from (0) to (numPartitions - 1), the number of bytes written by the partition writer for that partition id.

      2) An optional metadata blob that can be used by shuffle readers.
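
      The first component can be sketched as a per-partition byte counter that an implementation fills in as its partition writers write, then copies into the commit message. PartitionLengthTracker is a hypothetical name, not part of Spark, and reporting 0 for partitions that never received a writer is an assumption drawn from the description above.

```java
import java.util.Arrays;

// Hypothetical helper, not a Spark class: accumulates bytes written per
// reduce partition so the totals can be reported at commit time.
final class PartitionLengthTracker {
    private final long[] lengths; // index is the reduce partition id

    PartitionLengthTracker(int numPartitions) {
        this.lengths = new long[numPartitions]; // every entry starts at 0
    }

    /** Adds byteCount to the running total for one partition. */
    void recordBytes(int reducePartitionId, long byteCount) {
        lengths[reducePartitionId] += byteCount;
    }

    /** Returns a copy covering partitions 0 to numPartitions - 1. */
    long[] snapshot() {
        return Arrays.copyOf(lengths, lengths.length);
    }
}
```

      Returning a copy keeps later writes from changing an array that has already been handed to the caller.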

      Parameters:
      checksums - The checksum value for each partition (the array index is the partition id) if shuffle checksums are enabled; otherwise, an empty array.
      Throws:
      IOException
    • abort

      void abort(Throwable error) throws IOException
      Aborts all of the writes done by any writers returned by getPartitionWriter(int).

      This should invalidate the results of writing bytes. This can also close any resources and clean up temporary state if necessary.
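
      One way to picture invalidation and cleanup is an implementation that stages its bytes in a temporary file and deletes that file on abort. The sketch below assumes such a staging file; TempFileMapOutput is a hypothetical name, not part of Spark, and it wraps IOException in UncheckedIOException only to keep the example compact (the real method declares IOException).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a Spark class: stages map output in a temporary
// file and discards it when the write is aborted.
final class TempFileMapOutput {
    private final Path tempFile;
    private boolean aborted;

    TempFileMapOutput() {
        try {
            tempFile = Files.createTempFile("map-output-", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path tempFile() {
        return tempFile;
    }

    /** Deletes the staged bytes; calling it again is a no-op. */
    void abort(Throwable error) {
        if (aborted) {
            return;
        }
        aborted = true;
        try {
            Files.deleteIfExists(tempFile);
        } catch (IOException e) {
            UncheckedIOException failure = new UncheckedIOException(e);
            if (error != null) {
                failure.addSuppressed(error); // keep the original cause visible
            }
            throw failure;
        }
    }
}
```

      Making the cleanup idempotent is a design choice of this sketch, so that a repeated abort does not fail on an already deleted file.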

      Throws:
      IOException