Interface ShuffleDriverComponents


@Private public interface ShuffleDriverComponents
:: Private :: An interface for building shuffle support modules for the Driver.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    Called once at the end of the Spark application to clean up any existing shuffle state.
    Called once in the driver to bootstrap this module that is specific to this application.
    default void
    registerShuffle(int shuffleId)
    Called once per shuffle id when the shuffle id is first generated for a shuffle stage.
    default void
    removeShuffle(int shuffleId, boolean blocking)
    Removes shuffle data associated with the given shuffle.
    default boolean
    Does this shuffle component support reliable storage - external to the lifecycle of the executor host ?
  • Method Details

    • initializeApplication

      Map<String,String> initializeApplication()
      Called once in the driver to bootstrap this module that is specific to this application. This method is called before submitting executor requests to the cluster manager. This method should prepare the module with its shuffle components i.e. registering against an external file servers or shuffle services, or creating tables in a shuffle storage data database.
      Returns:
      additional SparkConf settings necessary for initializing the executor components. This would include configurations that cannot be statically set on the application, like the host:port of external services for shuffle storage.
    • cleanupApplication

      void cleanupApplication()
      Called once at the end of the Spark application to clean up any existing shuffle state.
    • registerShuffle

      default void registerShuffle(int shuffleId)
      Called once per shuffle id when the shuffle id is first generated for a shuffle stage.
      Parameters:
      shuffleId - The unique identifier for the shuffle stage.
    • removeShuffle

      default void removeShuffle(int shuffleId, boolean blocking)
      Removes shuffle data associated with the given shuffle.
      Parameters:
      shuffleId - The unique identifier for the shuffle stage.
      blocking - Whether this call should block on the deletion of the data.
    • supportsReliableStorage

      default boolean supportsReliableStorage()
      Does this shuffle component support reliable storage - external to the lifecycle of the executor host ? For example, writing shuffle data to a distributed filesystem or persisting it in a remote shuffle service.