pyspark.BarrierTaskContext

class pyspark.BarrierTaskContext[source]

A TaskContext with extra contextual info and tooling for tasks in a barrier stage. Use BarrierTaskContext.get() to obtain the barrier context for a running barrier task.

New in version 2.4.0.

Notes

This API is experimental.

Examples

Set a barrier inside a function and run it on a barrier RDD.

>>> from pyspark import BarrierTaskContext
>>> def block_and_do_something(itr):
...     taskcontext = BarrierTaskContext.get()
...     # Do something.
...
...     # Wait until all tasks have reached this point.
...     taskcontext.barrier()
...
...     return itr
...
>>> rdd = spark.sparkContext.parallelize([1])
>>> rdd.barrier().mapPartitions(block_and_do_something).collect()
[1]

Methods

allGather([message])

This function blocks until all tasks in the same stage have reached this routine.

attemptNumber()

How many times this task has been attempted; the first attempt is assigned attemptNumber 0.

barrier()

Sets a global barrier and waits until all tasks in this stage hit this barrier.

cpus()

CPUs allocated to the task.

get()

Return the currently active BarrierTaskContext.

getLocalProperty(key)

Get a local property set upstream in the driver, or None if it is missing.

getTaskInfos()

Returns BarrierTaskInfo for all tasks in this barrier stage, ordered by partition ID.

partitionId()

The ID of the RDD partition that is computed by this task.

resources()

Resources allocated to the task.

stageId()

The ID of the stage that this task belongs to.

taskAttemptId()

An ID that is unique to this task attempt (within the same SparkContext, no two task attempts will share the same attempt ID).