pyspark.Broadcast

class pyspark.Broadcast(sc: Optional[SparkContext] = None, value: Optional[T] = None, pickle_registry: Optional[BroadcastPickleRegistry] = None, path: Optional[str] = None, sock_file: Optional[BinaryIO] = None)

A broadcast variable created with SparkContext.broadcast(). Access its value through the value attribute.

Examples

>>> b = spark.sparkContext.broadcast([1, 2, 3, 4, 5])
>>> b.value
[1, 2, 3, 4, 5]
>>> spark.sparkContext.parallelize([0, 0]).flatMap(lambda x: b.value).collect()
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> b.unpersist()
>>> large_broadcast = spark.sparkContext.broadcast(range(10000))

Methods

destroy([blocking])

Destroy all data and metadata related to this broadcast variable.

dump(value, f)

Write a pickled representation of value to the open file or socket.

load(file)

Read a pickled value from the open file or socket and return the unpickled object.

load_from_path(path)

Open the file at path, read the pickled value stored there, and return the unpickled object.

unpersist([blocking])

Delete cached copies of this broadcast on the executors.
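The dump, load, and load_from_path methods move the broadcast value in and out of Python's pickle format. The stdlib-only sketch below mimics that round trip without a SparkContext, first through an in-memory stream and then through a file on disk. The helper functions are hypothetical stand-ins for illustration, not PySpark's own implementation, and the choice of pickle protocol is an assumption.

```python
import io
import os
import pickle
import tempfile


# Hypothetical stand-ins for Broadcast.dump / load / load_from_path.
# Assumption: the real methods are thin wrappers around the pickle module.
def dump(value, f):
    """Write a pickled representation of value to a binary file-like object."""
    pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(f):
    """Read a pickled value from a binary file-like object and return it."""
    return pickle.load(f)


def load_from_path(path):
    """Open the file at path and return the unpickled value stored in it."""
    with open(path, "rb") as f:
        return load(f)


# Round trip through an in-memory binary stream.
buf = io.BytesIO()
dump([1, 2, 3, 4, 5], buf)
buf.seek(0)  # rewind so load() reads from the start
in_memory = load(buf)

# Round trip through a temporary file on disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    dump({"a": 1}, tmp)
    path = tmp.name  # file is closed but kept when the block exits
try:
    from_disk = load_from_path(path)
finally:
    os.remove(path)  # clean up the temporary file
```

Any picklable object survives the round trip unchanged, which is why a broadcast value must be picklable in the first place.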

Attributes

value

Return the broadcast value.