pyspark.RDD.treeAggregate

RDD.treeAggregate(zeroValue: U, seqOp: Callable[[U, T], U], combOp: Callable[[U, U], U], depth: int = 2) → U

Aggregates the elements of this RDD in a multi-level tree pattern.

New in version 1.3.0.

Parameters
zeroValue : U

the initial value for the accumulated result of each partition

seqOp : function

a function used to accumulate results within a partition

combOp : function

an associative function used to combine results from different partitions

depth : int, optional, default 2

suggested depth of the tree

Returns
U

the aggregated result

Examples

>>> add = lambda x, y: x + y
>>> rdd = sc.parallelize([-5, -4, -3, -2, -1, 1, 2, 3, 4], 10)
>>> rdd.treeAggregate(0, add, add)
-5
>>> rdd.treeAggregate(0, add, add, 1)
-5
>>> rdd.treeAggregate(0, add, add, 2)
-5
>>> rdd.treeAggregate(0, add, add, 5)
-5
>>> rdd.treeAggregate(0, add, add, 10)
-5
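
The tree pattern behind these results can be illustrated in plain Python. This is a simplified, hypothetical sketch of the semantics only, not Spark's actual implementation (which merges partial results across executors): each partition is first folded locally with seqOp starting from zeroValue, and the per-partition results are then merged level by level with combOp rather than in a single flat pass.

```python
from functools import reduce

def tree_aggregate(partitions, zero, seq_op, comb_op, depth=2):
    # Fold each partition locally with seq_op, starting from zero.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    # Merge groups of partial results with comb_op, level by level,
    # until a single value remains (roughly `depth` levels).
    while len(partials) > 1:
        scale = max(2, int(len(partials) ** (1.0 / depth)))
        partials = [reduce(comb_op, partials[i:i + scale])
                    for i in range(0, len(partials), scale)]
    return partials[0]

add = lambda x, y: x + y
data = [[-5, -4], [-3, -2], [-1], [1, 2], [3, 4]]  # 5 "partitions"
print(tree_aggregate(data, 0, add, add))  # -5
```

Because combOp must be associative, the grouping of partial results at each level does not affect the final value, which is why every depth in the examples above returns the same sum.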