pyspark.RDD.glom

RDD.glom() → pyspark.rdd.RDD[List[T]]

Return an RDD created by coalescing all elements within each partition into a list.

New in version 0.7.0.

Returns
RDD

a new RDD in which each element is a list holding all the elements of one partition of this RDD

Examples

>>> rdd = sc.parallelize([1, 2, 3, 4], 2)
>>> sorted(rdd.glom().collect())
[[1, 2], [3, 4]]
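
The per-partition grouping can also be modelled without a running SparkContext. The following is a minimal plain-Python sketch, not PySpark's own code: `glom_partition` and `glom_model` are hypothetical names introduced for illustration, and the partition layout is assumed. It mirrors roughly what passing the same generator to `RDD.mapPartitions` would produce.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def glom_partition(partition: Iterable[T]) -> Iterator[List[T]]:
    """Yield one list holding every element of a single partition."""
    yield list(partition)


def glom_model(partitions: Iterable[Iterable[T]]) -> List[List[T]]:
    """Apply glom_partition to each partition and gather the results in order."""
    return [chunk for part in partitions for chunk in glom_partition(part)]


# Assumed layout: four elements over two partitions, plus one empty partition.
partitions = [iter([1, 2]), iter([3, 4]), iter([])]
result = glom_model(partitions)
print(result)  # [[1, 2], [3, 4], []]
```

An empty partition yields an empty list rather than being dropped, so the result holds exactly one list per partition.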