pyspark.RDD.countByValue

RDD.countByValue() → Dict[K, int]

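Return the count of each unique value in this RDD as a dictionary mapping each value to its count. The result is a local dictionary on the driver, so this method suits RDDs whose number of distinct values is small enough to fit in driver memory.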

New in version 0.7.0.

Returns
dict

a dictionary of (value, count) pairs

Examples

>>> sorted(sc.parallelize([1, 2, 1, 2, 2], 2).countByValue().items())
[(1, 2), (2, 3)]
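
The counting behind the example above can be pictured as two steps: count values within each partition, then merge the per-partition counts into one dictionary. The following is a minimal pure-Python sketch of that idea, not the PySpark implementation. The helper names (count_partition, merge_counts, count_by_value) are hypothetical, and the split of the data into two lists is an assumed partitioning chosen for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, List


def count_partition(partition: Iterable[Hashable]) -> DefaultDict[Hashable, int]:
    """Count how often each value occurs in a single partition."""
    counts: DefaultDict[Hashable, int] = defaultdict(int)
    for value in partition:
        counts[value] += 1
    return counts


def merge_counts(
    left: DefaultDict[Hashable, int], right: Dict[Hashable, int]
) -> DefaultDict[Hashable, int]:
    """Add the counts from `right` into `left` and return `left`."""
    for value, count in right.items():
        left[value] += count
    return left


def count_by_value(partitions: List[List[Hashable]]) -> Dict[Hashable, int]:
    """Hypothetical stand-in for countByValue over a list of partitions."""
    total: DefaultDict[Hashable, int] = defaultdict(int)
    for partition in partitions:
        merge_counts(total, count_partition(partition))
    return dict(total)


# Assumed partitioning of [1, 2, 1, 2, 2] into two partitions.
partitions = [[1, 2], [1, 2, 2]]
result = count_by_value(partitions)
print(sorted(result.items()))  # [(1, 2), (2, 3)]
```

As in the doctest above, the items are sorted before printing because a dictionary of counts gives no guarantee about ordering by value.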