pyspark.RDD.countByKey
RDD.countByKey() → Dict[K, int]
Count the number of elements for each key, and return the result to the master as a dictionary.

New in version 0.7.0.

Returns
    dict
        a dictionary of (key, count) pairs
 
Examples

>>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> sorted(rdd.countByKey().items())
[('a', 2), ('b', 1)]
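
The counting semantics can be illustrated without a Spark cluster. The sketch below is a plain-Python stand-in, not PySpark's implementation: `count_by_key` is a hypothetical helper name, and it assumes the input is an ordinary iterable of (key, value) pairs rather than an RDD. It shows that each element contributes 1 to its key's count and that the values themselves are ignored.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, Tuple


def count_by_key(pairs: Iterable[Tuple[Hashable, Any]]) -> Dict[Hashable, int]:
    """Hypothetical local stand-in for RDD.countByKey (not part of PySpark)."""
    counts: Dict[Hashable, int] = defaultdict(int)
    for key, _value in pairs:
        # Only the key matters: every element adds 1, whatever its value is.
        counts[key] += 1
    return dict(counts)


# The value 5 is deliberately different to show that values are not summed.
pairs = [("a", 1), ("b", 1), ("a", 5)]
result = count_by_key(pairs)
print(sorted(result.items()))  # [('a', 2), ('b', 1)]
```

Because the result is an ordinary dictionary held in a single process, this style of counting suits data with a modest number of distinct keys. When the set of keys is very large, an aggregation that keeps the result distributed, such as `RDD.reduceByKey`, may be a better fit.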