pyspark.RDD.flatMapValues

RDD.flatMapValues(f: Callable[[V], Iterable[U]]) → pyspark.rdd.RDD[Tuple[K, U]]

Pass each value of a key-value pair RDD through a flatMap function without changing the keys. Every element that f returns for a value is paired with that value's key, and the original RDD's partitioning is retained.

New in version 0.7.0.

Parameters
f : function

a function to turn a V into a sequence of U

Returns
RDD

an RDD containing the keys and the flat-mapped values

Examples

>>> rdd = sc.parallelize([("a", ["x", "y", "z"]), ("b", ["p", "r"])])
>>> def f(x): return x
...
>>> rdd.flatMapValues(f).collect()
[('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]
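
The result above can be reproduced without a Spark context. The following is a minimal plain-Python sketch of the semantics only; the helper name `flat_map_values` is hypothetical and is not part of PySpark, and the sketch says nothing about partitioning or distributed execution.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
U = TypeVar("U")


def flat_map_values(
    pairs: Iterable[Tuple[K, V]], f: Callable[[V], Iterable[U]]
) -> List[Tuple[K, U]]:
    """Hypothetical local stand-in for RDD.flatMapValues (not a PySpark API)."""
    # Apply f to each value, then pair every element it yields with the unchanged key.
    return [(key, item) for key, value in pairs for item in f(value)]


pairs = [("a", ["x", "y", "z"]), ("b", ["p", "r"]), ("c", [])]

# Identity function: each list is flattened under its key.
# Key "c" maps to an empty list, so it contributes no output pairs.
flattened = flat_map_values(pairs, lambda values: values)
print(flattened)
# [('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]

# f may also transform the elements it yields.
uppercased = flat_map_values(pairs, lambda values: [v.upper() for v in values])
print(uppercased)
# [('a', 'X'), ('a', 'Y'), ('a', 'Z'), ('b', 'P'), ('b', 'R')]
```

Note that a key whose value maps to an empty iterable does not appear in the output at all, since there is nothing to pair it with.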