pyspark.RDD.rightOuterJoin

RDD.rightOuterJoin(other: pyspark.rdd.RDD[Tuple[K, U]], numPartitions: Optional[int] = None) → pyspark.rdd.RDD[Tuple[K, Tuple[Optional[V], U]]]

Perform a right outer join of self and other.

For each element (k, w) in other, the resulting RDD will contain either all pairs (k, (v, w)) for each v in self with key k, or the single pair (k, (None, w)) if no element in self has key k.
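The rule above can be sketched in plain Python, without Spark. The helper name `right_outer_join_pairs` is hypothetical and works on ordinary lists of (key, value) tuples. It only illustrates which pairs appear in the result and is not how Spark implements the join.

```python
from collections import defaultdict


def right_outer_join_pairs(left, right):
    """Illustrative only: mimic rightOuterJoin semantics on plain lists of pairs."""
    # Group the values of the left side ("self") by key.
    left_by_key = defaultdict(list)
    for k, v in left:
        left_by_key[k].append(v)

    result = []
    for k, w in right:  # every element of the right side ("other") is kept
        matches = left_by_key.get(k)
        if matches:
            # One output pair per matching left value.
            result.extend((k, (v, w)) for v in matches)
        else:
            # No match on the left: pad with None.
            result.append((k, (None, w)))
    return result


left = [("a", 2), ("a", 3)]
right = [("a", 1), ("b", 4)]
joined = right_outer_join_pairs(left, right)
print(joined)
```

Note that a key occurring several times in self produces one output pair per occurrence, while a key that occurs only in self is dropped.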

Hash-partitions the resulting RDD into the given number of partitions.
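As a rough illustration of hash partitioning, the sketch below assigns joined records to partitions by key. The function `partition_index` is a hypothetical stand-in that uses Python's built-in `hash()` on integer keys. PySpark uses its own hash function, so actual partition assignments may differ.

```python
def partition_index(key, num_partitions):
    # Stand-in for a hash partitioner: map a key to a partition number.
    # Assumption: built-in hash() is used here purely for illustration.
    return hash(key) % num_partitions


# Joined records in the shape produced by a right outer join: (k, (v_or_None, w)).
joined_records = [(1, (None, "x")), (2, ("p", "y")), (5, ("q", "z")), (2, ("r", "y"))]

num_partitions = 4
partitions = [[] for _ in range(num_partitions)]
for k, pair in joined_records:
    partitions[partition_index(k, num_partitions)].append((k, pair))

for i, part in enumerate(partitions):
    print(i, part)
```

The property that matters is that all records sharing a key end up in the same partition. How evenly the records spread depends on the keys and on the number of partitions.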

New in version 0.7.0.

Parameters
other : RDD
    another RDD of (key, value) pairs

numPartitions : int, optional
    the number of partitions in the new RDD

Returns
RDD

an RDD containing all pairs of elements with matching keys, plus a pair (k, (None, w)) for each element (k, w) of other whose key has no match in self

Examples

>>> rdd1 = sc.parallelize([("a", 1), ("b", 4)])
>>> rdd2 = sc.parallelize([("a", 2)])
>>> sorted(rdd2.rightOuterJoin(rdd1).collect())
[('a', (2, 1)), ('b', (None, 4))]