pyspark.RDD.filter

RDD.filter(f: Callable[[T], bool]) → pyspark.rdd.RDD[T]

Return a new RDD containing only the elements that satisfy a predicate.

New in version 0.7.0.

Parameters
f : function

a predicate to run on each element of the RDD; elements for which it returns True are kept

Returns
RDD

a new RDD containing only the elements for which f returns True

See also

RDD.map()

Examples

>>> rdd = sc.parallelize([1, 2, 3, 4, 5])
>>> rdd.filter(lambda x: x % 2 == 0).collect()
[2, 4]
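
The following is a minimal plain-Python sketch of the semantics described above, not Spark's implementation. It assumes an RDD can be modeled as a list of partitions (each a list), and `filter_partitions` is a hypothetical helper introduced only for illustration. It shows that the predicate is evaluated element by element within each partition, that the partition layout is left unchanged, and that some partitions may end up smaller or empty.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def filter_partitions(partitions: List[List[T]], f: Callable[[T], bool]) -> List[List[T]]:
    """Hypothetical stand-in for RDD.filter on a list-of-lists model of an RDD."""
    # Each partition is filtered independently; no element moves between partitions.
    return [[x for x in part if f(x)] for part in partitions]


# Stand-in for sc.parallelize([1, 2, 3, 4, 5]) split into two partitions (assumed layout).
partitions = [[1, 2], [3, 4, 5]]

kept = filter_partitions(partitions, lambda x: x % 2 == 0)
print(kept)  # [[2], [4]]

# Flattening the partitions mirrors what collect() would return.
flat = [x for part in kept for x in part]
print(flat)  # [2, 4]
```

In this model the number of partitions is the same before and after filtering, so a very selective predicate can leave many small or empty partitions behind.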