pyspark.RDD.filter
RDD.filter(f: Callable[[T], bool]) → pyspark.rdd.RDD[T]
Return a new RDD containing only the elements that satisfy a predicate.
New in version 0.7.0.
Parameters
f : function
    a predicate to run on each element of the RDD; elements for which it returns True are kept
Returns
RDD
    a new RDD containing only the elements for which f returns True
See also
Examples
>>> rdd = sc.parallelize([1, 2, 3, 4, 5])
>>> rdd.filter(lambda x: x % 2 == 0).collect()
[2, 4]
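The predicate does not have to be a lambda. The following is a minimal sketch, not part of the official examples, that passes a named function to filter. The names is_valid_reading and keep_valid are hypothetical and chosen only for illustration, and the rdd argument is assumed to be an existing pyspark RDD.

```python
def is_valid_reading(reading):
    """Hypothetical predicate: keep readings that are present and non-negative."""
    return reading is not None and reading >= 0


def keep_valid(rdd):
    """Return a new RDD holding only the elements accepted by the predicate.

    `rdd` is assumed to be a pyspark RDD. filter is a lazy transformation,
    so nothing is computed until an action such as collect() is called.
    """
    return rdd.filter(is_valid_reading)


# Usage, assuming an active SparkContext named `sc` as in the example above:
#   keep_valid(sc.parallelize([3, None, -1, 0, 7])).collect()
```

Defining the predicate as a top-level function makes it easy to check on plain Python data before running it on a cluster, for instance with the built-in filter.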