pyspark.sql.DataFrame.filter

DataFrame.filter(condition: ColumnOrName) → DataFrame

Filters rows using the given condition.

where() is an alias for filter().

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
condition : Column or str

A Column of types.BooleanType, or a string containing a SQL expression.

Returns
DataFrame

A new DataFrame containing only the rows that satisfy the condition.

Examples

>>> df = spark.createDataFrame([
...     (2, "Alice"), (5, "Bob")], schema=["age", "name"])

Filter by Column instances.

>>> df.filter(df.age > 3).show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+
>>> df.where(df.age == 2).show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
+---+-----+

Filter by SQL expression in a string.

>>> df.filter("age > 3").show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+
>>> df.where("age = 2").show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
+---+-----+