pyspark.sql.functions.array_except

pyspark.sql.functions.array_except(col1: ColumnOrName, col2: ColumnOrName) → pyspark.sql.column.Column

Collection function: returns an array of the elements in col1 but not in col2, without duplicates.
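To make the semantics concrete, here is a small pure-Python model of what the function computes for a single row. It is a sketch under stated assumptions, not Spark's implementation. `array_except_model` is a hypothetical name. The sketch assumes that a null input array yields a null result, and that the output keeps the order in which elements first appear in `col1`.

```python
from typing import Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def array_except_model(
    a: Optional[Sequence[T]], b: Optional[Sequence[T]]
) -> Optional[List[T]]:
    """Hypothetical per-row model of array_except (assumptions noted above)."""
    # Assumption: a null array on either side gives a null result.
    if a is None or b is None:
        return None
    exclude = set(b)  # elements of the second array are filtered out
    seen = set()      # elements already emitted, used to drop duplicates
    out: List[T] = []
    for x in a:       # walk the first array in order (assumed order-preserving)
        if x not in exclude and x not in seen:
            seen.add(x)
            out.append(x)
    return out


print(array_except_model(["b", "a", "c"], ["c", "d", "a", "f"]))  # ['b']
print(array_except_model(["a", "a", "b"], []))                    # ['a', 'b']
```

The first call mirrors the doctest below. The second shows that duplicates in the first array are collapsed even when the second array is empty.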

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col1 : Column or str

name of column containing the first array

col2 : Column or str

name of column containing the second array, whose elements are excluded from the result

Returns
Column

an array of the distinct values from the first array that are not in the second.

Examples

>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import array_except
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])])
>>> df.select(array_except(df.c1, df.c2)).collect()
[Row(array_except(c1, c2)=['b'])]