pyspark.sql.functions.split(str: ColumnOrName, pattern: str, limit: int = -1) → pyspark.sql.column.Column

Splits str around matches of the given pattern.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
str : Column or str
    a string expression to split
pattern : str
    a string representing a regular expression. The regex string should be a Java regular expression.
limit : int, optional
    an integer which controls the number of times pattern is applied.

    • limit > 0: The resulting array’s length will not be more than limit, and the resulting array’s last entry will contain all input beyond the last matched pattern.
    • limit <= 0: pattern will be applied as many times as possible, and the resulting array can be of any size.

Changed in version 3.0: split now takes an optional limit field. If not provided, default limit value is -1.
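The limit semantics above can be sketched in plain Python with the re module. This is an illustration only: Spark itself applies Java regex on the JVM, and the limit == 0 branch below mirrors Java's Pattern.split, which drops trailing empty strings.

```python
import re

def split_like_spark(s: str, pattern: str, limit: int = -1) -> list:
    """Sketch of Spark SQL split() limit semantics using Python's re module."""
    if limit > 0:
        # At most `limit` elements; the last entry keeps all remaining input.
        return re.split(pattern, s, maxsplit=limit - 1)
    parts = re.split(pattern, s)
    if limit == 0:
        # Mirrors Java's Pattern.split with limit 0: trailing empties are removed.
        while parts and parts[-1] == '':
            parts.pop()
    return parts

print(split_like_spark('oneAtwoBthreeC', '[ABC]', 2))   # ['one', 'twoBthreeC']
print(split_like_spark('oneAtwoBthreeC', '[ABC]', -1))  # ['one', 'two', 'three', '']
```

Note that with limit = -1 the trailing empty string is kept, matching the second doctest below.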


Returns
Column
    array of separated strings.


Examples

>>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s',])
>>>, '[ABC]', 2).alias('s')).collect()
[Row(s=['one', 'twoBthreeC'])]
>>>, '[ABC]', -1).alias('s')).collect()
[Row(s=['one', 'two', 'three', ''])]