pyspark.sql.functions.regexp_substr

pyspark.sql.functions.regexp_substr(str: ColumnOrName, regexp: ColumnOrName) → pyspark.sql.column.Column

Returns the first substring of the string str that matches the Java regex regexp. If the regular expression does not match anywhere in str, the result is null.

New in version 3.5.0.

Parameters
str : Column or str

target column to work on.

regexp : Column or str

regex pattern to apply.

Returns
Column

the first substring of str that matches the Java regex regexp, or null if there is no match.

Examples

>>> from pyspark.sql.functions import col, lit, regexp_substr
>>> df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
>>> df.select(regexp_substr('str', lit(r'\d+')).alias('d')).collect()
[Row(d='1')]
>>> df.select(regexp_substr('str', lit(r'mmm')).alias('d')).collect()
[Row(d=None)]
>>> df.select(regexp_substr("str", col("regexp")).alias('d')).collect()
[Row(d='1')]
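
For readers who want to reason about the result without a Spark session, the documented behavior in the examples above (first match returned, null when nothing matches) can be approximated in plain Python. This is only a rough sketch: first_match is a hypothetical helper, not part of PySpark, and it uses Python's re module rather than Java's regex engine, so the two can differ on engine-specific syntax. How Spark treats null inputs or patterns that match the empty string is not covered here.

```python
import re
from typing import Optional


def first_match(text: str, pattern: str) -> Optional[str]:
    """Hypothetical helper: return the first substring of `text` that
    matches `pattern`, or None when there is no match.

    Uses Python's `re` engine, which is an assumption made for
    illustration; Spark evaluates Java regular expressions.
    """
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Same inputs as the doctest examples above.
print(first_match("1a 2b 14m", r"\d+"))  # 1
print(first_match("1a 2b 14m", r"mmm"))  # None
```

Here None plays the role of the null that regexp_substr returns when the pattern is not found.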