object functions
Commonly used functions available for DataFrame operations. Using functions defined here provides a little bit more compile-time safety to make sure the function exists.
Spark also includes more built-in functions that are less common and are not defined here.
You can still access them (and all the functions defined here) using the functions.expr() API
and calling them through a SQL expression string. You can find the entire list of functions
in the SQL API documentation of your Spark version; see also
the latest list.
As an example, isnan is a function that is defined here. You can use isnan(col("myCol"))
to invoke the isnan function. This way the programming language's compiler ensures isnan
exists and is of the proper form. You can also use expr("isnan(myCol)") to invoke the
same function. In this case, Spark itself will ensure isnan exists when it analyzes the query.
regr_count is an example of a function that is built-in but not defined here, because it is
less commonly used. To invoke it, use expr("regr_count(yCol, xCol)").
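The two invocation styles described above can be compared side by side. The following is a minimal sketch, assuming a throwaway local SparkSession; the object name IsNanExample, the method nanFlags and the sample data are illustrative and not part of the API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, isnan}

object IsNanExample {
  def nanFlags(): Seq[(Boolean, Boolean)] = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("isnan-example").getOrCreate()
    import spark.implicits._

    val df = Seq(1.0, Double.NaN).toDF("myCol")
    df.select(
        isnan(col("myCol")).alias("typed"),    // existence checked by the Scala compiler
        expr("isnan(myCol)").alias("viaExpr")  // existence checked by Spark during analysis
      )
      .as[(Boolean, Boolean)]
      .collect()
      .toSeq
  }
}
```

Both columns carry the same values; the difference is only in when a misspelled function name would be reported.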
These function APIs usually have methods with a Column signature only, because such a signature can support not
only Column but also other types such as a native string. The other variants currently exist
for historical reasons.
- Annotations
 - @Stable()
 - Source
 - functions.scala
 - Since
 1.3.0
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
 - final def ##(): Int
- Definition Classes
 - AnyRef → Any
 
 - final def ==(arg0: Any): Boolean
- Definition Classes
 - AnyRef → Any
 
 - def abs(e: Column): Column
Computes the absolute value of a numeric value.
- Since
 1.3.0
 - def acos(columnName: String): Column
- returns
 inverse cosine of columnName, as if computed by java.lang.Math.acos
- Since
 1.4.0
 - def acos(e: Column): Column
- returns
 inverse cosine of e in radians, as if computed by java.lang.Math.acos
- Since
 1.4.0
 - def acosh(columnName: String): Column
- returns
 inverse hyperbolic cosine of columnName
- Since
 3.1.0
 - def acosh(e: Column): Column
- returns
 inverse hyperbolic cosine of e
- Since
 3.1.0
 - def add_months(startDate: Column, numMonths: Column): Column
Returns the date that is numMonths after startDate.
- startDate
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- numMonths
 A column of the number of months to add to startDate, can be negative to subtract months
- returns
 A date, or null if startDate was a string that could not be cast to a date
- Since
 3.0.0
 - def add_months(startDate: Column, numMonths: Int): Column
Returns the date that is numMonths after startDate.
- startDate
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- numMonths
 The number of months to add to startDate, can be negative to subtract months
- returns
 A date, or null if startDate was a string that could not be cast to a date
- Since
 1.5.0
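As an illustration of the two add_months overloads, here is a minimal sketch. It assumes a local SparkSession and a single hand-made date; the object name AddMonthsExample and the method shifted are hypothetical. The results are cast to strings only to make them easy to read.

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{add_months, col, lit}

object AddMonthsExample {
  def shifted(): (String, String) = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("add-months-example").getOrCreate()
    import spark.implicits._

    val df = Seq(Date.valueOf("2015-01-31")).toDF("startDate")
    df.select(
        add_months(col("startDate"), 1).cast("string"),       // Int overload: one month later
        add_months(col("startDate"), lit(-1)).cast("string")  // Column overload: negative subtracts
      )
      .as[(String, String)]
      .head()
  }
}
```

Because February has no 31st day, adding one month to January 31 lands on the last day of February.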
 - def aes_decrypt(input: Column, key: Column): Column
Returns a decrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
 - def aes_decrypt(input: Column, key: Column, mode: Column): Column
Returns a decrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
 - def aes_decrypt(input: Column, key: Column, mode: Column, padding: Column): Column
Returns a decrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.aes_decrypt(Column, Column, Column, Column, Column)
 - def aes_decrypt(input: Column, key: Column, mode: Column, padding: Column, aad: Column): Column
Returns a decrypted value of input using AES in mode with padding. Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.
- input
 The binary value to decrypt.
- key
 The passphrase to use to decrypt the data.
- mode
 Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
- padding
 Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
- aad
 Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
- Since
 3.5.0
 - def aes_encrypt(input: Column, key: Column): Column
Returns an encrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
 - def aes_encrypt(input: Column, key: Column, mode: Column): Column
Returns an encrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
 - def aes_encrypt(input: Column, key: Column, mode: Column, padding: Column): Column
Returns an encrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
 - def aes_encrypt(input: Column, key: Column, mode: Column, padding: Column, iv: Column): Column
Returns an encrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.aes_encrypt(Column, Column, Column, Column, Column, Column)
 - def aes_encrypt(input: Column, key: Column, mode: Column, padding: Column, iv: Column, aad: Column): Column
Returns an encrypted value of input using AES in given mode with the specified padding. Key lengths of 16, 24 and 32 bytes are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional initialization vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12 bytes for GCM. If not provided, a random vector will be generated and prepended to the output. Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.
- input
 The binary value to encrypt.
- key
 The passphrase to use to encrypt the data.
- mode
 Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM, CBC.
- padding
 Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
- iv
 Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or "". 16-byte array for CBC mode. 12-byte array for GCM mode.
- aad
 Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
- Since
 3.5.0
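The relationship between aes_encrypt and aes_decrypt can be sketched as a round trip. This is a minimal illustration, assuming Spark 3.5 or later, a local SparkSession, and the default mode (GCM); the object name AesRoundTripExample, the method roundTrip and the hard-coded key are hypothetical and for demonstration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{aes_decrypt, aes_encrypt, col, lit}

object AesRoundTripExample {
  def roundTrip(plain: String): String = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("aes-example").getOrCreate()
    import spark.implicits._

    // Illustrative 16-character ASCII key (16 bytes); never hard-code real keys.
    val key = lit("0000111122223333").cast("binary")
    val input = col("plain").cast("binary")

    Seq(plain).toDF("plain")
      // Encrypt with the default mode, then decrypt with the same key.
      .select(aes_decrypt(aes_encrypt(input, key), key).cast("string"))
      .as[String]
      .head()
  }
}
```

Decrypting with the same key returns the original bytes, which are cast back to a string here for readability.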
 - def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column): Column
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.
df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x))
- expr
 the input array column
- initialValue
 the initial value
- merge
 (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
- Since
 3.0.0
 - def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column, finish: (Column) ⇒ Column): Column
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
df.select(aggregate(col("i"), lit(0), (acc, x) => acc + x, _ * 10))
- expr
 the input array column
- initialValue
 the initial value
- merge
 (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
- finish
 combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
- Since
 3.0.0
 - def any(e: Column): Column
Aggregate function: returns true if at least one value of e is true.
- Since
 3.5.0
 - def any_value(e: Column, ignoreNulls: Column): Column
Aggregate function: returns some value of e for a group of rows. If ignoreNulls is true, returns only non-null values.
- Since
 3.5.0
 - def any_value(e: Column): Column
Aggregate function: returns some value of e for a group of rows.
- Since
 3.5.0
 - def approx_count_distinct(columnName: String, rsd: Double): Column
Aggregate function: returns the approximate number of distinct items in a group.
- rsd
 maximum relative standard deviation allowed (default = 0.05)
- Since
 2.1.0
 - def approx_count_distinct(e: Column, rsd: Double): Column
Aggregate function: returns the approximate number of distinct items in a group.
- rsd
 maximum relative standard deviation allowed (default = 0.05)
- Since
 2.1.0
 - def approx_count_distinct(columnName: String): Column
Aggregate function: returns the approximate number of distinct items in a group.
- Since
 2.1.0
 - def approx_count_distinct(e: Column): Column
Aggregate function: returns the approximate number of distinct items in a group.
- Since
 2.1.0
 - def approx_percentile(e: Column, percentage: Column, accuracy: Column): Column
Aggregate function: returns the approximate percentile of the numeric column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
If percentage is an array, each value must be between 0.0 and 1.0. If it is a single floating point value, it must be between 0.0 and 1.0.
The accuracy parameter is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy yields better accuracy, 1.0/accuracy is the relative error of the approximation.
- Since
 3.5.0
 - def array(colName: String, colNames: String*): Column
Creates a new array column. The input columns must all have the same data type.
- Annotations
 - @varargs()
 - Since
 1.4.0
 - def array(cols: Column*): Column
Creates a new array column. The input columns must all have the same data type.
- Annotations
 - @varargs()
 - Since
 1.4.0
 - def array_agg(e: Column): Column
Aggregate function: returns a list of objects with duplicates.
- Since
 3.5.0
- Note
 The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
 - def array_append(column: Column, element: Any): Column
Returns an ARRAY containing all elements from the source ARRAY as well as the new element. The new element/column is located at the end of the ARRAY.
- Since
 3.4.0
 - def array_compact(column: Column): Column
Removes all null elements from the given array.
- Since
 3.4.0
 - def array_contains(column: Column, value: Any): Column
Returns null if the array is null, true if the array contains value, and false otherwise.
- Since
 1.5.0
 - def array_distinct(e: Column): Column
Removes duplicate values from the array.
- Since
 2.4.0
 - def array_except(col1: Column, col2: Column): Column
Returns an array of the elements in the first array but not in the second array, without duplicates. The order of elements in the result is not determined.
- Since
 2.4.0
 - def array_insert(arr: Column, pos: Column, value: Column): Column
Adds an item into a given array at a specified position.
- Since
 3.4.0
 - def array_intersect(col1: Column, col2: Column): Column
Returns an array of the elements in the intersection of the given two arrays, without duplicates.
- Since
 2.4.0
 - def array_join(column: Column, delimiter: String): Column
Concatenates the elements of column using the delimiter.
- Since
 2.4.0
 - def array_join(column: Column, delimiter: String, nullReplacement: String): Column
Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.
- Since
 2.4.0
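The effect of nullReplacement in array_join can be seen by joining the same array with and without it. This is a minimal sketch, assuming a local SparkSession and a literal array built with array and lit; the object name ArrayJoinExample and the method joined are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, array_join, lit}

object ArrayJoinExample {
  def joined(): (String, String) = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("array-join-example").getOrCreate()
    import spark.implicits._

    // A three-element string array whose middle element is null.
    val letters = array(lit("a"), lit(null).cast("string"), lit("c"))

    spark.range(1)
      .select(
        array_join(letters, ","),      // null elements are left out of the result
        array_join(letters, ",", "?")  // null elements are replaced with "?"
      )
      .as[(String, String)]
      .head()
  }
}
```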
 - def array_max(e: Column): Column
Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.
- Since
 2.4.0
 - def array_min(e: Column): Column
Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.
- Since
 2.4.0
 - def array_position(column: Column, value: Any): Column
Locates the position of the first occurrence of the value in the given array as a long. Returns null if either of the arguments is null.
- Since
 2.4.0
- Note
 The position is a 1-based index, not zero-based. Returns 0 if the value could not be found in the array.
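The indexing convention of array_position can be checked with one present and one absent value. This is a minimal sketch, assuming a local SparkSession; the object name ArrayPositionExample, the method positions and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_position, col}

object ArrayPositionExample {
  def positions(): (Long, Long) = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("array-position-example").getOrCreate()
    import spark.implicits._

    val df = Seq(Seq("a", "b", "c")).toDF("letters")
    df.select(
        array_position(col("letters"), "b"),  // "b" is the second element
        array_position(col("letters"), "z")   // "z" does not occur in the array
      )
      .as[(Long, Long)]
      .head()
  }
}
```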
 - def array_prepend(column: Column, element: Any): Column
Returns an array containing value as well as all elements from array. The new element is positioned at the beginning of the array.
- Since
 3.5.0
 - def array_remove(column: Column, element: Any): Column
Removes all elements equal to element from the given array.
- Since
 2.4.0
 - def array_repeat(e: Column, count: Int): Column
Creates an array containing the left argument repeated the number of times given by the right argument.
- Since
 2.4.0
 - def array_repeat(left: Column, right: Column): Column
Creates an array containing the left argument repeated the number of times given by the right argument.
- Since
 2.4.0
 - def array_size(e: Column): Column
Returns the total number of elements in the array. The function returns null for null input.
- Since
 3.5.0
 - def array_sort(e: Column, comparator: (Column, Column) ⇒ Column): Column
Sorts the input array based on the given comparator function. The comparator will take two arguments representing two elements of the array. It returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second element. If the comparator function returns null, the function will fail and raise an error.
- Since
 3.4.0
 - def array_sort(e: Column): Column
Sorts the input array in ascending order. The elements of the input array must be orderable. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the end of the returned array.
- Since
 2.4.0
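The two array_sort variants can be contrasted by sorting the same array in ascending order and with a comparator that inverts the usual sign convention. This is a minimal sketch, assuming Spark 3.4 or later and a local SparkSession; the object name ArraySortExample, the method sorted and the descending comparator are illustrative choices, not part of the API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_sort, col, when}

object ArraySortExample {
  def sorted(): (Seq[Int], Seq[Int]) = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("array-sort-example").getOrCreate()
    import spark.implicits._

    val df = Seq(Seq(3, 1, 2)).toDF("xs")

    // Comparator for descending order: positive when a < b, negative when a > b, 0 when equal.
    val descending = array_sort(
      col("xs"),
      (a, b) => when(a < b, 1).when(a > b, -1).otherwise(0)
    )

    df.select(array_sort(col("xs")), descending)
      .as[(Seq[Int], Seq[Int])]
      .head()
  }
}
```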
 - def array_union(col1: Column, col2: Column): Column
Returns an array of the elements in the union of the given two arrays, without duplicates.
- Since
 2.4.0
 - def arrays_overlap(a1: Column, a2: Column): Column
Returns true if a1 and a2 have at least one non-null element in common. If not and both the arrays are non-empty and any of them contains a null, it returns null. It returns false otherwise.
- Since
 2.4.0
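The three possible outcomes of arrays_overlap can be sketched with literal arrays. This is a minimal illustration, assuming a local SparkSession; the object name ArraysOverlapExample, the method outcomes and the literal arrays are hypothetical. Option[Boolean] is used so that a null result shows up as None.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, arrays_overlap, lit}

object ArraysOverlapExample {
  def outcomes(): (Option[Boolean], Option[Boolean], Option[Boolean]) = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("arrays-overlap-example").getOrCreate()
    import spark.implicits._

    val nullInt = lit(null).cast("int")

    spark.range(1)
      .select(
        // Both arrays contain 2, so the result is true.
        arrays_overlap(array(lit(1), lit(2)), array(lit(2), lit(3))),
        // Nothing in common, both non-empty, one contains a null: the result is null.
        arrays_overlap(array(lit(1), nullInt), array(lit(3))),
        // Nothing in common and no nulls: the result is false.
        arrays_overlap(array(lit(1)), array(lit(2)))
      )
      .as[(Option[Boolean], Option[Boolean], Option[Boolean])]
      .head()
  }
}
```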
 - def arrays_zip(e: Column*): Column
Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
- Annotations
 - @varargs()
 - Since
 2.4.0
 - final def asInstanceOf[T0]: T0
- Definition Classes
 - Any
 
 - def asc(columnName: String): Column
Returns a sort expression based on ascending order of the column.
df.sort(asc("dept"), desc("age"))
- Since
 1.3.0
 - def asc_nulls_first(columnName: String): Column
Returns a sort expression based on ascending order of the column, and null values appear before non-null values.
df.sort(asc_nulls_first("dept"), desc("age"))
- Since
 2.1.0
 - def asc_nulls_last(columnName: String): Column
Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
df.sort(asc_nulls_last("dept"), desc("age"))
- Since
 2.1.0
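The difference between asc_nulls_first and asc_nulls_last is only where null values end up. This is a minimal sketch, assuming a local SparkSession and a small hand-made column; the object name NullOrderingExample and the method orderings are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc_nulls_first, asc_nulls_last}

object NullOrderingExample {
  def orderings(): (Seq[Option[String]], Seq[Option[String]]) = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("null-ordering-example").getOrCreate()
    import spark.implicits._

    // None becomes a null value in the "dept" column.
    val df = Seq(Some("sales"), None, Some("eng")).toDF("dept")

    val nullsFirst = df.sort(asc_nulls_first("dept")).as[Option[String]].collect().toSeq
    val nullsLast = df.sort(asc_nulls_last("dept")).as[Option[String]].collect().toSeq
    (nullsFirst, nullsLast)
  }
}
```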
 - def ascii(e: Column): Column
Computes the numeric value of the first character of the string column, and returns the result as an int column.
- Since
 1.5.0
 - def asin(columnName: String): Column
- returns
 inverse sine of columnName, as if computed by java.lang.Math.asin
- Since
 1.4.0
 - def asin(e: Column): Column
- returns
 inverse sine of e in radians, as if computed by java.lang.Math.asin
- Since
 1.4.0
 - def asinh(columnName: String): Column
- returns
 inverse hyperbolic sine of columnName
- Since
 3.1.0
 - def asinh(e: Column): Column
- returns
 inverse hyperbolic sine of e
- Since
 3.1.0
 - def assert_true(c: Column, e: Column): Column
Returns null if the condition is true; throws an exception with the error message otherwise.
- Since
 3.1.0
 - def assert_true(c: Column): Column
Returns null if the condition is true, and throws an exception otherwise.
- Since
 3.1.0
 - def atan(columnName: String): Column
- returns
 inverse tangent of columnName, as if computed by java.lang.Math.atan
- Since
 1.4.0
 - def atan(e: Column): Column
- returns
 inverse tangent of e, as if computed by java.lang.Math.atan
- Since
 1.4.0
 - def atan2(yValue: Double, xName: String): Column
- yValue
 coordinate on y-axis
- xName
 coordinate on x-axis
- returns
 the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
- Since
 1.4.0
 - def atan2(yValue: Double, x: Column): Column
- yValue
 coordinate on y-axis
- x
 coordinate on x-axis
- returns
 the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
- Since
 1.4.0
 - def atan2(yName: String, xValue: Double): Column
- yName
 coordinate on y-axis
- xValue
 coordinate on x-axis
- returns
 the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
- Since
 1.4.0
 - def atan2(y: Column, xValue: Double): Column
- y
 coordinate on y-axis
- xValue
 coordinate on x-axis
- returns
 the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
- Since
 1.4.0
 - def atan2(yName: String, xName: String): Column
- yName
 coordinate on y-axis
- xName
 coordinate on x-axis
- returns
 the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
- Since
 1.4.0
 - def atan2(yName: String, x: Column): Column
- yName
 coordinate on y-axis
- x
 coordinate on x-axis
- returns
 the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
- Since
 1.4.0
 - def atan2(y: Column, xName: String): Column
- y
 coordinate on y-axis
- xName
 coordinate on x-axis
- returns
 the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
- Since
 1.4.0
 - def atan2(y: Column, x: Column): Column
- y
 coordinate on y-axis
- x
 coordinate on x-axis
- returns
 the theta component of the point (r, theta) in polar coordinates that corresponds to the point (x, y) in Cartesian coordinates, as if computed by java.lang.Math.atan2
- Since
 1.4.0
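The argument order of atan2 (y first, then x) is easy to get wrong, so a small check may help. This is a minimal sketch, assuming a local SparkSession; the object name Atan2Example, the method angles and the chosen points are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{atan2, lit}

object Atan2Example {
  def angles(): (Double, Double) = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("atan2-example").getOrCreate()
    import spark.implicits._

    spark.range(1)
      .select(
        atan2(lit(1.0), lit(1.0)),  // point (x = 1, y = 1), both arguments as columns
        atan2(lit(1.0), 0.0)        // point (x = 0, y = 1), using the (Column, Double) overload
      )
      .as[(Double, Double)]
      .head()
  }
}
```

The first angle is a quarter of pi and the second is half of pi, matching what java.lang.Math.atan2 gives for the same inputs.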
 - def atanh(columnName: String): Column
- returns
 inverse hyperbolic tangent of columnName
- Since
 3.1.0
 - def atanh(e: Column): Column
- returns
 inverse hyperbolic tangent of e
- Since
 3.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        avg(columnName: String): Column
      
      
      
Aggregate function: returns the average of the values in a group.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        avg(e: Column): Column
      
      
      
Aggregate function: returns the average of the values in a group.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        base64(e: Column): Column
      
      
      
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bin(columnName: String): Column
      
      
      
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bin(e: Column): Column
      
      
      
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bit_and(e: Column): Column
      
      
      
Aggregate function: returns the bitwise AND of all non-null input values, or null if none.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bit_count(e: Column): Column
      
      
      
Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bit_get(e: Column, pos: Column): Column
      
      
      
Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bit_length(e: Column): Column
      
      
      
Calculates the bit length for the specified string column.
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bit_or(e: Column): Column
      
      
      
Aggregate function: returns the bitwise OR of all non-null input values, or null if none.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bit_xor(e: Column): Column
      
      
      
Aggregate function: returns the bitwise XOR of all non-null input values, or null if none.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bitmap_bit_position(col: Column): Column
      
      
      
Returns the bit position for the given input column.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bitmap_bucket_number(col: Column): Column
      
      
      
Returns the bucket number for the given input column.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bitmap_construct_agg(col: Column): Column
      
      
      
Returns a bitmap with the positions of the bits set from all the values from the input column. The input column will most likely be bitmap_bit_position().
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bitmap_count(col: Column): Column
      
      
      
Returns the number of set bits in the input bitmap.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bitmap_or_agg(col: Column): Column
      
      
      
Returns a bitmap that is the bitwise OR of all of the bitmaps from the input column. The input column should be bitmaps created from bitmap_construct_agg().
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bitwise_not(e: Column): Column
      
      
      
Computes bitwise NOT (~) of a number.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bool_and(e: Column): Column
      
      
      
Aggregate function: returns true if all values of e are true.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bool_or(e: Column): Column
      
      
      
Aggregate function: returns true if at least one value of e is true.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        broadcast[T](df: Dataset[T]): Dataset[T]
      
      
      
Marks a DataFrame as small enough for use in broadcast joins.
The following example marks the right DataFrame for broadcast hash join using joinKey.
// left and right are DataFrames
left.join(broadcast(right), "joinKey")
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bround(e: Column, scale: Int): Column
      
      
      
Rounds the value of e to scale decimal places with HALF_EVEN round mode if scale is greater than or equal to 0, or at the integral part when scale is less than 0.
- Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bround(e: Column): Column
      
      
      
Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.
- Since
 2.0.0
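The HALF_EVEN mode used by bround can be contrasted with the HALF_UP mode of round. This is a small sketch, assuming spark-sql is on the classpath and a local SparkSession; the column name v and the sample values are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bround, col, round}

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("bround-sketch").getOrCreate()
import spark.implicits._

// Values that sit exactly halfway between two integers.
val halves = Seq(0.5, 1.5, 2.5, 3.5).toDF("v")

// bround rounds ties to the nearest even digit; round rounds ties away from zero.
val rounded = halves
  .select(bround(col("v")), round(col("v")))
  .as[(Double, Double)]
  .collect()
```

With these inputs, bround yields 0.0, 2.0, 2.0, 4.0 while round yields 1.0, 2.0, 3.0, 4.0.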
 - 
      
      
      
        
      
    
      
        
        def
      
      
        btrim(str: Column, trim: Column): Column
      
      
      
Removes the leading and trailing trim characters from str.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        btrim(str: Column): Column
      
      
      
Removes the leading and trailing space characters from str.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bucket(numBuckets: Int, e: Column): Column
      
      
      
A transform for any type that partitions by a hash of the input column.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bucket(numBuckets: Column, e: Column): Column
      
      
      
A transform for any type that partitions by a hash of the input column.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        call_function(funcName: String, cols: Column*): Column
      
      
      
Call a SQL function.
- funcName
 function name that follows the SQL identifier syntax (can be quoted, can be qualified)
- cols
 the expression parameters of function
- Annotations
 - @varargs()
 - Since
 3.5.0
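As a rough illustration of calling a SQL function by name, the sketch below invokes the built-in upper function through call_function. It assumes Spark 3.5.0 or later on the classpath and a local SparkSession; the DataFrame and the column name word are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{call_function, col}

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("call-function-sketch").getOrCreate()
import spark.implicits._

val words = Seq("spark", "sql").toDF("word")

// The function name is resolved when Spark analyzes the query, not by the Scala compiler.
val upperWords = words
  .select(call_function("upper", col("word")))
  .as[String]
  .collect()
```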
 - 
      
      
      
        
      
    
      
        
        def
      
      
        call_udf(udfName: String, cols: Column*): Column
      
      
      
Calls a user-defined function. Example:
import org.apache.spark.sql._
val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val spark = df.sparkSession
spark.udf.register("simpleUDF", (v: Int) => v * v)
df.select($"id", call_udf("simpleUDF", $"value"))
- Annotations
 - @varargs()
 - Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cardinality(e: Column): Column
      
      
      
Returns length of array or map. This is an alias of the size function.
The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. With the default settings, the function returns -1 for null input.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cbrt(columnName: String): Column
      
      
      
Computes the cube-root of the given column.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cbrt(e: Column): Column
      
      
      
Computes the cube-root of the given value.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ceil(columnName: String): Column
      
      
      
Computes the ceiling of the values in column columnName to 0 decimal places.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ceil(e: Column): Column
      
      
      
Computes the ceiling of the given value of e to 0 decimal places.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ceil(e: Column, scale: Column): Column
      
      
      
Computes the ceiling of the given value of e to scale decimal places.
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ceiling(e: Column): Column
      
      
      
Computes the ceiling of the given value of e to 0 decimal places.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ceiling(e: Column, scale: Column): Column
      
      
      
Computes the ceiling of the given value of e to scale decimal places.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        char(n: Column): Column
      
      
      
Returns the ASCII character having the binary equivalent to n. If n is larger than 256 the result is equivalent to char(n % 256).
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        char_length(str: Column): Column
      
      
      
Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        character_length(str: Column): Column
      
      
      
Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        chr(n: Column): Column
      
      
      
Returns the ASCII character having the binary equivalent to n. If n is larger than 256 the result is equivalent to chr(n % 256).
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        clone(): AnyRef
      
      
      
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... ) @native()
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        coalesce(e: Column*): Column
      
      
      
Returns the first column that is not null, or null if all inputs are null.
For example, coalesce(a, b, c) will return a if a is not null, or b if a is null and b is not null, or c if both a and b are null but c is not null.
- Annotations
 - @varargs()
 - Since
 1.3.0
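The coalesce(a, b, c) behavior described above can be sketched as follows, assuming spark-sql is on the classpath and a local SparkSession. The DataFrame and its column names a, b and c are hypothetical; None stands in for a null value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col}

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("coalesce-sketch").getOrCreate()
import spark.implicits._

// Each row leaves a different column as the first non-null one.
val rows = Seq[(Option[String], Option[String], String)](
  (Some("a"), None, "c"),
  (None, Some("b"), "c"),
  (None, None, "c")
).toDF("a", "b", "c")

// Picks, per row, the first non-null value scanning left to right.
val firstNonNull = rows
  .select(coalesce(col("a"), col("b"), col("c")))
  .as[String]
  .collect()
```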
 - 
      
      
      
        
      
    
      
        
        def
      
      
        col(colName: String): Column
      
      
      
Returns a Column based on the given column name.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        collect_list(columnName: String): Column
      
      
      
Aggregate function: returns a list of objects with duplicates.
- Since
 1.6.0
- Note
 The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        collect_list(e: Column): Column
      
      
      
Aggregate function: returns a list of objects with duplicates.
- Since
 1.6.0
- Note
 The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        collect_set(columnName: String): Column
      
      
      
Aggregate function: returns a set of objects with duplicate elements eliminated.
- Since
 1.6.0
- Note
 The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        collect_set(e: Column): Column
      
      
      
Aggregate function: returns a set of objects with duplicate elements eliminated.
- Since
 1.6.0
- Note
 The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        column(colName: String): Column
      
      
      
Returns a Column based on the given column name.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        concat(exprs: Column*): Column
      
      
      
Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.
- Annotations
 - @varargs()
 - Since
 1.5.0
- Note
 Returns null if any of the input columns are null.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        concat_ws(sep: String, exprs: Column*): Column
      
      
      
Concatenates multiple input string columns together into a single string column, using the given separator.
- Annotations
 - @varargs()
 - Since
 1.5.0
- Note
 Input strings which are null are skipped.
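The different null handling of concat and concat_ws can be seen side by side in a small sketch. It assumes spark-sql is on the classpath and a local SparkSession; the DataFrame and the column names x, y and z are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, concat_ws}

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("concat-sketch").getOrCreate()
import spark.implicits._

// The second row has a null in the middle column.
val parts = Seq[(String, Option[String], String)](
  ("a", Some("b"), "c"),
  ("a", None, "c")
).toDF("x", "y", "z")

val joined = parts
  .select(
    concat(col("x"), col("y"), col("z")).as("plain"),              // null if any input is null
    concat_ws("-", col("x"), col("y"), col("z")).as("separated")   // null inputs are skipped
  )
  .collect()
```

For the second row, plain is null while separated is "a-c".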
 - 
      
      
      
        
      
    
      
        
        def
      
      
        contains(left: Column, right: Column): Column
      
      
      
Returns a boolean. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left and right must be of STRING or BINARY type.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        conv(num: Column, fromBase: Int, toBase: Int): Column
      
      
      
Convert a number in a string column from one base to another.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        convert_timezone(targetTz: Column, sourceTs: Column): Column
      
      
      
Converts the timestamp without time zone sourceTs from the current time zone to targetTz.
- targetTz
 the time zone to which the input timestamp should be converted.
- sourceTs
 a timestamp without time zone.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        convert_timezone(sourceTz: Column, targetTz: Column, sourceTs: Column): Column
      
      
      
Converts the timestamp without time zone sourceTs from the sourceTz time zone to targetTz.
- sourceTz
 the time zone for the input timestamp. If it is omitted, the current session time zone is used as the source time zone.
- targetTz
 the time zone to which the input timestamp should be converted.
- sourceTs
 a timestamp without time zone.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        corr(columnName1: String, columnName2: String): Column
      
      
      
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        corr(column1: Column, column2: Column): Column
      
      
      
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cos(columnName: String): Column
      
      
      
- columnName
 angle in radians
- returns
 cosine of the angle, as if computed by java.lang.Math.cos
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cos(e: Column): Column
      
      
      
- e
 angle in radians
- returns
 cosine of the angle, as if computed by java.lang.Math.cos
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cosh(columnName: String): Column
      
      
      
- columnName
 hyperbolic angle
- returns
 hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cosh(e: Column): Column
      
      
      
- e
 hyperbolic angle
- returns
 hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cot(e: Column): Column
      
      
      
- e
 angle in radians
- returns
 cotangent of the angle
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        count(columnName: String): TypedColumn[Any, Long]
      
      
      
Aggregate function: returns the number of items in a group.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        count(e: Column): Column
      
      
      
Aggregate function: returns the number of items in a group.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        countDistinct(columnName: String, columnNames: String*): Column
      
      
      
Aggregate function: returns the number of distinct items in a group.
An alias of count_distinct, and it is encouraged to use count_distinct directly.
- Annotations
 - @varargs()
 - Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        countDistinct(expr: Column, exprs: Column*): Column
      
      
      
Aggregate function: returns the number of distinct items in a group.
An alias of count_distinct, and it is encouraged to use count_distinct directly.
- Annotations
 - @varargs()
 - Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        count_distinct(expr: Column, exprs: Column*): Column
      
      
      
Aggregate function: returns the number of distinct items in a group.
- Annotations
 - @varargs()
 - Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        count_if(e: Column): Column
      
      
      
Aggregate function: returns the number of TRUE values for the expression.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        count_min_sketch(e: Column, eps: Column, confidence: Column, seed: Column): Column
      
      
      
Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        covar_pop(columnName1: String, columnName2: String): Column
      
      
      
Aggregate function: returns the population covariance for two columns.
- Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        covar_pop(column1: Column, column2: Column): Column
      
      
      
Aggregate function: returns the population covariance for two columns.
- Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        covar_samp(columnName1: String, columnName2: String): Column
      
      
      
Aggregate function: returns the sample covariance for two columns.
- Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        covar_samp(column1: Column, column2: Column): Column
      
      
      
Aggregate function: returns the sample covariance for two columns.
- Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        crc32(e: Column): Column
      
      
      
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        csc(e: Column): Column
      
      
      
- e
 angle in radians
- returns
 cosecant of the angle
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        cume_dist(): Column
      
      
      
Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are at or below the current row.
N = total number of rows in the partition
cumeDist(x) = number of values before (and including) x / N
- Since
 1.6.0
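The formula above can be checked on a tiny partition. This is a sketch only, assuming spark-sql is on the classpath and a local SparkSession; the DataFrame, the column names grp and score, and the window definition are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, cume_dist}

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("cume-dist-sketch").getOrCreate()
import spark.implicits._

// One partition with N = 4 rows and a tie at score 20.
val scores = Seq(("a", 10), ("a", 20), ("a", 20), ("a", 40)).toDF("grp", "score")

// cume_dist needs an ordered window; tied rows share the same value.
val byScore = Window.partitionBy("grp").orderBy("score")

val distribution = scores
  .withColumn("cd", cume_dist().over(byScore))
  .orderBy("score")
  .select(col("cd"))
  .as[Double]
  .collect()
```

With N = 4, the row with score 10 gets 1/4, both rows with score 20 get 3/4, and the row with score 40 gets 4/4.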
 - 
      
      
      
        
      
    
      
        
        def
      
      
        curdate(): Column
      
      
      
Returns the current date at the start of query evaluation as a date column. All calls of current_date within the same query return the same value.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        current_catalog(): Column
      
      
      
Returns the current catalog.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        current_database(): Column
      
      
      
Returns the current database.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        current_date(): Column
      
      
      
Returns the current date at the start of query evaluation as a date column. All calls of current_date within the same query return the same value.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        current_schema(): Column
      
      
      
Returns the current schema.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        current_timestamp(): Column
      
      
      
Returns the current timestamp at the start of query evaluation as a timestamp column. All calls of current_timestamp within the same query return the same value.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        current_timezone(): Column
      
      
      
Returns the current session local timezone.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        current_user(): Column
      
      
      
Returns the user name of current execution context.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_add(start: Column, days: Column): Column
      
      
      
Returns the date that is days days after start.
- start
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- days
 A column of the number of days to add to start, can be negative to subtract days
- returns
 A date, or null if start was a string that could not be cast to a date
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_add(start: Column, days: Int): Column
      
      
      
Returns the date that is days days after start.
- start
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- days
 The number of days to add to start, can be negative to subtract days
- returns
 A date, or null if start was a string that could not be cast to a date
- Since
 1.5.0
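A short sketch of date_add with a string start value and both positive and negative day counts, assuming spark-sql is on the classpath and a local SparkSession. The DataFrame and the column name start are hypothetical; the results are cast to strings only to make them easy to inspect.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_add}

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("date-add-sketch").getOrCreate()
import spark.implicits._

// A string in yyyy-MM-dd form, which can be cast to a date.
val starts = Seq("2018-01-10").toDF("start")

val shifted = starts
  .select(
    date_add(col("start"), 7).cast("string"),   // one week later
    date_add(col("start"), -7).cast("string")   // negative days subtract
  )
  .as[(String, String)]
  .head()
```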
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_diff(end: Column, start: Column): Column
      
      
      
Returns the number of days from start to end.
Only considers the date part of the input. For example:
date_diff("2018-01-10 00:00:00", "2018-01-09 23:59:59") // returns 1
- end
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- start
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- returns
 An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_format(dateExpr: Column, format: String): Column
      
      
      
Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
See Datetime Patterns for valid date and time format patterns
- dateExpr
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- format
 A pattern dd.MM.yyyy would return a string like 18.03.1993
- returns
 A string, or null if dateExpr was a string that could not be cast to a timestamp
- Since
 1.5.0
- Exceptions thrown
 IllegalArgumentException if the format pattern is invalid
- Note
 Use specialized functions like year whenever possible as they benefit from a specialized implementation.
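The dd.MM.yyyy pattern mentioned above can be tried with a minimal sketch, assuming spark-sql is on the classpath and a local SparkSession. The DataFrame and the column name d are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("date-format-sketch").getOrCreate()
import spark.implicits._

// A string in yyyy-MM-dd form, which can be cast to a timestamp.
val dates = Seq("1993-03-18").toDF("d")

// Re-renders the date using day.month.year ordering.
val formatted = dates
  .select(date_format(col("d"), "dd.MM.yyyy"))
  .as[String]
  .head()
```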
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_from_unix_date(days: Column): Column
      
      
      
Creates a date from the number of days since 1970-01-01.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_part(field: Column, source: Column): Column
      
      
      
Extracts a part of the date/timestamp or interval source.
- field
 selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function extract.
- source
 a date/timestamp or interval column from which field should be extracted.
- returns
 a part of the date/timestamp or interval source
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_sub(start: Column, days: Column): Column
      
      
      
Returns the date that is days days before start.
- start
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- days
 A column of the number of days to subtract from start, can be negative to add days
- returns
 A date, or null if start was a string that could not be cast to a date
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_sub(start: Column, days: Int): Column
      
      
      
Returns the date that is days days before start.
- start
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- days
 The number of days to subtract from start, can be negative to add days
- returns
 A date, or null if start was a string that could not be cast to a date
- Since
 1.5.0
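The note that days can be negative to add days is easy to check with a small plain Scala model. DateSubSketch is a hypothetical name; this is a sketch of the documented arithmetic, not the Spark implementation.

```scala
import java.time.LocalDate

// Hypothetical helper (not part of Spark): subtracting a negative number
// of days moves the date forward.
object DateSubSketch {
  def dateSub(start: LocalDate, days: Int): LocalDate =
    start.minusDays(days.toLong)
}
```

For a start of 2018-01-10, dateSub with 3 gives 2018-01-07 and dateSub with -3 gives 2018-01-13.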
 - 
      
      
      
        
      
    
      
        
        def
      
      
        date_trunc(format: String, timestamp: Column): Column
      
      
      
Returns timestamp truncated to the unit specified by the format.
For example, date_trunc("year", "2018-11-19 12:01:19") returns 2018-01-01 00:00:00
- timestamp
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- returns
 A timestamp, or null if timestamp was a string that could not be cast to a timestamp or format was an invalid value
- Since
 2.3.0
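To make the truncation rule concrete, here is a plain Scala sketch over java.time that handles only four units ("year", "month", "day", "hour"); that subset and the name DateTruncSketch are assumptions of this example. An unrecognised unit yields None, standing in for the null result described above.

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Hypothetical helper (not part of Spark): resets every field below the
// requested unit to its minimum value.
object DateTruncSketch {
  def truncate(unit: String, ts: LocalDateTime): Option[LocalDateTime] =
    unit.toLowerCase match {
      case "year"  => Some(ts.withDayOfYear(1).truncatedTo(ChronoUnit.DAYS))
      case "month" => Some(ts.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS))
      case "day"   => Some(ts.truncatedTo(ChronoUnit.DAYS))
      case "hour"  => Some(ts.truncatedTo(ChronoUnit.HOURS))
      case _       => None // models the null returned for an invalid format
    }
}
```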
 - 
      
      
      
        
      
    
      
        
        def
      
      
        dateadd(start: Column, days: Column): Column
      
      
      
Returns the date that is days days after start.
- start
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- days
 A column of the number of days to add to start, can be negative to subtract days
- returns
 A date, or null if start was a string that could not be cast to a date
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        datediff(end: Column, start: Column): Column
      
      
      
Returns the number of days from start to end.
Only considers the date part of the input. For example:
datediff("2018-01-10 00:00:00", "2018-01-09 23:59:59") // returns 1
- end
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- start
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- returns
 An integer, or null if either end or start were strings that could not be cast to a date. Negative if end is before start
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        datepart(field: Column, source: Column): Column
      
      
      
Extracts a part of the date/timestamp or interval source.
- field
 selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function EXTRACT.
- source
 a date/timestamp or interval column from which field should be extracted.
- returns
 a part of the date/timestamp or interval source
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        day(e: Column): Column
      
      
      
Extracts the day of the month as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        dayofmonth(e: Column): Column
      
      
      
Extracts the day of the month as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        dayofweek(e: Column): Column
      
      
      
Extracts the day of the week as an integer from a given date/timestamp/string. Ranges from 1 for a Sunday through to 7 for a Saturday
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 2.3.0
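The Sunday = 1 through Saturday = 7 numbering differs from java.time, which numbers Monday = 1 through Sunday = 7. The plain Scala sketch below shows one way to convert between the two; DayOfWeekSketch is a hypothetical name and the code only models the documented range.

```scala
import java.time.LocalDate

// Hypothetical helper (not part of Spark): java.time uses Monday=1..Sunday=7,
// so "value % 7 + 1" shifts it to Sunday=1..Saturday=7.
object DayOfWeekSketch {
  def dayOfWeek(date: LocalDate): Int =
    date.getDayOfWeek.getValue % 7 + 1
}
```

For example, 2018-01-07 (a Sunday) maps to 1 and 2018-01-06 (a Saturday) maps to 7.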
 - 
      
      
      
        
      
    
      
        
        def
      
      
        dayofyear(e: Column): Column
      
      
      
Extracts the day of the year as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        days(e: Column): Column
      
      
      
A transform for timestamps and dates to partition data into days.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        decode(value: Column, charset: String): Column
      
      
      
Decodes the first argument from a binary into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        degrees(columnName: String): Column
      
      
      
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
- columnName
 angle in radians
- returns
 angle in degrees, as if computed by
java.lang.Math.toDegrees
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        degrees(e: Column): Column
      
      
      
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
- e
 angle in radians
- returns
 angle in degrees, as if computed by
java.lang.Math.toDegrees
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        dense_rank(): Column
      
      
      
Window function: returns the rank of rows within a window partition, without any gaps.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, leaves gaps after ties, so the person who came next after the three tied for second would register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
- Since
 1.6.0
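The competition example can be reproduced with ordinary Scala collections. The sketch below is a model of the two ranking rules for a single partition ordered by descending score; RankSketch and its method names are hypothetical and this is not how Spark evaluates window functions.

```scala
// Hypothetical helper (not part of Spark): higher score means better rank.
object RankSketch {
  // rank: 1 + number of rows that are strictly better (gaps after ties).
  def rank(scores: Seq[Int]): Seq[Int] =
    scores.map(s => scores.count(_ > s) + 1)

  // dense_rank: 1 + number of distinct values that are strictly better (no gaps).
  def denseRank(scores: Seq[Int]): Seq[Int] =
    scores.map(s => scores.filter(_ > s).distinct.size + 1)
}
```

For the scores 100, 90, 90, 90, 80 the rank model gives 1, 2, 2, 2, 5 while the dense_rank model gives 1, 2, 2, 2, 3.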
 - 
      
      
      
        
      
    
      
        
        def
      
      
        desc(columnName: String): Column
      
      
      
Returns a sort expression based on the descending order of the column.
df.sort(asc("dept"), desc("age"))
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        desc_nulls_first(columnName: String): Column
      
      
      
Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
df.sort(asc("dept"), desc_nulls_first("age"))
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        desc_nulls_last(columnName: String): Column
      
      
      
Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
df.sort(asc("dept"), desc_nulls_last("age"))
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        e(): Column
      
      
      
Returns Euler's number.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        element_at(column: Column, value: Any): Column
      
      
      
Returns the element of the array at the index given by value if column is an array. Returns the value for the key given by value if column is a map.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        elt(inputs: Column*): Column
      
      
      
Returns the n-th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.
- Annotations
 - @varargs()
 - Since
 3.5.0
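The 1-based selection rule can be sketched in plain Scala as follows. The sketch models only the non-ANSI behaviour (an out-of-range index yields None, standing in for NULL) and assumes the index n is passed separately from the candidate inputs; EltSketch is a hypothetical name.

```scala
// Hypothetical helper (not part of Spark): n counts from 1, so n = 2
// selects the second input; anything out of range gives None.
object EltSketch {
  def elt[A](n: Int, inputs: A*): Option[A] =
    if (n >= 1 && n <= inputs.length) Some(inputs(n - 1)) else None
}
```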
 - 
      
      
      
        
      
    
      
        
        def
      
      
        encode(value: Column, charset: String): Column
      
      
      
Encodes the first argument from a string into a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
- Since
 1.5.0
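encode and decode are inverses for a given character set. The plain Scala sketch below illustrates that round trip with the JVM's own charset support; CharsetSketch is a hypothetical name and the code is only a model of the documented conversion, not Spark's implementation.

```scala
import java.nio.charset.Charset

// Hypothetical helper (not part of Spark): string -> bytes and bytes -> string
// using a named character set.
object CharsetSketch {
  def encode(value: String, charset: String): Array[Byte] =
    value.getBytes(Charset.forName(charset))

  def decode(value: Array[Byte], charset: String): String =
    new String(value, Charset.forName(charset))
}
```

The same text can produce different byte lengths depending on the character set: "\u00e9" is two bytes in UTF-8 but one byte in ISO-8859-1.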
 - 
      
      
      
        
      
    
      
        
        def
      
      
        endswith(str: Column, suffix: Column): Column
      
      
      
Returns a boolean. The value is True if str ends with suffix. Returns NULL if either input expression is NULL. Otherwise, returns False. Both str and suffix must be of STRING or BINARY type.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        eq(arg0: AnyRef): Boolean
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        equal_null(col1: Column, col2: Column): Column
      
      
      
Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null and false if one of them is null.
- Since
 3.5.0
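The three cases in that description can be written out in plain Scala, using Option to stand in for a nullable value. EqualNullSketch is a hypothetical name and the code is a model of the rule, not Spark's implementation.

```scala
// Hypothetical helper (not part of Spark): None models SQL NULL.
object EqualNullSketch {
  def equalNull[A](a: Option[A], b: Option[A]): Boolean = (a, b) match {
    case (None, None)       => true   // both null
    case (Some(x), Some(y)) => x == y // ordinary equality
    case _                  => false  // exactly one side is null
  }
}
```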
 - 
      
      
      
        
      
    
      
        
        def
      
      
        equals(arg0: Any): Boolean
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        every(e: Column): Column
      
      
      
Aggregate function: returns true if all values of e are true.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        exists(column: Column, f: (Column) ⇒ Column): Column
      
      
      
Returns whether a predicate holds for one or more elements in the array.
df.select(exists(col("i"), _ % 2 === 0))
- column
 the input array column
- f
 col => predicate, the Boolean predicate to check the input column
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        exp(columnName: String): Column
      
      
      
Computes the exponential of the given column.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        exp(e: Column): Column
      
      
      
Computes the exponential of the given value.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        explode(e: Column): Column
      
      
      
Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        explode_outer(e: Column): Column
      
      
      
Creates a new row for each element in the given array or map column. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise. Unlike explode, if the array/map is null or empty then null is produced.
- Since
 2.2.0
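The difference between explode and explode_outer shows up only for null or empty inputs. The plain Scala sketch below models each input row as an id plus an optional array (None standing in for a null array); ExplodeSketch and the Row alias are hypothetical, and the code is a model of the described behaviour rather than Spark's implementation.

```scala
// Hypothetical helper (not part of Spark).
object ExplodeSketch {
  type Row = (String, Option[Seq[Int]])

  // explode: rows whose array is null or empty produce no output at all.
  def explode(rows: Seq[Row]): Seq[(String, Int)] =
    rows.flatMap { case (id, arr) =>
      arr.getOrElse(Seq.empty[Int]).map(x => (id, x))
    }

  // explode_outer: rows whose array is null or empty produce one row with null.
  def explodeOuter(rows: Seq[Row]): Seq[(String, Option[Int])] =
    rows.flatMap {
      case (id, Some(xs)) if xs.nonEmpty => xs.map(x => (id, Option(x)))
      case (id, _)                       => Seq((id, Option.empty[Int]))
    }
}
```

Given rows a -> [1, 2], b -> [] and c -> null, the explode model keeps only the two rows for a, while the explode_outer model additionally emits (b, null) and (c, null).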
 - 
      
      
      
        
      
    
      
        
        def
      
      
        expm1(columnName: String): Column
      
      
      
Computes the exponential of the given column minus one.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        expm1(e: Column): Column
      
      
      
Computes the exponential of the given value minus one.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        expr(expr: String): Column
      
      
      
Parses the expression string into the column that it represents, similar to Dataset#selectExpr.
// get the number of words of each length
df.groupBy(expr("length(word)")).count()
 - 
      
      
      
        
      
    
      
        
        def
      
      
        extract(field: Column, source: Column): Column
      
      
      
Extracts a part of the date/timestamp or interval source.
- field
 selects which part of the source should be extracted.
- source
 a date/timestamp or interval column from which field should be extracted.
- returns
 a part of the date/timestamp or interval source
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        factorial(e: Column): Column
      
      
      
Computes the factorial of the given value.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        filter(column: Column, f: (Column, Column) ⇒ Column): Column
      
      
      
Returns an array of elements for which a predicate holds in a given array.
df.select(filter(col("s"), (x, i) => i % 2 === 0))
- column
 the input array column
- f
 (col, index) => predicate, the Boolean predicate to filter the input column given the index. Indices start at 0.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        filter(column: Column, f: (Column) ⇒ Column): Column
      
      
      
Returns an array of elements for which a predicate holds in a given array.
df.select(filter(col("s"), x => x % 2 === 0))
- column
 the input array column
- f
 col => predicate, the Boolean predicate to filter the input column
- Since
 3.0.0
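The two filter variants behave much like filtering an ordinary Scala sequence, with the two-argument form also receiving the 0-based element index. The sketch below uses plain collections to show the analogy; FilterSketch is a hypothetical name and the predicates operate on Scala values rather than on Column expressions.

```scala
// Hypothetical helper (not part of Spark): collection-level analogue of the
// element-only and (element, index) predicate forms.
object FilterSketch {
  def filter[A](xs: Seq[A])(p: A => Boolean): Seq[A] =
    xs.filter(p)

  def filterWithIndex[A](xs: Seq[A])(p: (A, Int) => Boolean): Seq[A] =
    xs.zipWithIndex.collect { case (x, i) if p(x, i) => x }
}
```

Filtering 10, 11, 12, 13 with the index predicate i % 2 == 0 keeps the elements at positions 0 and 2, that is 10 and 12.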
 - 
      
      
      
        
      
    
      
        
        def
      
      
        finalize(): Unit
      
      
      
- Attributes
 - protected[lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws( classOf[java.lang.Throwable] )
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        find_in_set(str: Column, strArray: Column): Column
      
      
      
Returns the index (1-based) of the given string (str) in the comma-delimited list (strArray). Returns 0 if the string was not found or if the given string (str) contains a comma.
- Since
 3.5.0
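The lookup rule, including both cases that return 0, can be sketched in plain Scala. FindInSetSketch is a hypothetical name and the code is a model of the documented behaviour, not Spark's implementation.

```scala
// Hypothetical helper (not part of Spark).
object FindInSetSketch {
  def findInSet(str: String, strArray: String): Int =
    if (str.contains(",")) 0 // a search string with a comma never matches
    else strArray.split(",", -1).indexOf(str) + 1 // indexOf is -1 when absent, giving 0
}
```

Searching for "ab" in "abc,b,ab,c,def" gives 3, because only whole comma-separated entries are compared.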
 - 
      
      
      
        
      
    
      
        
        def
      
      
        first(columnName: String): Column
      
      
      
Aggregate function: returns the first value of a column in a group.
By default, the function returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 1.3.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        first(e: Column): Column
      
      
      
Aggregate function: returns the first value in a group.
By default, the function returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 1.3.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        first(columnName: String, ignoreNulls: Boolean): Column
      
      
      
Aggregate function: returns the first value of a column in a group.
By default, the function returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 2.0.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        first(e: Column, ignoreNulls: Boolean): Column
      
      
      
Aggregate function: returns the first value in a group.
By default, the function returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 2.0.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
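The effect of ignoreNulls can be modelled on an ordered Scala sequence in which None stands for null. FirstSketch is a hypothetical name; the sketch assumes the rows already arrive in a fixed order, which is exactly what a real aggregation after a shuffle does not guarantee.

```scala
// Hypothetical helper (not part of Spark): values are in arrival order.
object FirstSketch {
  def first[A](values: Seq[Option[A]], ignoreNulls: Boolean = false): Option[A] =
    if (ignoreNulls) values.flatten.headOption // first non-null value, if any
    else values.headOption.flatten             // first value, even if it is null
}
```

For the values null, 1, 2 the default gives null, while ignoreNulls = true gives 1.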
 - 
      
      
      
        
      
    
      
        
        def
      
      
        first_value(e: Column, ignoreNulls: Column): Column
      
      
      
Aggregate function: returns the first value in a group.
By default, the function returns the first value it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 3.5.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        first_value(e: Column): Column
      
      
      
Aggregate function: returns the first value in a group.
- Since
 3.5.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        flatten(e: Column): Column
      
      
      
Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.
- Since
 2.4.0
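The "only one level of nesting is removed" rule matches what flatten does on nested Scala sequences, as the following plain Scala sketch shows. FlattenSketch is a hypothetical name and the code is an analogy, not Spark's implementation.

```scala
// Hypothetical helper (not part of Spark): removes exactly one level of nesting.
object FlattenSketch {
  def flattenOnce[A](xs: Seq[Seq[A]]): Seq[A] = xs.flatten
}
```

Applied to a three-level structure such as [[[1], [2]], [[3]]], the result is [[1], [2], [3]], which is still an array of arrays.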
 - 
      
      
      
        
      
    
      
        
        def
      
      
        floor(columnName: String): Column
      
      
      
Computes the floor of the given column value to 0 decimal places.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        floor(e: Column): Column
      
      
      
Computes the floor of the given value of e to 0 decimal places.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        floor(e: Column, scale: Column): Column
      
      
      
Computes the floor of the given value of e to scale decimal places.
- Since
 3.3.0
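Flooring to a number of decimal places means rounding toward negative infinity at that position. The plain Scala sketch below models this with BigDecimal; FloorSketch is a hypothetical name and the code illustrates the arithmetic only, not Spark's implementation or its result types.

```scala
import scala.math.BigDecimal.RoundingMode

// Hypothetical helper (not part of Spark): rounds toward negative infinity
// at the given number of decimal places.
object FloorSketch {
  def floor(e: BigDecimal, scale: Int = 0): BigDecimal =
    e.setScale(scale, RoundingMode.FLOOR)
}
```

With scale 2, 3.1415 becomes 3.14 and -3.1415 becomes -3.15, since flooring a negative value moves it further from zero.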
 - 
      
      
      
        
      
    
      
        
        def
      
      
        forall(column: Column, f: (Column) ⇒ Column): Column
      
      
      
Returns whether a predicate holds for every element in the array.
df.select(forall(col("i"), x => x % 2 === 0))
- column
 the input array column
- f
 col => predicate, the Boolean predicate to check the input column
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        format_number(x: Column, d: Int): Column
      
      
      
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.
If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.
- Since
 1.5.0
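The grouping, HALF_EVEN rounding and negative-d rules can be illustrated with java.text.DecimalFormat. This plain Scala sketch is a model of the documented behaviour, not Spark's implementation; FormatNumberSketch, the exact pattern strings and the fixed US symbols are assumptions of the example, and None stands in for the null result.

```scala
import java.math.RoundingMode
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

// Hypothetical helper (not part of Spark).
object FormatNumberSketch {
  def formatNumber(x: BigDecimal, d: Int): Option[String] =
    if (d < 0) None // models the null returned for negative d
    else {
      // d == 0: no decimal point; otherwise exactly d fractional digits.
      val pattern = if (d == 0) "#,##0" else "#,##0." + ("0" * d)
      val fmt = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US))
      fmt.setRoundingMode(RoundingMode.HALF_EVEN)
      Some(fmt.format(x.bigDecimal))
    }
}
```

HALF_EVEN rounds an exact tie to the nearest even digit, so 1234567.125 formats as 1,234,567.12 while 1234567.135 formats as 1,234,567.14.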
 - 
      
      
      
        
      
    
      
        
        def
      
      
        format_string(format: String, arguments: Column*): Column
      
      
      
Formats the arguments in printf-style and returns the result as a string column.
- Annotations
 - @varargs()
 - Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_csv(e: Column, schema: Column, options: Map[String, String]): Column
      
      
      
(Java-specific) Parses a column containing a CSV string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing CSV data.
- schema
 the schema to use when parsing the CSV string
- options
 options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_csv(e: Column, schema: StructType, options: Map[String, String]): Column
      
      
      
Parses a column containing a CSV string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing CSV data.
- schema
 the schema to use when parsing the CSV string
- options
 options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: Column, options: Map[String, String]): Column
      
      
      
(Java-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType of StructTypes with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema to use when parsing the json string
- options
 options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: Column): Column
      
      
      
(Scala-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType of StructTypes with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema to use when parsing the json string
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: String, options: Map[String, String]): Column
      
      
      
(Scala-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema as a DDL-formatted string.
- options
 options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: String, options: Map[String, String]): Column
      
      
      
(Java-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema as a DDL-formatted string.
- options
 options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: DataType): Column
      
      
      
Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema to use when parsing the json string
- Since
 2.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: StructType): Column
      
      
      
Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema to use when parsing the json string
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: DataType, options: Map[String, String]): Column
      
      
      
(Java-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema to use when parsing the json string
- options
 options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Since
 2.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: StructType, options: Map[String, String]): Column
      
      
      
(Java-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema to use when parsing the json string
- options
 options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: DataType, options: Map[String, String]): Column
      
      
      
(Scala-specific) Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema to use when parsing the json string
- options
 options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Since
 2.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_json(e: Column, schema: StructType, options: Map[String, String]): Column
      
      
      
(Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
- e
 a string column containing JSON data.
- schema
 the schema to use when parsing the json string
- options
 options to control how the json is parsed. Accepts the same options as the json data source. See Data Source Option in the version you use.
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_unixtime(ut: Column, f: String): Column
      
      
      
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
See Datetime Patterns for valid date and time format patterns
- ut
 A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
- f
 A date time pattern that the input will be formatted to
- returns
 A string, or null if ut was a string that could not be cast to a long or f was an invalid date time pattern
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_unixtime(ut: Column): Column
      
      
      
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.
- ut
 A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
- returns
 A string, or null if the input was a string that could not be cast to a long
- Since
 1.5.0
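The conversion described for from_unixtime can be modelled outside Spark with java.time. The following is only a sketch of the semantics, not Spark's implementation: FromUnixtimeSketch and its format method are hypothetical names, and the time zone is passed explicitly so the result does not depend on the machine running it.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object FromUnixtimeSketch {
  // Hypothetical helper: renders seconds since the unix epoch as a string
  // in the given pattern and time zone. Negative values address instants
  // before 1970-01-01 00:00:00 UTC.
  def format(epochSeconds: Long, pattern: String, zone: ZoneId): String =
    DateTimeFormatter
      .ofPattern(pattern)
      .withZone(zone)
      .format(Instant.ofEpochSecond(epochSeconds))
}
```

For example, FromUnixtimeSketch.format(0L, "yyyy-MM-dd HH:mm:ss", ZoneId.of("UTC")) yields "1970-01-01 00:00:00". Spark's own pattern letters are listed under Datetime Patterns and may differ in detail from DateTimeFormatter.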
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_utc_timestamp(ts: Column, tz: Column): Column
      
      
      
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        from_utc_timestamp(ts: Column, tz: String): Column
      
      
      
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
- ts
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- tz
 A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
- returns
 A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
- Since
 1.5.0
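The shift performed by from_utc_timestamp can be illustrated with java.time. This is a sketch under the assumption that the input is a zone-less wall-clock value; FromUtcTimestampSketch and shift are hypothetical names and do not exist in Spark.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

object FromUtcTimestampSketch {
  // Hypothetical helper: treats the wall-clock value as UTC and returns the
  // wall-clock value observed at the same instant in the target zone.
  def shift(wallClockUtc: LocalDateTime, zoneId: String): LocalDateTime =
    wallClockUtc
      .atZone(ZoneOffset.UTC)
      .withZoneSameInstant(ZoneId.of(zoneId))
      .toLocalDateTime
}
```

With the zone offset "+01:00", the input 2017-07-14 02:40:00 becomes 2017-07-14 03:40:00, which matches the example in the description above.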
 - 
      
      
      
        
      
    
      
        
        def
      
      
        get(column: Column, index: Column): Column
      
      
      
Returns element of array at given (0-based) index. If the index points outside of the array boundaries, then this function returns NULL.
- Since
 3.4.0
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        getClass(): Class[_]
      
      
      
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        get_json_object(e: Column, path: String): Column
      
      
      
Extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It will return null if the input json string is invalid.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        getbit(e: Column, pos: Column): Column
      
      
      
Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.
- Since
 3.5.0
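The right-to-left, zero-based numbering used by getbit corresponds to a shift and a mask. A plain-Scala sketch of that arithmetic follows; GetBitSketch is a hypothetical name, and the 64-bit upper bound is an assumption made for this sketch only.

```scala
object GetBitSketch {
  // Hypothetical helper: returns 0 or 1 for the bit at `pos`,
  // where position 0 is the least significant (rightmost) bit.
  def getBit(value: Long, pos: Int): Int = {
    require(pos >= 0, "position must not be negative")
    require(pos < 64, "position must fit in a 64-bit value") // sketch-only bound
    ((value >> pos) & 1L).toInt
  }
}
```

For the value 11 (binary 1011), positions 0, 1, 2 and 3 give 1, 1, 0 and 1.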
 - 
      
      
      
        
      
    
      
        
        def
      
      
        greatest(columnName: String, columnNames: String*): Column
      
      
      
Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
- Annotations
 - @varargs()
 - Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        greatest(exprs: Column*): Column
      
      
      
Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
- Annotations
 - @varargs()
 - Since
 1.5.0
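The null-skipping rule of greatest can be modelled in plain Scala by representing a nullable value as an Option. This is a sketch of the documented behaviour for integers only; GreatestSketch is a hypothetical name.

```scala
object GreatestSketch {
  // Hypothetical helper: None stands in for a SQL null.
  // Nulls are skipped; the result is None only if every input is None.
  def greatest(values: Option[Int]*): Option[Int] = {
    require(values.length >= 2, "greatest takes at least 2 parameters")
    values.flatten.reduceOption(_ max _)
  }
}
```

Here greatest(Some(1), None, Some(3)) returns Some(3), while greatest(None, None) returns None.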
 - 
      
      
      
        
      
    
      
        
        def
      
      
        grouping(columnName: String): Column
      
      
      
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.
- Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        grouping(e: Column): Column
      
      
      
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.
- Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        grouping_id(colName: String, colNames: String*): Column
      
      
      
Aggregate function: returns the level of grouping, equals to
(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
- Since
 2.0.0
- Note
 The list of columns should match with grouping columns exactly.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        grouping_id(cols: Column*): Column
      
      
      
Aggregate function: returns the level of grouping, equals to
(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
- Since
 2.0.0
- Note
 The list of columns should match with grouping columns exactly, or empty (means all the grouping columns).
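The grouping_id formula packs the per-column grouping(...) flags into one integer, with the first column in the most significant position. A plain-Scala sketch of that arithmetic follows; GroupingIdSketch is a hypothetical name and the flags are supplied by hand rather than computed by Spark.

```scala
object GroupingIdSketch {
  // Hypothetical helper: `flags` holds grouping(c1), ..., grouping(cn),
  // each 0 or 1. Flag i (0-based) is shifted left by n - 1 - i.
  def groupingId(flags: Seq[Int]): Int = {
    val n = flags.length
    flags.zipWithIndex.map { case (flag, i) => flag << (n - 1 - i) }.sum
  }
}
```

With three grouping columns and flags 1, 0, 1 the result is (1 << 2) + (0 << 1) + 1 = 5.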
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hash(cols: Column*): Column
      
      
      
Calculates the hash code of given columns, and returns the result as an int column.
- Annotations
 - @varargs()
 - Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hashCode(): Int
      
      
      
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hex(column: Column): Column
      
      
      
Computes hex value of the given column.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        histogram_numeric(e: Column, nBins: Column): Column
      
      
      
Aggregate function: computes a histogram on numeric 'expr' using nb bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plus statistical computing packages. Note: the output type of the 'x' field in the return value is propagated from the input value consumed in the aggregate function.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_sketch_agg(columnName: String): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with default lgConfigK value.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_sketch_agg(e: Column): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with default lgConfigK value.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_sketch_agg(columnName: String, lgConfigK: Int): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_sketch_agg(e: Column, lgConfigK: Int): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_sketch_agg(e: Column, lgConfigK: Column): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_sketch_estimate(columnName: String): Column
      
      
      
Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_sketch_estimate(c: Column): Column
      
      
      
Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union(columnName1: String, columnName2: String, allowDifferentLgConfigK: Boolean): Column
      
      
      
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union(c1: Column, c2: Column, allowDifferentLgConfigK: Boolean): Column
      
      
      
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union(columnName1: String, columnName2: String): Column
      
      
      
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union(c1: Column, c2: Column): Column
      
      
      
Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object. Throws an exception if sketches have different lgConfigK values.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union_agg(columnName: String): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union_agg(e: Column): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union_agg(columnName: String, allowDifferentLgConfigK: Boolean): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union_agg(e: Column, allowDifferentLgConfigK: Boolean): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hll_union_agg(e: Column, allowDifferentLgConfigK: Column): Column
      
      
      
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is set to false.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hour(e: Column): Column
      
      
      
Extracts the hours as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hours(e: Column): Column
      
      
      
A transform for timestamps to partition data into hours.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hypot(l: Double, rightName: String): Column
      
      
      
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hypot(l: Double, r: Column): Column
      
      
      
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hypot(leftName: String, r: Double): Column
      
      
      
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hypot(l: Column, r: Double): Column
      
      
      
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hypot(leftName: String, rightName: String): Column
      
      
      
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hypot(leftName: String, r: Column): Column
      
      
      
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hypot(l: Column, rightName: String): Column
      
      
      
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hypot(l: Column, r: Column): Column
      
      
      
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- Since
 1.4.0
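Why "without intermediate overflow or underflow" matters for hypot can be seen in plain Scala by comparing the textbook formula with scala.math.hypot. This sketch only demonstrates the numeric issue; HypotSketch is a hypothetical name and it does not show how Spark evaluates the function.

```scala
object HypotSketch {
  // Textbook formula: the squares can overflow to Infinity or
  // underflow to 0.0 even when the final result is representable.
  def naive(a: Double, b: Double): Double = math.sqrt(a * a + b * b)

  // Library routine that avoids the intermediate overflow and underflow.
  def safe(a: Double, b: Double): Double = math.hypot(a, b)
}
```

For a = 3e200 and b = 4e200 the naive version returns Infinity, because 9e400 is outside the Double range, whereas the library routine returns a value close to 5e200.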
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ifnull(col1: Column, col2: Column): Column
      
      
      
Returns col2 if col1 is null, or col1 otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ilike(str: Column, pattern: Column): Column
      
      
      
Returns true if str matches pattern with escapeChar('\') case-insensitively, null if any arguments are null, false otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ilike(str: Column, pattern: Column, escapeChar: Column): Column
      
      
      
Returns true if str matches pattern with escapeChar case-insensitively, null if any arguments are null, false otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        initcap(e: Column): Column
      
      
      
Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
For example, "hello world" will become "Hello World".
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        inline(e: Column): Column
      
      
      
Creates a new row for each element in the given array of structs.
- Since
 3.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        inline_outer(e: Column): Column
      
      
      
Creates a new row for each element in the given array of structs. Unlike inline, if the array is null or empty then null is produced for each nested column.
- Since
 3.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        input_file_block_length(): Column
      
      
      
Returns the length of the block being read, or -1 if not available.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        input_file_block_start(): Column
      
      
      
Returns the start offset of the block being read, or -1 if not available.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        input_file_name(): Column
      
      
      
Creates a string column for the file name of the current Spark task.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        instr(str: Column, substring: String): Column
      
      
      
Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments are null.
- Since
 1.5.0
- Note
 The position is not zero based, but 1 based index. Returns 0 if substr could not be found in str.
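The 1-based convention of instr, with 0 meaning "not found", relates to the 0-based String.indexOf as shown in this plain-Scala sketch. InstrSketch is a hypothetical name; the sketch assumes non-null ASCII input and does not model the null handling described above.

```scala
object InstrSketch {
  // Hypothetical helper: indexOf returns -1 when the substring is absent,
  // so adding 1 gives 0 for "not found" and a 1-based position otherwise.
  def instr(str: String, substring: String): Int =
    str.indexOf(substring) + 1
}
```

For example, instr("SparkSQL", "SQL") is 6 and instr("SparkSQL", "xyz") is 0.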
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        isInstanceOf[T0]: Boolean
      
      
      
- Definition Classes
 - Any
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        isnan(e: Column): Column
      
      
      
Return true iff the column is NaN.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        isnotnull(col: Column): Column
      
      
      
Returns true if col is not null, or false otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        isnull(e: Column): Column
      
      
      
Return true iff the column is null.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        java_method(cols: Column*): Column
      
      
      
Calls a method with reflection.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        json_array_length(jsonArray: Column): Column
      
      
      
Returns the number of elements in the outermost JSON array. NULL is returned in case of any other valid JSON string, NULL or an invalid JSON.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        json_object_keys(json: Column): Column
      
      
      
Returns all the keys of the outermost JSON object as an array. If a valid JSON object is given, all the keys of the outermost object will be returned as an array. If it is any other valid JSON string, an invalid JSON string or an empty string, the function returns null.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        json_tuple(json: Column, fields: String*): Column
      
      
      
Creates a new row for a json column according to the given field names.
- Annotations
 - @varargs()
 - Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        kurtosis(columnName: String): Column
      
      
      
Aggregate function: returns the kurtosis of the values in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        kurtosis(e: Column): Column
      
      
      
Aggregate function: returns the kurtosis of the values in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lag(e: Column, offset: Int, defaultValue: Any, ignoreNulls: Boolean): Column
      
      
      
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. ignoreNulls determines whether null values of row are included in or eliminated from the calculation. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lag(e: Column, offset: Int, defaultValue: Any): Column
      
      
      
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lag(columnName: String, offset: Int, defaultValue: Any): Column
      
      
      
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lag(columnName: String, offset: Int): Column
      
      
      
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lag(e: Column, offset: Int): Column
      
      
      
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
- Since
 1.4.0
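The offset and default behaviour of lag can be modelled on an ordered in-memory sequence that stands in for one window partition. This is a plain-Scala sketch of the semantics only; LagSketch is a hypothetical name, None stands in for null, and the ignoreNulls variant is not modelled.

```scala
object LagSketch {
  // Hypothetical helper: for each position i, returns the element `offset`
  // rows earlier, or `defaultValue` when no such row exists.
  def lag[A](rows: Seq[A], offset: Int, defaultValue: Option[A] = None): Seq[Option[A]] =
    rows.indices.map { i =>
      val j = i - offset
      if (j >= 0 && j < rows.length) Some(rows(j)) else defaultValue
    }
}
```

For the partition 10, 20, 30 and an offset of one, the result is None, Some(10), Some(20): every row sees its predecessor and the first row has none.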
 - 
      
      
      
        
      
    
      
        
        def
      
      
        last(columnName: String): Column
      
      
      
Aggregate function: returns the last value of the column in a group.
By default, the function returns the last value it sees. It returns the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 1.3.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        last(e: Column): Column
      
      
      
Aggregate function: returns the last value in a group.
By default, the function returns the last value it sees. It returns the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 1.3.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        last(columnName: String, ignoreNulls: Boolean): Column
      
      
      
Aggregate function: returns the last value of the column in a group.
By default, the function returns the last value it sees. It returns the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 2.0.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        last(e: Column, ignoreNulls: Boolean): Column
      
      
      
Aggregate function: returns the last value in a group.
By default, the function returns the last value it sees. It returns the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 2.0.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
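One common way to sidestep the ordering caveat is to evaluate last over an explicitly ordered window. The following is a minimal sketch under that assumption; the object name LastIgnoreNullsExample and the sensor data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, last}

// Hypothetical example: forward-fill missing readings with the last non-null value.
object LastIgnoreNullsExample {
  def run(spark: SparkSession): List[(String, Int, Option[Int], Option[Int])] = {
    import spark.implicits._
    val readings: Seq[(String, Int, Option[Int])] =
      Seq(("s1", 1, Some(10)), ("s1", 2, None), ("s1", 3, Some(30)), ("s1", 4, None))
    // An explicit ordering and frame make the result independent of shuffle order.
    val upToCurrentRow = Window
      .partitionBy("sensor")
      .orderBy("t")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    readings.toDF("sensor", "t", "value")
      .withColumn("filled", last(col("value"), ignoreNulls = true).over(upToCurrentRow))
      .orderBy("sensor", "t")
      .as[(String, Int, Option[Int], Option[Int])]
      .collect()
      .toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("last-example").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```

With ignoreNulls left at false, the filled column would simply repeat value, nulls included.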
 - 
      
      
      
        
      
    
      
        
        def
      
      
        last_day(e: Column): Column
      
      
      
Returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.
- e
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- returns
 A date, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
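As a small illustration of last_day on string input, here is a sketch assuming a local SparkSession; the object name LastDayExample and the sample dates are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, last_day}

// Hypothetical example: month end for a few date strings.
object LastDayExample {
  def run(spark: SparkSession): List[(String, String)] = {
    import spark.implicits._
    Seq("2015-07-27", "2016-02-10").toDF("d")
      // The result is a date; it is cast to string here only to simplify comparison.
      .select(col("d"), last_day(col("d")).cast("string").as("month_end"))
      .orderBy("d")
      .as[(String, String)]
      .collect()
      .toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("last-day-example").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```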
 - 
      
      
      
        
      
    
      
        
        def
      
      
        last_value(e: Column, ignoreNulls: Column): Column
      
      
      
Aggregate function: returns the last value in a group.
By default, the function returns the last value it sees. It returns the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
- Since
 3.5.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        last_value(e: Column): Column
      
      
      
Aggregate function: returns the last value in a group.
- Since
 3.5.0
- Note
 The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lcase(str: Column): Column
      
      
      
Returns str with all characters changed to lowercase.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lead(e: Column, offset: Int, defaultValue: Any, ignoreNulls: Boolean): Column
      
      
      
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. ignoreNulls determines whether null values are included in or excluded from the calculation. The default value of ignoreNulls is false. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lead(e: Column, offset: Int, defaultValue: Any): Column
      
      
      
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lead(columnName: String, offset: Int, defaultValue: Any): Column
      
      
      
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lead(e: Column, offset: Int): Column
      
      
      
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lead(columnName: String, offset: Int): Column
      
      
      
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
- Since
 1.4.0
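The lead variants mirror lag in the opposite direction. The sketch below uses the three-argument form with a defaultValue; it assumes a local SparkSession, and the object name LeadExample and the sample columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lead}

// Hypothetical example: next amount per shop, with 0 when there is no next row.
object LeadExample {
  def run(spark: SparkSession): List[(String, Int, Int, Int)] = {
    import spark.implicits._
    val sales = Seq(("a", 1, 10), ("a", 2, 20), ("b", 1, 5)).toDF("shop", "day", "amount")
    val perShop = Window.partitionBy("shop").orderBy("day")
    sales
      // The third argument is the defaultValue used past the end of the partition.
      .withColumn("next_amount", lead(col("amount"), 1, 0).over(perShop))
      .orderBy("shop", "day")
      .as[(String, Int, Int, Int)]
      .collect()
      .toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("lead-example").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```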
 - 
      
      
      
        
      
    
      
        
        def
      
      
        least(columnName: String, columnNames: String*): Column
      
      
      
Returns the least value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
- Annotations
 - @varargs()
 - Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        least(exprs: Column*): Column
      
      
      
Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
- Annotations
 - @varargs()
 - Since
 1.5.0
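The null-skipping behaviour of least can be seen in a minimal sketch, assuming a local SparkSession; the object name LeastExample and the columns id, a, b and c are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, least}

// Hypothetical example: row-wise minimum across three nullable columns.
object LeastExample {
  def run(spark: SparkSession): List[(Int, Option[Int])] = {
    import spark.implicits._
    val rows: Seq[(Int, Option[Int], Option[Int], Option[Int])] = Seq(
      (1, Some(3), Some(1), Some(2)),
      (2, None, Some(5), Some(4)), // the null in column a is skipped
      (3, None, None, None))       // all inputs null, so the result is null
    rows.toDF("id", "a", "b", "c")
      .select(col("id"), least(col("a"), col("b"), col("c")).as("smallest"))
      .orderBy("id")
      .as[(Int, Option[Int])]
      .collect()
      .toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("least-example").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```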
 - 
      
      
      
        
      
    
      
        
        def
      
      
        left(str: Column, len: Column): Column
      
      
      
Returns the leftmost len (len can be string type) characters from the string str. If len is less than or equal to 0, the result is an empty string.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        len(e: Column): Column
      
      
      
Computes the character length of a given string or number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        length(e: Column): Column
      
      
      
Computes the character length of a given string or number of bytes of a binary string. The length of character strings includes the trailing spaces. The length of binary strings includes binary zeros.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        levenshtein(l: Column, r: Column): Column
      
      
      
Computes the Levenshtein distance of the two given string columns.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        levenshtein(l: Column, r: Column, threshold: Int): Column
      
      
      
Computes the Levenshtein distance of the two given string columns if it's less than or equal to a given threshold.
- returns
 the resulting distance, or -1 if the distance exceeds the threshold
- Since
 3.5.0
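To make the threshold behaviour concrete, here is a minimal sketch comparing both levenshtein overloads. It assumes Spark 3.5.0 or later (for the threshold overload) and a local SparkSession; the object name LevenshteinExample and the sample words are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, levenshtein}

// Hypothetical example: edit distance between "kitten" and "sitting" is 3.
object LevenshteinExample {
  // Returns (distance, distance with threshold 2, distance with threshold 3).
  def run(spark: SparkSession): (Int, Int, Int) = {
    import spark.implicits._
    Seq(("kitten", "sitting")).toDF("l", "r")
      .select(
        levenshtein(col("l"), col("r")),
        levenshtein(col("l"), col("r"), 2), // distance exceeds the threshold
        levenshtein(col("l"), col("r"), 3)) // distance is within the threshold
      .as[(Int, Int, Int)]
      .head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("levenshtein-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```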
 - 
      
      
      
        
      
    
      
        
        def
      
      
        like(str: Column, pattern: Column): Column
      
      
      
Returns true if str matches pattern with escapeChar ('\'), null if any arguments are null, false otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        like(str: Column, pattern: Column, escapeChar: Column): Column
      
      
      
Returns true if str matches pattern with escapeChar, null if any arguments are null, false otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lit(literal: Any): Column
      
      
      
Creates a Column of literal value.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ln(e: Column): Column
      
      
      
Computes the natural logarithm of the given value.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        localtimestamp(): Column
      
      
      
Returns the current timestamp without time zone at the start of query evaluation as a timestamp without time zone column. All calls of localtimestamp within the same query return the same value.
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        locate(substr: String, str: Column, pos: Int): Column
      
      
      
Locate the position of the first occurrence of substr in a string column, after position pos.
- Since
 1.5.0
- Note
 The position is 1-based, not zero-based. Returns 0 if substr could not be found in str.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        locate(substr: String, str: Column): Column
      
      
      
Locate the position of the first occurrence of substr.
- Since
 1.5.0
- Note
 The position is 1-based, not zero-based. Returns 0 if substr could not be found in str.
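The 1-based indexing and the 0 result for a missing substring are easy to check with a minimal sketch, assuming a local SparkSession; the object name LocateExample and the sample string are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, locate}

// Hypothetical example: positions of "b" inside "abcabc".
object LocateExample {
  // Returns (first match, first match searching from position 3, missing substring).
  def run(spark: SparkSession): (Int, Int, Int) = {
    import spark.implicits._
    Seq("abcabc").toDF("s")
      .select(
        locate("b", col("s")),    // first "b" is the 2nd character
        locate("b", col("s"), 3), // searching from position 3 finds the 5th character
        locate("z", col("s")))    // not found
      .as[(Int, Int, Int)]
      .head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("locate-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```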
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log(base: Double, columnName: String): Column
      
      
      
Returns the first argument-base logarithm of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log(base: Double, a: Column): Column
      
      
      
Returns the first argument-base logarithm of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log(columnName: String): Column
      
      
      
Computes the natural logarithm of the given column.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log(e: Column): Column
      
      
      
Computes the natural logarithm of the given value.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log10(columnName: String): Column
      
      
      
Computes the logarithm of the given value in base 10.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log10(e: Column): Column
      
      
      
Computes the logarithm of the given value in base 10.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log1p(columnName: String): Column
      
      
      
Computes the natural logarithm of the given column plus one.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log1p(e: Column): Column
      
      
      
Computes the natural logarithm of the given value plus one.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log2(columnName: String): Column
      
      
      
Computes the logarithm of the given value in base 2.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        log2(expr: Column): Column
      
      
      
Computes the logarithm of the given column in base 2.
- Since
 1.5.0
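The logarithm functions above differ only in their base, and in log1p adding one before taking the natural logarithm. A minimal sketch that evaluates them side by side, assuming a local SparkSession; the object name LogExample and the input value 8.0 are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, log, log10, log1p, log2}

// Hypothetical example: several logarithms of the same value.
object LogExample {
  // Returns (log base 2 via log(base, col), log2, log10, natural log, log1p) of 8.0.
  def run(spark: SparkSession): (Double, Double, Double, Double, Double) = {
    import spark.implicits._
    Seq(8.0).toDF("x")
      .select(
        log(2.0, col("x")), // base given explicitly as the first argument
        log2(col("x")),
        log10(col("x")),
        log(col("x")),      // natural logarithm
        log1p(col("x")))    // natural logarithm of x + 1
      .as[(Double, Double, Double, Double, Double)]
      .head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("log-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Results are doubles, so comparisons against expected values are best done with a small tolerance rather than exact equality.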
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lower(e: Column): Column
      
      
      
Converts a string column to lower case.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lpad(str: Column, len: Int, pad: Array[Byte]): Column
      
      
      
Left-pad the binary column with pad to a byte length of len. If the binary column is longer than len, the return value is shortened to len bytes.
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        lpad(str: Column, len: Int, pad: String): Column
      
      
      
Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
- Since
 1.5.0
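Both the padding and the shortening behaviour of the string lpad can be shown in a minimal sketch, assuming a local SparkSession; the object name LpadExample and the sample strings are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lpad}

// Hypothetical example: pad or shorten strings to exactly 5 characters.
object LpadExample {
  def run(spark: SparkSession): List[(String, String)] = {
    import spark.implicits._
    Seq("42", "1234567").toDF("s")
      .select(col("s"), lpad(col("s"), 5, "0").as("padded"))
      .orderBy("s") // lexicographic order: "1234567" sorts before "42"
      .as[(String, String)]
      .collect()
      .toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("lpad-example").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```

The short input gains leading zeros, while the long input keeps only its first 5 characters.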
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ltrim(e: Column, trimString: String): Column
      
      
      
Trim the specified character string from left end for the specified string column.
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ltrim(e: Column): Column
      
      
      
Trim the spaces from left end for the specified string value.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_date(year: Column, month: Column, day: Column): Column
      
      
      
- returns
 A date created from year, month and day fields.
- Since
 3.3.0
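A minimal sketch of make_date with literal fields, assuming Spark 3.3.0 or later and a local SparkSession; the object name MakeDateExample is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, make_date}

// Hypothetical example: build a date from year, month and day columns.
object MakeDateExample {
  def run(spark: SparkSession): String = {
    import spark.implicits._
    spark.range(1) // a single-row Dataset to evaluate the expression against
      // The result is a date; it is cast to string here only to simplify comparison.
      .select(make_date(lit(2024), lit(2), lit(29)).cast("string"))
      .as[String]
      .head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("make-date-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```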
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_dt_interval(): Column
      
      
      
Make DayTimeIntervalType duration.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_dt_interval(days: Column): Column
      
      
      
Make DayTimeIntervalType duration from days.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_dt_interval(days: Column, hours: Column): Column
      
      
      
Make DayTimeIntervalType duration from days and hours.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_dt_interval(days: Column, hours: Column, mins: Column): Column
      
      
      
Make DayTimeIntervalType duration from days, hours and mins.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_dt_interval(days: Column, hours: Column, mins: Column, secs: Column): Column
      
      
      
Make DayTimeIntervalType duration from days, hours, mins and secs.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_interval(): Column
      
      
      
Make interval.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_interval(years: Column): Column
      
      
      
Make interval from years.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_interval(years: Column, months: Column): Column
      
      
      
Make interval from years and months.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_interval(years: Column, months: Column, weeks: Column): Column
      
      
      
Make interval from years, months and weeks.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_interval(years: Column, months: Column, weeks: Column, days: Column): Column
      
      
      
Make interval from years, months, weeks and days.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_interval(years: Column, months: Column, weeks: Column, days: Column, hours: Column): Column
      
      
      
Make interval from years, months, weeks, days and hours.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_interval(years: Column, months: Column, weeks: Column, days: Column, hours: Column, mins: Column): Column
      
      
      
Make interval from years, months, weeks, days, hours and mins.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_interval(years: Column, months: Column, weeks: Column, days: Column, hours: Column, mins: Column, secs: Column): Column
      
      
      
Make interval from years, months, weeks, days, hours, mins and secs.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_timestamp(years: Column, months: Column, days: Column, hours: Column, mins: Column, secs: Column): Column
      
      
      
Create timestamp from years, months, days, hours, mins and secs fields. The result data type is consistent with the value of configuration spark.sql.timestampType. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_timestamp(years: Column, months: Column, days: Column, hours: Column, mins: Column, secs: Column, timezone: Column): Column
      
      
      
Create timestamp from years, months, days, hours, mins, secs and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_timestamp_ltz(years: Column, months: Column, days: Column, hours: Column, mins: Column, secs: Column): Column
      
      
      
Create the current timestamp with local time zone from years, months, days, hours, mins and secs fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_timestamp_ltz(years: Column, months: Column, days: Column, hours: Column, mins: Column, secs: Column, timezone: Column): Column
      
      
      
Create the current timestamp with local time zone from years, months, days, hours, mins, secs and timezone fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_timestamp_ntz(years: Column, months: Column, days: Column, hours: Column, mins: Column, secs: Column): Column
      
      
      
Create local date-time from years, months, days, hours, mins, secs fields. If the configuration spark.sql.ansi.enabled is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_ym_interval(): Column
      
      
      
Make year-month interval.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_ym_interval(years: Column): Column
      
      
      
Make year-month interval from years.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        make_ym_interval(years: Column, months: Column): Column
      
      
      
Make year-month interval from years, months.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map(cols: Column*): Column
      
      
      
Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.
- Annotations
 - @varargs()
 - Since
 2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_concat(cols: Column*): Column
      
      
      
Returns the union of all the given maps.
- Annotations
 - @varargs()
 - Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_contains_key(column: Column, key: Any): Column
      
      
      
Returns true if the map contains the key.
- Since
 3.3.0
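The key-value pairing expected by map, together with map_contains_key, can be sketched as follows. This assumes Spark 3.3.0 or later and a local SparkSession; the object name MapExample and the columns k1, v1, k2 and v2 are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, map, map_contains_key}

// Hypothetical example: build a map column from alternating key and value columns.
object MapExample {
  // Returns (the map, whether it contains "a", whether it contains "z").
  def run(spark: SparkSession): (Map[String, Int], Boolean, Boolean) = {
    import spark.implicits._
    Seq(("a", 1, "b", 2)).toDF("k1", "v1", "k2", "v2")
      // Arguments alternate: key1, value1, key2, value2, ...
      .select(map(col("k1"), col("v1"), col("k2"), col("v2")).as("m"))
      .select(col("m"), map_contains_key(col("m"), "a"), map_contains_key(col("m"), "z"))
      .as[(Map[String, Int], Boolean, Boolean)]
      .head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("map-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```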
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_entries(e: Column): Column
      
      
      
Returns an unordered array of all entries in the given map.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_filter(expr: Column, f: (Column, Column) ⇒ Column): Column
      
      
      
Returns a map whose key-value pairs satisfy a predicate.
df.select(map_filter(col("m"), (k, v) => k * 10 === v))
- expr
 the input map column
- f
 (key, value) => predicate, the Boolean predicate to filter the input map column
- Since
 3.0.0
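For a runnable variant of the map_filter snippet above, here is a minimal sketch that keeps only entries whose value is greater than 1. It assumes a local SparkSession; the object name MapFilterExample and the sample map are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, map_filter}

// Hypothetical example: filter map entries by value.
object MapFilterExample {
  def run(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._
    Seq(Map("a" -> 1, "b" -> 2, "c" -> 3)).toDF("m")
      // The lambda receives the key and value as Columns and returns a Boolean Column.
      .select(map_filter(col("m"), (k, v) => v > 1))
      .as[Map[String, Int]]
      .head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("map-filter-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```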
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_from_arrays(keys: Column, values: Column): Column
      
      
      
Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. All elements in the array for key should not be null.
- Since
 2.4
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_from_entries(e: Column): Column
      
      
      
Returns a map created from the given array of entries.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_keys(e: Column): Column
      
      
      
Returns an unordered array containing the keys of the map.
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_values(e: Column): Column
      
      
      
Returns an unordered array containing the values of the map.
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        map_zip_with(left: Column, right: Column, f: (Column, Column, Column) ⇒ Column): Column
      
      
      
Merge two given maps, key-wise into a single map using a function.
df.select(map_zip_with(df("m1"), df("m2"), (k, v1, v2) => k === v1 + v2))
- left
 the left input map column
- right
 the right input map column
- f
 (key, value1, value2) => new_value, the lambda function to merge the map values
- Since
 3.0.0
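A key that is present in only one of the two maps reaches the merge function with a null value on the other side. The sketch below sums values per key and treats a missing side as 0 via coalesce; it assumes a local SparkSession, and the object name MapZipWithExample and the sample maps are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, map_zip_with}

// Hypothetical example: key-wise sum of two maps.
object MapZipWithExample {
  def run(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._
    Seq((Map("a" -> 1, "b" -> 2), Map("b" -> 10, "c" -> 20))).toDF("m1", "m2")
      .select(map_zip_with(col("m1"), col("m2"),
        // v1 or v2 is null when the key exists in only one map.
        (k, v1, v2) => coalesce(v1, lit(0)) + coalesce(v2, lit(0))))
      .as[Map[String, Int]]
      .head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("map-zip-with-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```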
 - 
      
      
      
        
      
    
      
        
        def
      
      
        mask(input: Column, upperChar: Column, lowerChar: Column, digitChar: Column, otherChar: Column): Column
      
      
      
Masks the given string value. This can be useful for creating copies of tables with sensitive information removed.
- input
 string value to mask. Supported types: STRING, VARCHAR, CHAR
- upperChar
 character to replace upper-case characters with. Specify NULL to retain original character.
- lowerChar
 character to replace lower-case characters with. Specify NULL to retain original character.
- digitChar
 character to replace digit characters with. Specify NULL to retain original character.
- otherChar
 character to replace all other characters with. Specify NULL to retain original character.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        mask(input: Column, upperChar: Column, lowerChar: Column, digitChar: Column): Column
      
      
      
Masks the given string value. The function replaces upper-case, lower-case characters and numbers with the characters specified respectively. This can be useful for creating copies of tables with sensitive information removed.
- input
 string value to mask. Supported types: STRING, VARCHAR, CHAR
- upperChar
 character to replace upper-case characters with. Specify NULL to retain original character.
- lowerChar
 character to replace lower-case characters with. Specify NULL to retain original character.
- digitChar
 character to replace digit characters with. Specify NULL to retain original character.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        mask(input: Column, upperChar: Column, lowerChar: Column): Column
      
      
      
Masks the given string value. The function replaces upper-case and lower-case characters with the characters specified respectively, and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed.
- input
 string value to mask. Supported types: STRING, VARCHAR, CHAR
- upperChar
 character to replace upper-case characters with. Specify NULL to retain original character.
- lowerChar
 character to replace lower-case characters with. Specify NULL to retain original character.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        mask(input: Column, upperChar: Column): Column
      
      
      
Masks the given string value. The function replaces upper-case characters with the specified character, lower-case characters with 'x', and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed.
- input
 string value to mask. Supported types: STRING, VARCHAR, CHAR
- upperChar
 character to replace upper-case characters with. Specify NULL to retain original character.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        mask(input: Column): Column
      
      
      
Masks the given string value. The function replaces upper-case characters with 'X', lower-case characters with 'x', and digits with 'n'. This can be useful for creating copies of tables with sensitive information removed.
- input
 string value to mask. Supported types: STRING, VARCHAR, CHAR
- Since
 3.5.0
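The replacement rules shared by the mask overloads can be modelled in plain Scala. This is only a sketch of the documented behaviour, not Spark's implementation: MaskSketch and its Option-based parameters are hypothetical names, with None standing in for a NULL replacement character (retain the original).

```scala
object MaskSketch {
  // Hypothetical stand-in for the documented rules; None models NULL (keep the character).
  def mask(
      input: String,
      upperChar: Option[Char] = Some('X'),
      lowerChar: Option[Char] = Some('x'),
      digitChar: Option[Char] = Some('n')): String =
    input.map { c =>
      if (c.isUpper) upperChar.getOrElse(c)      // upper-case letters
      else if (c.isLower) lowerChar.getOrElse(c) // lower-case letters
      else if (c.isDigit) digitChar.getOrElse(c) // digits
      else c                                     // everything else is left untouched
    }
}
```

With the defaults, MaskSketch.mask("AbCD123-@$#") yields "XxXXnnn-@$#"; passing digitChar = None keeps the digits as they are.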
 - 
      
      
      
        
      
    
      
        
        def
      
      
        max(columnName: String): Column
      
      
      
Aggregate function: returns the maximum value of the column in a group.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        max(e: Column): Column
      
      
      
Aggregate function: returns the maximum value of the expression in a group.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        max_by(e: Column, ord: Column): Column
      
      
      
Aggregate function: returns the value associated with the maximum value of ord.
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        md5(e: Column): Column
      
      
      
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
- Since
 1.5.0
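To illustrate what a 32 character hex string for an MD5 digest looks like, here is a small sketch using the JDK's MessageDigest directly. Md5Sketch is a hypothetical helper for illustration and does not go through Spark.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object Md5Sketch {
  // Digest the bytes with MD5 (16 bytes) and render each byte as two lower-case hex digits.
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5")
      .digest(bytes)
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Convenience overload that encodes the string as UTF-8 first.
  def md5Hex(text: String): String = md5Hex(text.getBytes(StandardCharsets.UTF_8))
}
```

Md5Sketch.md5Hex("abc") gives "900150983cd24fb0d6963f7d28e17f72", the well-known MD5 test vector.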
 - 
      
      
      
        
      
    
      
        
        def
      
      
        mean(columnName: String): Column
      
      
      
Aggregate function: returns the average of the values in a group. Alias for avg.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        mean(e: Column): Column
      
      
      
Aggregate function: returns the average of the values in a group. Alias for avg.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        median(e: Column): Column
      
      
      
Aggregate function: returns the median of the values in a group.
- Since
 3.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        min(columnName: String): Column
      
      
      
Aggregate function: returns the minimum value of the column in a group.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        min(e: Column): Column
      
      
      
Aggregate function: returns the minimum value of the expression in a group.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        min_by(e: Column, ord: Column): Column
      
      
      
Aggregate function: returns the value associated with the minimum value of ord.
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        minute(e: Column): Column
      
      
      
Extracts the minutes as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        mode(e: Column): Column
      
      
      
Aggregate function: returns the most frequent value in a group.
- Since
 3.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        monotonically_increasing_id(): Column
      
      
      
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
- Since
 1.6.0
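The bit layout described above can be reproduced with ordinary Long arithmetic. The sketch below is a model of that layout only; MonotonicIdSketch, composeId and idsFor are hypothetical names, and partition sizes are passed in by hand rather than taken from a real DataFrame.

```scala
object MonotonicIdSketch {
  // Partition ID in the upper bits, record number within the partition in the lower 33 bits.
  def composeId(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << 33) + recordNumber

  // IDs for a data set described only by the number of records in each partition.
  def idsFor(partitionSizes: Seq[Int]): Seq[Long] =
    partitionSizes.zipWithIndex.flatMap { case (size, partitionId) =>
      (0 until size).map(i => composeId(partitionId, i.toLong))
    }
}
```

MonotonicIdSketch.idsFor(Seq(3, 3)) yields 0, 1, 2, 8589934592, 8589934593, 8589934594, matching the two-partition example.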
 - 
      
      
      
        
      
    
      
        
        def
      
      
        month(e: Column): Column
      
      
      
Extracts the month as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        months(e: Column): Column
      
      
      
A transform for timestamps and dates to partition data into months.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        months_between(end: Column, start: Column, roundOff: Boolean): Column
      
      
      
Returns number of months between dates end and start. If roundOff is set to true, the result is rounded off to 8 digits; it is not rounded otherwise.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        months_between(end: Column, start: Column): Column
      
      
      
Returns number of months between dates start and end.
A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.
For example:
months_between("2017-11-14", "2017-07-14") // returns 4.0
months_between("2017-01-01", "2017-01-10") // returns -0.29032258
months_between("2017-06-01", "2017-06-16 12:00:00") // returns -0.5
- end
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- start
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- returns
 A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start
- Since
 1.5.0
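The rule above (a whole number for matching days of month or two month ends, otherwise a fraction based on 31-day months) can be worked through with java.time. This is a sketch of the documented arithmetic under the assumption that the time of day contributes to the fractional part; MonthsBetweenSketch is a hypothetical name and the code does not call Spark.

```scala
import java.time.LocalDateTime

object MonthsBetweenSketch {
  private val SecondsPerDay = 86400L

  def monthsBetween(end: LocalDateTime, start: LocalDateTime, roundOff: Boolean = true): Double = {
    val monthDiff = (end.getYear - start.getYear) * 12 + (end.getMonthValue - start.getMonthValue)
    val endIsLastDay = end.getDayOfMonth == end.toLocalDate.lengthOfMonth
    val startIsLastDay = start.getDayOfMonth == start.toLocalDate.lengthOfMonth
    if (end.getDayOfMonth == start.getDayOfMonth || (endIsLastDay && startIsLastDay)) {
      // Same day of month, or both on the last day of their months: whole number of months.
      monthDiff.toDouble
    } else {
      // Otherwise add the day and time difference as a fraction of a 31-day month.
      val secondsDiff = (end.getDayOfMonth - start.getDayOfMonth) * SecondsPerDay +
        (end.toLocalTime.toSecondOfDay - start.toLocalTime.toSecondOfDay)
      val raw = monthDiff + secondsDiff.toDouble / (31 * SecondsPerDay)
      if (roundOff) BigDecimal(raw).setScale(8, BigDecimal.RoundingMode.HALF_UP).toDouble else raw
    }
  }
}
```

For end 2017-06-01 00:00 and start 2017-06-16 12:00 the difference is -15.5 days, and -15.5 / 31 gives -0.5.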
 - 
      
      
      
        
      
    
      
        
        def
      
      
        named_struct(cols: Column*): Column
      
      
      
Creates a struct with the given field names and values.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        nanvl(col1: Column, col2: Column): Column
      
      
      
Returns col1 if it is not NaN, or col2 if col1 is NaN.
Both inputs should be floating point columns (DoubleType or FloatType).
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        ne(arg0: AnyRef): Boolean
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        negate(e: Column): Column
      
      
      
Unary minus, i.e. negate the expression.
// Select the amount column and negate all values.
// Scala:
df.select( -df("amount") )
// Java:
df.select( negate(df.col("amount")) );
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        negative(e: Column): Column
      
      
      
Returns the negated value.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        next_day(date: Column, dayOfWeek: Column): Column
      
      
      
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
- date
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- dayOfWeek
 A column of the day of week. Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
- returns
 A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        next_day(date: Column, dayOfWeek: String): Column
      
      
      
Returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
- date
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- dayOfWeek
 Case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
- returns
 A date, or null if date was a string that could not be cast to a date or if dayOfWeek was an invalid value
- Since
 1.5.0
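The "first date strictly later than the input that falls on the given weekday" rule maps onto java.time's TemporalAdjusters.next. The sketch below is a model only: NextDaySketch is a hypothetical name, None stands in for a null result, and the assumption that both three-letter abbreviations and full day names are accepted comes from the example above rather than from Spark's parser.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.TemporalAdjusters
import java.util.Locale
import scala.util.Try

object NextDaySketch {
  // Case-insensitive lookup that accepts "Sun" as well as "Sunday" (an assumption of this sketch).
  private def parseDay(name: String): Option[DayOfWeek] = {
    val key = name.trim.toUpperCase(Locale.ROOT)
    DayOfWeek.values().find(d => d.name() == key || d.name().take(3) == key)
  }

  // None models the null returned for an unparseable date or an invalid day name.
  def nextDay(date: String, dayOfWeek: String): Option[LocalDate] =
    for {
      d   <- Try(LocalDate.parse(date)).toOption
      dow <- parseDay(dayOfWeek)
    } yield d.`with`(TemporalAdjusters.next(dow)) // strictly after d, never d itself
}
```

NextDaySketch.nextDay("2015-07-27", "Sunday") gives Some(2015-08-02). Because 2015-07-27 is itself a Monday, asking for "Mon" gives 2015-08-03 rather than the input date.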
 - 
      
      
      
        
      
    
      
        
        def
      
      
        not(e: Column): Column
      
      
      
Inversion of boolean expression, i.e. NOT.
// Scala: select rows that are not active (isActive === false)
df.filter( !df("isActive") )
// Java:
df.filter( not(df.col("isActive")) );
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        notify(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        notifyAll(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        now(): Column
      
      
      
Returns the current timestamp at the start of query evaluation.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        nth_value(e: Column, offset: Int): Column
      
      
      
Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of window frame is less than offset rows.
This is equivalent to the nth_value function in SQL.
- Since
 3.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        nth_value(e: Column, offset: Int, ignoreNulls: Boolean): Column
      
      
      
Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of window frame is less than offset rows.
It will return the offset-th non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
This is equivalent to the nth_value function in SQL.
- Since
 3.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ntile(n: Int): Column
      
      
      
Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
- Since
 1.4.0
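A small model makes the bucket assignment concrete. It assumes the usual SQL NTILE rule that, when the rows do not divide evenly, the earlier buckets each receive one extra row; NtileSketch is a hypothetical name and only the row count of the ordered partition is needed.

```scala
object NtileSketch {
  // Bucket id (1 to n) for each row of an ordered partition containing rowCount rows.
  def ntile(rowCount: Int, n: Int): Seq[Int] = {
    require(n > 0, "n must be positive")
    val base = rowCount / n  // minimum number of rows per bucket
    val extra = rowCount % n // this many leading buckets get one more row
    (1 to n).flatMap { bucket =>
      val size = if (bucket <= extra) base + 1 else base
      Seq.fill(size)(bucket)
    }
  }
}
```

For 8 rows and n = 4 each quarter gets two rows: 1, 1, 2, 2, 3, 3, 4, 4. For 10 rows the first two buckets hold three rows each and the last two hold two.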
 - 
      
      
      
        
      
    
      
        
        def
      
      
        nullif(col1: Column, col2: Column): Column
      
      
      
Returns null if col1 equals col2, or col1 otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        nvl(col1: Column, col2: Column): Column
      
      
      
Returns col2 if col1 is null, or col1 otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        nvl2(col1: Column, col2: Column, col3: Column): Column
      
      
      
Returns col2 if col1 is not null, or col3 otherwise.
- Since
 3.5.0
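The three null-handling helpers nullif, nvl and nvl2 are easy to confuse, so here is a side-by-side model using Option, with None standing in for SQL null. NullHandlingSketch is a hypothetical name; this illustrates the documented results and is not how Spark evaluates the expressions.

```scala
object NullHandlingSketch {
  // null when both values are present and equal, otherwise col1.
  def nullif[A](col1: Option[A], col2: Option[A]): Option[A] =
    if (col1.isDefined && col1 == col2) None else col1

  // col2 when col1 is null, otherwise col1.
  def nvl[A](col1: Option[A], col2: Option[A]): Option[A] = col1.orElse(col2)

  // col2 when col1 is not null, otherwise col3.
  def nvl2[A, B](col1: Option[A], col2: Option[B], col3: Option[B]): Option[B] =
    if (col1.isDefined) col2 else col3
}
```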
 - 
      
      
      
        
      
    
      
        
        def
      
      
        octet_length(e: Column): Column
      
      
      
Calculates the byte length for the specified string column.
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        overlay(src: Column, replace: Column, pos: Column): Column
      
      
      
Overlay the specified portion of src with replace, starting from byte position pos of src.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        overlay(src: Column, replace: Column, pos: Column, len: Column): Column
      
      
      
Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        parse_url(url: Column, partToExtract: Column): Column
      
      
      
Extracts a part from a URL.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        parse_url(url: Column, partToExtract: Column, key: Column): Column
      
      
      
Extracts a part from a URL.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        percent_rank(): Column
      
      
      
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
This is computed by:
(rank of row in its partition - 1) / (number of rows in the partition - 1)
This is equivalent to the PERCENT_RANK function in SQL.
- Since
 1.6.0
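The formula above can be checked by hand on a small partition. The sketch assumes the values are already sorted in ascending window order and that a single-row partition yields 0.0 (so the division by zero is avoided); PercentRankSketch is a hypothetical name.

```scala
object PercentRankSketch {
  // (rank - 1) / (rows - 1), where rank is 1 plus the number of strictly smaller values.
  def percentRank(sorted: Seq[Int]): Seq[Double] = {
    val n = sorted.size
    sorted.map { v =>
      val rank = sorted.count(_ < v) + 1
      if (n > 1) (rank - 1).toDouble / (n - 1) else 0.0 // single-row case is an assumption
    }
  }
}
```

For the values 10, 20, 20, 30 the ranks are 1, 2, 2, 4, so the relative ranks are 0.0, 1/3, 1/3 and 1.0.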
 - 
      
      
      
        
      
    
      
        
        def
      
      
        percentile(e: Column, percentage: Column, frequency: Column): Column
      
      
      
Aggregate function: returns the exact percentile(s) of numeric column expr at the given percentage(s) with value range in [0.0, 1.0].
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        percentile(e: Column, percentage: Column): Column
      
      
      
Aggregate function: returns the exact percentile(s) of numeric column expr at the given percentage(s) with value range in [0.0, 1.0].
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        percentile_approx(e: Column, percentage: Column, accuracy: Column): Column
      
      
      
Aggregate function: returns the approximate percentile of the numeric column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
If percentage is an array, each value must be between 0.0 and 1.0. If it is a single floating point value, it must be between 0.0 and 1.0.
The accuracy parameter is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy yields better accuracy, 1.0/accuracy is the relative error of the approximation.
- Since
 3.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pi(): Column
      
      
      
Returns Pi.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pmod(dividend: Column, divisor: Column): Column
      
      
      
Returns the positive value of dividend mod divisor.
- Since
 1.5.0
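The difference between a positive modulus and the plain % operator only shows up for negative dividends. The sketch below models that on Int values, assuming a positive divisor; PmodSketch is a hypothetical name and the code does not call Spark.

```scala
object PmodSketch {
  // Shift a negative remainder back into the range [0, divisor) for a positive divisor.
  def pmod(dividend: Int, divisor: Int): Int = {
    val r = dividend % divisor
    if (r < 0) (r + divisor) % divisor else r
  }
}
```

In Scala -10 % 3 is -1, whereas PmodSketch.pmod(-10, 3) is 2.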
 - 
      
      
      
        
      
    
      
        
        def
      
      
        posexplode(e: Column): Column
      
      
      
Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise.
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        posexplode_outer(e: Column): Column
      
      
      
Creates a new row for each element with position in the given array or map column. Uses the default column name pos for position, and col for elements in the array and key and value for elements in the map unless specified otherwise. Unlike posexplode, if the array/map is null or empty then the row (null, null) is produced.
- Since
 2.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        position(substr: Column, str: Column): Column
      
      
      
Returns the position of the first occurrence of substr in str after position 1. The return value is 1-based.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        position(substr: Column, str: Column, start: Column): Column
      
      
      
Returns the position of the first occurrence of substr in str after position start. The given start and return value are 1-based.
- Since
 3.5.0
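Because both start and the result are 1-based, the mapping onto Java's 0-based String.indexOf needs an offset in each direction. The sketch below shows that conversion under two assumptions that the text above does not state: a match beginning exactly at start counts, and 0 is returned when there is no match. PositionSketch is a hypothetical name.

```scala
object PositionSketch {
  // 1-based position of substr in str, searching from the 1-based position start.
  def position(substr: String, str: String, start: Int = 1): Int = {
    require(start >= 1, "this sketch only models 1-based start positions")
    str.indexOf(substr, start - 1) + 1 // indexOf is 0-based and returns -1 when absent
  }
}
```

PositionSketch.position("bar", "foobarbar") is 4, and with start = 5 the second occurrence is found at 7.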
 - 
      
      
      
        
      
    
      
        
        def
      
      
        positive(e: Column): Column
      
      
      
Returns the value.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pow(l: Double, rightName: String): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pow(l: Double, r: Column): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pow(leftName: String, r: Double): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pow(l: Column, r: Double): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pow(leftName: String, rightName: String): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pow(leftName: String, r: Column): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pow(l: Column, rightName: String): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        pow(l: Column, r: Column): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        power(l: Column, r: Column): Column
      
      
      
Returns the value of the first argument raised to the power of the second argument.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        printf(format: Column, arguments: Column*): Column
      
      
      
Formats the arguments in printf-style and returns the result as a string column.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        product(e: Column): Column
      
      
      
Aggregate function: returns the product of all numerical elements in a group.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        quarter(e: Column): Column
      
      
      
Extracts the quarter as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        radians(columnName: String): Column
      
      
      
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
- columnName
 angle in degrees
- returns
 angle in radians, as if computed by java.lang.Math.toRadians
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        radians(e: Column): Column
      
      
      
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
- e
 angle in degrees
- returns
 angle in radians, as if computed by java.lang.Math.toRadians
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        raise_error(c: Column): Column
      
      
      
Throws an exception with the provided error message.
- Since
 3.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rand(): Column
      
      
      
Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
- Since
 1.4.0
- Note
 The function is non-deterministic in the general case.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rand(seed: Long): Column
      
      
      
Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
- Since
 1.4.0
- Note
 The function is non-deterministic in the general case.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        randn(): Column
      
      
      
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
- Since
 1.4.0
- Note
 The function is non-deterministic in the general case.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        randn(seed: Long): Column
      
      
      
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
- Since
 1.4.0
- Note
 The function is non-deterministic in the general case.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        random(): Column
      
      
      
Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        random(seed: Column): Column
      
      
      
Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rank(): Column
      
      
      
Window function: returns the rank of rows within a window partition.
The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, leaves gaps after ties, so the person who finished after the three tied for second place would register as coming in fifth.
This is equivalent to the RANK function in SQL.
- Since
 1.4.0
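The competition example can be replayed on a short list of scores. The sketch assumes higher scores rank first and that the list is already in window order; RankSketch, rank and denseRank are hypothetical names that model the two numbering schemes without calling Spark.

```scala
object RankSketch {
  // rank: 1 plus the number of rows that strictly beat this one (ties leave gaps).
  def rank(scores: Seq[Int]): Seq[Int] =
    scores.map(s => scores.count(_ > s) + 1)

  // dense_rank: 1 plus the number of distinct better scores (no gaps after ties).
  def denseRank(scores: Seq[Int]): Seq[Int] =
    scores.map(s => scores.filter(_ > s).distinct.size + 1)
}
```

For the scores 100, 90, 90, 90, 80, rank gives 1, 2, 2, 2, 5 while denseRank gives 1, 2, 2, 2, 3.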
 - 
      
      
      
        
      
    
      
        
        def
      
      
        reduce(expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column): Column
      
      
      
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.
df.select(reduce(col("i"), lit(0), (acc, x) => acc + x))
- expr
 the input array column
- initialValue
 the initial value
- merge
 (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        reduce(expr: Column, initialValue: Column, merge: (Column, Column) ⇒ Column, finish: (Column) ⇒ Column): Column
      
      
      
Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
df.select(reduce(col("i"), lit(0), (acc, x) => acc + x, _ * 10))
- expr
 the input array column
- initialValue
 the initial value
- merge
 (combined_value, input_value) => combined_value, the merge function to merge an input value to the combined_value
- finish
 combined_value => final_value, the lambda function to convert the combined value of all inputs to final result
- Since
 3.5.0
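The merge and finish roles correspond to a left fold followed by a final transformation. The sketch below models both reduce variants on an ordinary Scala Seq instead of an array column; ReduceSketch and reduceWithFinish are hypothetical names used only for illustration.

```scala
object ReduceSketch {
  // Fold the elements into the state, starting from initialValue.
  def reduce[A, B](xs: Seq[A], initialValue: B)(merge: (B, A) => B): B =
    xs.foldLeft(initialValue)(merge)

  // Same fold, then convert the final state with finish.
  def reduceWithFinish[A, B, C](xs: Seq[A], initialValue: B)(merge: (B, A) => B, finish: B => C): C =
    finish(reduce(xs, initialValue)(merge))
}
```

With the elements 1, 2, 3, an initial value of 0 and addition as the merge function the state ends at 6, and a finish function of _ * 10 turns that into 60.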
 - 
      
      
      
        
      
    
      
        
        def
      
      
        reflect(cols: Column*): Column
      
      
      
Calls a method with reflection.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp(str: Column, regexp: Column): Column
      
      
      
Returns true if str matches regexp, or false otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_count(str: Column, regexp: Column): Column
      
      
      
Returns a count of the number of times that the regular expression pattern regexp is matched in the string str.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_extract(e: Column, exp: String, groupIdx: Int): Column
      
      
      
Extract a specific group matched by a Java regex, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned. If the specified group index exceeds the group count of regex, an IllegalArgumentException will be thrown.
- Since
 1.5.0
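A short sketch of the empty-string behaviour described above, assuming a local SparkSession; the column name s, the pattern and the sample strings are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("regexp-extract-sketch").getOrCreate()
import spark.implicits._

// Hypothetical input: the first row matches the pattern, the second does not.
val df = Seq("100-200", "foo").toDF("s")

// Group 2 is the digits after the dash; a non-matching row yields "" rather than null.
val secondGroup = df
  .select(regexp_extract(col("s"), "(\\d+)-(\\d+)", 2))
  .as[String]
  .collect()
```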
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_extract_all(str: Column, regexp: Column, idx: Column): Column
      
      
      
Extracts all strings in str that match the regexp expression and correspond to the regex group index idx.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_extract_all(str: Column, regexp: Column): Column
      
      
      
Extracts all strings in str that match the regexp expression and correspond to the first regex group index.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_instr(str: Column, regexp: Column, idx: Column): Column
      
      
      
Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_instr(str: Column, regexp: Column): Column
      
      
      
Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_like(str: Column, regexp: Column): Column
      
      
      
Returns true if str matches regexp, or false otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_replace(e: Column, pattern: Column, replacement: Column): Column
      
      
      
Replace all substrings of the specified string value that match pattern with replacement.
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_replace(e: Column, pattern: String, replacement: String): Column
      
      
      
Replace all substrings of the specified string value that match pattern with replacement.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regexp_substr(str: Column, regexp: Column): Column
      
      
      
Returns the substring that matches the regular expression regexp within the string str. If the regular expression is not found, the result is null.
- Since
 3.5.0
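The Column-based regexp functions added in 3.5.0 can be compared side by side. The sketch below assumes Spark 3.5 or later and a local session; the input string and pattern are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("regexp-sketch").getOrCreate()
import spark.implicits._

// Hypothetical input: digits appear at 1-based positions 2 and 4-5.
val df = Seq("a1b22c").toDF("s")
val digits = lit("\\d+")

val row = df.select(
  regexp_count(col("s"), digits),  // number of matches
  regexp_instr(col("s"), digits),  // 1-based position of the first match
  regexp_substr(col("s"), digits), // text of the first match
  regexp_like(col("s"), digits)    // whether any match exists
).head()

val matchCount = row.getInt(0)
val firstPos = row.getInt(1)
val firstMatch = row.getString(2)
val hasMatch = row.getBoolean(3)
```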
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_avgx(y: Column, x: Column): Column
      
      
      
Aggregate function: returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_avgy(y: Column, x: Column): Column
      
      
      
Aggregate function: returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_count(y: Column, x: Column): Column
      
      
      
Aggregate function: returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_intercept(y: Column, x: Column): Column
      
      
      
Aggregate function: returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_r2(y: Column, x: Column): Column
      
      
      
Aggregate function: returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_slope(y: Column, x: Column): Column
      
      
      
Aggregate function: returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
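As a worked illustration of the regression aggregates, the sketch below fits points that lie exactly on y = 2x + 1. It assumes Spark 3.5 or later and a local session; the column names and data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("regr-sketch").getOrCreate()
import spark.implicits._

// Hypothetical (x, y) pairs on the line y = 2x + 1.
val df = Seq((1.0, 3.0), (2.0, 5.0), (3.0, 7.0)).toDF("x", "y")

// Note the argument order: the dependent variable y comes first.
val row = df.agg(
  regr_count($"y", $"x"),
  regr_slope($"y", $"x"),
  regr_intercept($"y", $"x")
).head()

val pairs = row.getLong(0)
val slope = row.getDouble(1)
val intercept = row.getDouble(2)
```

The results are floating-point values, so compare them with a tolerance rather than for exact equality.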
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_sxx(y: Column, x: Column): Column
      
      
      
Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_sxy(y: Column, x: Column): Column
      
      
      
Aggregate function: returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        regr_syy(y: Column, x: Column): Column
      
      
      
Aggregate function: returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        repeat(str: Column, n: Int): Column
      
      
      
Repeats a string column n times, and returns it as a new string column.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        replace(src: Column, search: Column): Column
      
      
      
Removes all occurrences of search from src, which is equivalent to replacing them with an empty string.
- src
 A column of string to be replaced
- search
 A column of string. If search is not found in src, src is returned unchanged.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        replace(src: Column, search: Column, replace: Column): Column
      
      
      
Replaces all occurrences of search with replace.
- src
 A column of string to be replaced
- search
 A column of string. If search is not found in src, src is returned unchanged.
- replace
 A column of string. If replace is not specified or is an empty string, nothing replaces the string that is removed from src.
- Since
 3.5.0
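The two replace variants differ only in what is substituted for each match. A minimal sketch, assuming Spark 3.5 or later and a local session; the sample string is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("replace-sketch").getOrCreate()
import spark.implicits._

// Hypothetical input string.
val df = Seq("ABCabc").toDF("s")

val row = df.select(
  replace(col("s"), lit("abc"), lit("DEF")), // substitute "DEF" for each "abc"
  replace(col("s"), lit("abc"))              // no replacement given: matches are removed
).head()

val substituted = row.getString(0)
val removed = row.getString(1)
```

Unlike regexp_replace, the search argument here is a literal string, not a regular expression.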
 - 
      
      
      
        
      
    
      
        
        def
      
      
        reverse(e: Column): Column
      
      
      
Returns a reversed string or an array with reverse order of elements.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        right(str: Column, len: Column): Column
      
      
      
Returns the rightmost len (len can be string type) characters from the string str. If len is less than or equal to 0, the result is an empty string.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rint(columnName: String): Column
      
      
      
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rint(e: Column): Column
      
      
      
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rlike(str: Column, regexp: Column): Column
      
      
      
Returns true if str matches regexp, or false otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        round(e: Column, scale: Int): Column
      
      
      
Rounds the value of e to scale decimal places using HALF_UP rounding mode if scale is greater than or equal to 0, or at the integral part when scale is less than 0.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        round(e: Column): Column
      
      
      
Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.
- Since
 1.5.0
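HALF_UP rounds ties away from zero, and a negative scale rounds to the left of the decimal point. The sketch below illustrates both on a double column, assuming a local session; the column name v and the values are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("round-sketch").getOrCreate()
import spark.implicits._

// Hypothetical values chosen so that each one sits exactly on a rounding tie.
val df = Seq(2.5, -2.5, 1250.0).toDF("v")

val rounded = df
  .select(col("v"), round(col("v")), round(col("v"), -2))
  .collect()
  .map(r => (r.getDouble(0), r.getDouble(1), r.getDouble(2)))

val halfUpPositive = rounded(0)._2 // 2.5 rounded to 0 decimal places
val halfUpNegative = rounded(1)._2 // -2.5 rounded to 0 decimal places
val toHundreds = rounded(2)._3     // 1250.0 rounded at the hundreds digit
```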
 - 
      
      
      
        
      
    
      
        
        def
      
      
        row_number(): Column
      
      
      
Window function: returns a sequential number starting at 1 within a window partition.
- Since
 1.6.0
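A window function needs a window specification before it can be evaluated. The following sketch numbers rows per key, assuming a local session; the columns k and v and the data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("row-number-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: two rows for key "a", one for key "b".
val df = Seq(("a", 30), ("a", 10), ("b", 20)).toDF("k", "v")

// Numbering restarts at 1 in each partition and follows the ordering of v.
val byKey = Window.partitionBy("k").orderBy("v")

val numbered = df
  .withColumn("rn", row_number().over(byKey))
  .orderBy("k", "v")
  .collect()
  .map(r => (r.getString(0), r.getInt(1), r.getInt(2)))
```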
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rpad(str: Column, len: Int, pad: Array[Byte]): Column
      
      
      
Right-pad the binary column with pad to a byte length of len. If the binary column is longer than len, the return value is shortened to len bytes.
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rpad(str: Column, len: Int, pad: String): Column
      
      
      
Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
- Since
 1.5.0
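Padding and truncation both follow from the target length. A small sketch of the string variant, assuming a local session; the inputs are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("rpad-sketch").getOrCreate()
import spark.implicits._

// Hypothetical inputs: one shorter and one longer than the target length of 5.
val df = Seq("ab", "abcdefg").toDF("s")

// Short values are padded on the right; long values are cut to 5 characters.
val padded = df.select(rpad(col("s"), 5, "*")).as[String].collect()
```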
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rtrim(e: Column, trimString: String): Column
      
      
      
Trim the specified character string from right end for the specified string column.
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        rtrim(e: Column): Column
      
      
      
Trim the spaces from right end for the specified string value.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        schema_of_csv(csv: Column, options: Map[String, String]): Column
      
      
      
Parses a CSV string and infers its schema in DDL format using options.
- csv
 a foldable string column containing a CSV string.
- options
 options to control how the CSV is parsed. Accepts the same options as the CSV data source. See Data Source Option in the version you use.
- returns
 a column with string literal containing schema in DDL format.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        schema_of_csv(csv: Column): Column
      
      
      
Parses a CSV string and infers its schema in DDL format.
- csv
 a foldable string column containing a CSV string.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        schema_of_csv(csv: String): Column
      
      
      
Parses a CSV string and infers its schema in DDL format.
- csv
 a CSV string.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        schema_of_json(json: Column, options: Map[String, String]): Column
      
      
      
Parses a JSON string and infers its schema in DDL format using options.
- json
 a foldable string column containing JSON data.
- options
 options to control how the JSON is parsed. Accepts the same options as the JSON data source. See Data Source Option in the version you use.
- returns
 a column with string literal containing schema in DDL format.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        schema_of_json(json: Column): Column
      
      
      
Parses a JSON string and infers its schema in DDL format.
- json
 a foldable string column containing a JSON string.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        schema_of_json(json: String): Column
      
      
      
Parses a JSON string and infers its schema in DDL format.
- json
 a JSON string.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sec(e: Column): Column
      
      
      
- e
 angle in radians
- returns
 secant of the angle
- Since
 3.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        second(e: Column): Column
      
      
      
Extracts the seconds as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a timestamp
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sentences(string: Column): Column
      
      
      
Splits a string into arrays of sentences, where each sentence is an array of words. The default locale is used.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sentences(string: Column, language: Column, country: Column): Column
      
      
      
Splits a string into arrays of sentences, where each sentence is an array of words.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sequence(start: Column, stop: Column): Column
      
      
      
Generate a sequence of integers from start to stop, incrementing by 1 if start is less than or equal to stop, otherwise -1.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sequence(start: Column, stop: Column, step: Column): Column
      
      
      
Generate a sequence of integers from start to stop, incrementing by step.
- Since
 2.4.0
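The default step direction and an explicit step can be seen in one query. A minimal sketch, assuming a local session; the bounds are arbitrary illustrative literals.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("sequence-sketch").getOrCreate()

// spark.range(1) is only a one-row carrier for the literal expressions.
val row = spark.range(1).select(
  sequence(lit(1), lit(5)),          // start <= stop: step defaults to 1
  sequence(lit(5), lit(1)),          // start > stop: step defaults to -1
  sequence(lit(0), lit(10), lit(5))  // explicit step
).head()

val ascending = row.getSeq[Int](0)
val descending = row.getSeq[Int](1)
val stepped = row.getSeq[Int](2)
```

Both bounds are inclusive when the step lands on them exactly.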
 - 
      
      
      
        
      
    
      
        
        def
      
      
        session_window(timeColumn: Column, gapDuration: Column): Column
      
      
      
Generates session window given a timestamp specifying column.
Session window is one of dynamic windows, which means the length of window is varying according to the given inputs. For static gap duration, the length of session window is defined as "the timestamp of latest input of the session + gap duration", so when the new inputs are bound to the current session window, the end time of session window can be expanded according to the new inputs.
Besides a static gap duration value, users can also provide an expression to specify gap duration dynamically based on the input row. With dynamic gap duration, the closing of a session window does not depend on the latest input anymore. A session window's range is the union of all events' ranges which are determined by event start time and evaluated gap duration during the query execution. Note that the rows with negative or zero gap duration will be filtered out from the aggregation.
Windows can support microsecond precision. A gapDuration on the order of months is not supported.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- timeColumn
 The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
- gapDuration
 A column specifying the timeout of the session. It could be a static value, e.g. 10 minutes, 1 second, or an expression/UDF that specifies gap duration dynamically based on the input row.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        session_window(timeColumn: Column, gapDuration: String): Column
      
      
      
Generates session window given a timestamp specifying column.
Session window is one of dynamic windows, which means the length of window is varying according to the given inputs. The length of session window is defined as "the timestamp of latest input of the session + gap duration", so when the new inputs are bound to the current session window, the end time of session window can be expanded according to the new inputs.
Windows can support microsecond precision. A gapDuration on the order of months is not supported.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- timeColumn
 The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
- gapDuration
 A string specifying the timeout of the session, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers.
- Since
 3.2.0
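With a static gap, events closer together than the gap fall into the same session. The batch sketch below assumes a local session; the column name ts and the three timestamps are hypothetical.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("session-window-sketch").getOrCreate()
import spark.implicits._

// Hypothetical events: two arrive within 10 minutes of each other, one arrives much later.
val events = Seq(
  "2024-01-01 10:00:00",
  "2024-01-01 10:05:00",
  "2024-01-01 10:30:00"
).map(s => Timestamp.valueOf(s)).toDF("ts")

// Each event opens a 10-minute window; overlapping windows are merged into one session.
val sessions = events.groupBy(session_window($"ts", "10 minutes")).count()

// Event counts per session, sorted on the driver for a stable comparison.
val countsPerSession = sessions.select("count").as[Long].collect().sorted
```

Here the first two events share a session that ends 10 minutes after the later one, and the third event forms its own session.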
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sha(col: Column): Column
      
      
      
Returns the sha1 hash value of col as a hex string.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sha1(e: Column): Column
      
      
      
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sha2(e: Column, numBits: Int): Column
      
      
      
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
- e
 column to compute SHA-2 on.
- numBits
 one of 224, 256, 384, or 512.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        shiftleft(e: Column, numBits: Int): Column
      
      
      
Shift the given value numBits left. If the given value is a long value, this function will return a long value else it will return an integer value.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        shiftright(e: Column, numBits: Int): Column
      
      
      
(Signed) shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        shiftrightunsigned(e: Column, numBits: Int): Column
      
      
      
Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value else it will return an integer value.
- Since
 3.2.0
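The difference between the signed and unsigned right shifts only shows up for negative inputs. A small sketch on integer literals, assuming a local session; the values are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("shift-sketch").getOrCreate()

// spark.range(1) is only a one-row carrier for the literal expressions.
val row = spark.range(1).select(
  shiftleft(lit(1), 3),           // 1 << 3
  shiftright(lit(-8), 1),         // sign bit is preserved
  shiftrightunsigned(lit(-8), 1)  // zero is shifted in from the left
).head()

val left = row.getInt(0)
val signedRight = row.getInt(1)
val unsignedRight = row.getInt(2)
```

Because the inputs are integer literals, the results are integers; a long input would produce long results, as noted in the descriptions above.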
 - 
      
      
      
        
      
    
      
        
        def
      
      
        shuffle(e: Column): Column
      
      
      
Returns a random permutation of the given array.
- Since
 2.4.0
- Note
 The function is non-deterministic.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sign(e: Column): Column
      
      
      
Computes the signum of the given value.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        signum(columnName: String): Column
      
      
      
Computes the signum of the given column.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        signum(e: Column): Column
      
      
      
Computes the signum of the given value.
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sin(columnName: String): Column
      
      
      
- columnName
 angle in radians
- returns
 sine of the angle, as if computed by java.lang.Math.sin
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sin(e: Column): Column
      
      
      
- e
 angle in radians
- returns
 sine of the angle, as if computed by java.lang.Math.sin
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sinh(columnName: String): Column
      
      
      
- columnName
 hyperbolic angle
- returns
 hyperbolic sine of the given value, as if computed by java.lang.Math.sinh
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sinh(e: Column): Column
      
      
      
- e
 hyperbolic angle
- returns
 hyperbolic sine of the given value, as if computed by java.lang.Math.sinh
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        size(e: Column): Column
      
      
      
Returns length of array or map.
The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. With the default settings, the function returns -1 for null input.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        skewness(columnName: String): Column
      
      
      
Aggregate function: returns the skewness of the values in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        skewness(e: Column): Column
      
      
      
Aggregate function: returns the skewness of the values in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        slice(x: Column, start: Column, length: Column): Column
      
      
      
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
- x
 the array column to be sliced
- start
 the starting index
- length
 the length of the slice
- Since
 3.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        slice(x: Column, start: Int, length: Int): Column
      
      
      
Returns an array containing all the elements in x from index start (or starting from the end if start is negative) with the specified length.
- x
 the array column to be sliced
- start
 the starting index
- length
 the length of the slice
- Since
 2.4.0
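Positive and negative start values can be contrasted on a single array. The sketch below assumes a local session; the column name a and the data are hypothetical. Array positions in slice are counted from 1.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("slice-sketch").getOrCreate()
import spark.implicits._

// Hypothetical one-row DataFrame with an integer array column named "a".
val df = Seq(Seq(1, 2, 3, 4, 5)).toDF("a")

val row = df.select(
  slice(col("a"), 2, 2),  // two elements starting at the 2nd position
  slice(col("a"), -2, 2)  // two elements starting at the 2nd position from the end
).head()

val fromFront = row.getSeq[Int](0)
val fromBack = row.getSeq[Int](1)
```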
 - 
      
      
      
        
      
    
      
        
        def
      
      
        some(e: Column): Column
      
      
      
Aggregate function: returns true if at least one value of e is true.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sort_array(e: Column, asc: Boolean): Column
      
      
      
Sorts the input array for the given column in ascending or descending order, according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sort_array(e: Column): Column
      
      
      
Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        soundex(e: Column): Column
      
      
      
Returns the soundex code for the specified expression.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        spark_partition_id(): Column
      
      
      
Partition ID.
- Since
 1.6.0
- Note
 This is non-deterministic because it depends on data partitioning and task scheduling.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        split(str: Column, pattern: String, limit: Int): Column
      
      
      
Splits str around matches of the given pattern.
- str
 a string expression to split
- pattern
 a string representing a regular expression. The regex string should be a Java regular expression.
- limit
 an integer expression which controls the number of times the regex is applied.
 - limit greater than 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.
 - limit less than or equal to 0: regex will be applied as many times as possible, and the resulting array can be of any size.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        split(str: Column, pattern: String): Column
      
      
      
Splits str around matches of the given pattern.
- str
 a string expression to split
- pattern
 a string representing a regular expression. The regex string should be a Java regular expression.
- Since
 1.5.0
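The effect of limit is easiest to see next to the unlimited form. A minimal sketch, assuming Spark 3.0 or later and a local session; the input string is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("split-sketch").getOrCreate()
import spark.implicits._

// Hypothetical comma-separated input.
val df = Seq("a,b,c").toDF("s")

val row = df.select(
  split(col("s"), ","),    // no limit: split at every match
  split(col("s"), ",", 2)  // limit 2: at most two entries, the last holds the remainder
).head()

val unlimited = row.getSeq[String](0)
val limited = row.getSeq[String](1)
```

The pattern is a Java regular expression, so characters such as . or | need escaping when they are meant literally.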
 - 
      
      
      
        
      
    
      
        
        def
      
      
        split_part(str: Column, delimiter: Column, partNum: Column): Column
      
      
      
Splits str by delimiter and returns the requested part of the split (1-based). If any input is null, returns null. If partNum is out of range of split parts, returns an empty string. If partNum is 0, throws an error. If partNum is negative, the parts are counted backward from the end of the string. If the delimiter is an empty string, the str is not split.
- Since
 3.5.0
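The partNum rules above can be checked on a single literal. This sketch assumes Spark 3.5 or later and a local session; the input string and delimiter are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session.
val spark = SparkSession.builder().master("local[1]").appName("split-part-sketch").getOrCreate()

val input = lit("a.b.c")
val dot = lit(".")

// spark.range(1) is only a one-row carrier for the literal expressions.
val row = spark.range(1).select(
  split_part(input, dot, lit(2)),   // second part, counted from the start
  split_part(input, dot, lit(-1)),  // first part, counted from the end
  split_part(input, dot, lit(5))    // out of range
).head()

val second = row.getString(0)
val last = row.getString(1)
val outOfRange = row.getString(2)
```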
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sqrt(colName: String): Column
      
      
      
Computes the square root of the specified float value.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sqrt(e: Column): Column
      
      
      
Computes the square root of the specified float value.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        stack(cols: Column*): Column
      
      
      
Separates col1, ..., colk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        startswith(str: Column, prefix: Column): Column
      
      
      
Returns a boolean. The value is true if str starts with prefix. Returns NULL if either input expression is NULL. Otherwise, returns false. Both str and prefix must be of STRING or BINARY type.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        std(e: Column): Column
      
      
      
Aggregate function: alias for stddev_samp.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        stddev(columnName: String): Column
      
      
      
Aggregate function: alias for stddev_samp.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        stddev(e: Column): Column
      
      
      
Aggregate function: alias for stddev_samp.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        stddev_pop(columnName: String): Column
      
      
      
Aggregate function: returns the population standard deviation of the expression in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        stddev_pop(e: Column): Column
      
      
      
Aggregate function: returns the population standard deviation of the expression in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        stddev_samp(columnName: String): Column
      
      
      
Aggregate function: returns the sample standard deviation of the expression in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        stddev_samp(e: Column): Column
      
      
      
Aggregate function: returns the sample standard deviation of the expression in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        str_to_map(text: Column): Column
      
      
      
Creates a map after splitting the text into key/value pairs using delimiters.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        str_to_map(text: Column, pairDelim: Column): Column
      
      
      
Creates a map after splitting the text into key/value pairs using delimiters. The pairDelim is treated as a regular expression.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        str_to_map(text: Column, pairDelim: Column, keyValueDelim: Column): Column
      
      
      
Creates a map after splitting the text into key/value pairs using delimiters. Both pairDelim and keyValueDelim are treated as regular expressions.
- Since
 3.5.0
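As a minimal sketch of the three-argument form, assuming Spark 3.5.0 or later and a local session (the input text and delimiters are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("str-to-map-sketch").getOrCreate()

// Split pairs on "," and each pair into key and value on ":".
val parsed = spark.range(1)
  .select(str_to_map(lit("a:1,b:2,c:3"), lit(","), lit(":")).as("m"))
  .head()
  .getMap[String, String](0)
  .toMap
```

Both keys and values come back as strings, so `parsed` is `Map("a" -> "1", "b" -> "2", "c" -> "3")`. Because the delimiters are regular expressions, characters such as `|` or `.` would need escaping.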
 - 
      
      
      
        
      
    
      
        
        def
      
      
        struct(colName: String, colNames: String*): Column
      
      
      
Creates a new struct column that composes multiple input columns.
- Annotations
 - @varargs()
 - Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        struct(cols: Column*): Column
      
      
      
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name is retained as the StructField's name. Otherwise, the newly generated StructField's name is auto-generated as col with a suffix index + 1, i.e. col1, col2, col3, ...
- Annotations
 - @varargs()
 - Since
 1.4.0
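The field-naming rule can be checked with a small sketch, assuming a local session; the columns `a` and `b` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StructType

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("struct-sketch").getOrCreate()
import spark.implicits._

val df = Seq((1, 2)).toDF("a", "b")

// First field is a named column; second is an unnamed derived expression.
val structType = df
  .select(struct($"a", $"b" + 1).as("s"))
  .schema("s")
  .dataType
  .asInstanceOf[StructType]

val fieldNames = structType.fieldNames.toSeq
```

The named column keeps its name, while the unaliased expression at position 1 (zero-based) is named `col2`, so `fieldNames` is `Seq("a", "col2")`.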
 - 
      
      
      
        
      
    
      
        
        def
      
      
        substr(str: Column, pos: Column): Column
      
      
      
Returns the substring of str that starts at pos, or the slice of byte array that starts at pos.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        substr(str: Column, pos: Column, len: Column): Column
      
      
      
Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        substring(str: Column, pos: Int, len: Int): Column
      
      
      
Returns the substring that starts at pos and is of length len when str is String type, or the slice of the byte array that starts at pos (in bytes) and is of length len when str is Binary type.
- Since
 1.5.0
- Note
 The position is a 1-based index, not a 0-based one.
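A brief sketch of the 1-based indexing, assuming a local session and an illustrative string column `s`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("substring-sketch").getOrCreate()
import spark.implicits._

val row = Seq("Spark SQL").toDF("s")
  .select(
    substring($"s", 1, 5).as("head"), // characters 1..5
    substring($"s", 7, 3).as("tail")) // characters 7..9
  .head()

val headPart = row.getString(0)
val tailPart = row.getString(1)
```

With position 1 referring to the first character, `headPart` is "Spark" and `tailPart` is "SQL".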
 - 
      
      
      
        
      
    
      
        
        def
      
      
        substring_index(str: Column, delim: String, count: Int): Column
      
      
      
Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.
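The positive and negative count cases might look like the following sketch, assuming a local session and a hypothetical `host` column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("substring-index-sketch").getOrCreate()
import spark.implicits._

val row = Seq("www.apache.org").toDF("host")
  .select(
    substring_index($"host", ".", 2).as("left"),   // everything before the 2nd "." from the left
    substring_index($"host", ".", -1).as("right")) // everything after the 1st "." from the right
  .head()

val leftPart = row.getString(0)
val rightPart = row.getString(1)
```

Here `leftPart` is "www.apache" and `rightPart` is "org".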
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sum(columnName: String): Column
      
      
      
Aggregate function: returns the sum of all values in the given column.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sum(e: Column): Column
      
      
      
Aggregate function: returns the sum of all values in the expression.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sum_distinct(e: Column): Column
      
      
      
Aggregate function: returns the sum of distinct values in the expression.
- Since
 3.2.0
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        synchronized[T0](arg0: ⇒ T0): T0
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        tan(columnName: String): Column
      
      
      
- columnName
 angle in radians
- returns
 tangent of the given value, as if computed by java.lang.Math.tan
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        tan(e: Column): Column
      
      
      
- e
 angle in radians
- returns
 tangent of the given value, as if computed by java.lang.Math.tan
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        tanh(columnName: String): Column
      
      
      
- columnName
 hyperbolic angle
- returns
 hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        tanh(e: Column): Column
      
      
      
- e
 hyperbolic angle
- returns
 hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        timestamp_micros(e: Column): Column
      
      
      
Creates timestamp from the number of microseconds since UTC epoch.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        timestamp_millis(e: Column): Column
      
      
      
Creates timestamp from the number of milliseconds since UTC epoch.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        timestamp_seconds(e: Column): Column
      
      
      
Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z) to a timestamp.
- Since
 3.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        toString(): String
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_binary(e: Column): Column
      
      
      
Converts the input e to a binary value based on the default format "hex". The function returns NULL if at least one of the input parameters is NULL.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_binary(e: Column, format: Column): Column
      
      
      
Converts the input e to a binary value based on the supplied format. The format can be a case-insensitive string literal of "hex", "utf-8", "utf8", or "base64". By default, the binary format for conversion is "hex" if format is omitted. The function returns NULL if at least one of the input parameters is NULL.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_char(e: Column, format: Column): Column
      
      
      
Converts e to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
- '$': Specifies the location of the $ currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
- 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_csv(e: Column): Column
      
      
      
Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception in the case of an unsupported type.
- e
 a column containing a struct.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_csv(e: Column, options: Map[String, String]): Column
      
      
      
(Java-specific) Converts a column containing a StructType into a CSV string with the specified schema. Throws an exception in the case of an unsupported type.
- e
 a column containing a struct.
- options
 options to control how the struct column is converted into a CSV string. It accepts the same options as the CSV data source. See Data Source Option in the version you use.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_date(e: Column, fmt: String): Column
      
      
      
Converts the column into a DateType with a specified format.
See Datetime Patterns for valid date and time format patterns.
- e
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- fmt
 A date time pattern detailing the format of e when e is a string
- returns
 A date, or null if e was a string that could not be cast to a date or fmt was an invalid format
- Since
 2.2.0
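A minimal sketch of parsing a string with an explicit pattern, assuming a local session; the column name `raw` and the pattern `dd/MM/yyyy` are chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("to-date-sketch").getOrCreate()
import spark.implicits._

// Parse a day-first string, then cast the date back to its canonical string form.
val parsedDate = Seq("25/12/2023").toDF("raw")
  .select(to_date($"raw", "dd/MM/yyyy").cast("string").as("d"))
  .head()
  .getString(0)
```

The resulting `parsedDate` is "2023-12-25", the canonical yyyy-MM-dd rendering of the parsed date.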
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_date(e: Column): Column
      
      
      
Converts the column into DateType by casting rules to DateType.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_json(e: Column): Column
      
      
      
Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception in the case of an unsupported type.
- e
 a column containing a struct, an array or a map.
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_json(e: Column, options: Map[String, String]): Column
      
      
      
(Java-specific) Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception in the case of an unsupported type.
- e
 a column containing a struct, an array or a map.
- options
 options to control how the struct column is converted into a JSON string. It accepts the same options as the JSON data source. See Data Source Option in the version you use. Additionally, the function supports the pretty option, which enables pretty JSON generation.
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_json(e: Column, options: Map[String, String]): Column
      
      
      
(Scala-specific) Converts a column containing a StructType, ArrayType or a MapType into a JSON string with the specified schema. Throws an exception in the case of an unsupported type.
- e
 a column containing a struct, an array or a map.
- options
 options to control how the struct column is converted into a JSON string. It accepts the same options as the JSON data source. See Data Source Option in the version you use. Additionally, the function supports the pretty option, which enables pretty JSON generation.
- Since
 2.1.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_number(e: Column, format: Column): Column
      
      
      
Converts string 'e' to a number based on the string format 'format'. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input string. If the 0/9 sequence starts with 0 and is before the decimal point, it can only match a digit sequence of the same size. Otherwise, if the sequence starts with 9 or is after the decimal point, it can match a digit sequence that has the same or smaller size.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. 'e' must match the grouping separator relevant for the size of the number.
- '$': Specifies the location of the $ currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not.
- 'PR': Only allowed at the end of the format string; specifies that 'e' indicates a negative number with wrapping angled brackets.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_timestamp(s: Column, fmt: String): Column
      
      
      
Converts time string with the given pattern to timestamp.
See Datetime Patterns for valid date and time format patterns.
- s
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- fmt
 A date time pattern detailing the format of s when s is a string
- returns
 A timestamp, or null if s was a string that could not be cast to a timestamp or fmt was an invalid format
- Since
 2.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_timestamp(s: Column): Column
      
      
      
Converts to a timestamp by casting rules to TimestampType.
- s
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- returns
 A timestamp, or null if the input was a string that could not be cast to a timestamp
- Since
 2.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_timestamp_ltz(timestamp: Column): Column
      
      
      
Parses the timestamp expression with the default format to a timestamp with local time zone. The default format follows casting rules to a timestamp. Returns null with invalid input.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_timestamp_ltz(timestamp: Column, format: Column): Column
      
      
      
Parses the timestamp expression with the format expression to a timestamp with local time zone. Returns null with invalid input.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_timestamp_ntz(timestamp: Column): Column
      
      
      
Parses the timestamp expression with the default format to a timestamp without time zone. The default format follows casting rules to a timestamp. Returns null with invalid input.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_timestamp_ntz(timestamp: Column, format: Column): Column
      
      
      
Parses the timestamp expression with the format expression to a timestamp without time zone. Returns null with invalid input.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_unix_timestamp(e: Column): Column
      
      
      
Returns the UNIX timestamp of the given time.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_unix_timestamp(e: Column, format: Column): Column
      
      
      
Returns the UNIX timestamp of the given time.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_utc_timestamp(ts: Column, tz: Column): Column
      
      
      
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
- Since
 2.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_utc_timestamp(ts: Column, tz: String): Column
      
      
      
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
- ts
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- tz
 A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. 'UTC' and 'Z' are also supported as aliases of '+00:00'. Other short names are not recommended because they can be ambiguous.
- returns
 A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
- Since
 1.5.0
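The documented 'GMT+1' case can be reproduced with a short sketch. It assumes a local session and pins the session time zone to UTC (an assumption made here so the rendered string does not depend on the machine's zone).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; the session time zone is pinned to UTC as an assumption.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("to-utc-timestamp-sketch")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()

// Interpret the wall-clock time as GMT+1 and render the corresponding UTC wall-clock time.
val utcText = spark.range(1)
  .select(to_utc_timestamp(lit("2017-07-14 02:40:00.0"), "GMT+1").cast("string").as("utc"))
  .head()
  .getString(0)
```

The wall-clock time moves back by one hour, so `utcText` is "2017-07-14 01:40:00".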
 - 
      
      
      
        
      
    
      
        
        def
      
      
        to_varchar(e: Column, format: Column): Column
      
      
      
Converts e to a string based on the format. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive:
- '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces.
- '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
- ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator.
- '$': Specifies the location of the $ currency sign. This character may only be specified once.
- 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space.
- 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        transform(column: Column, f: (Column, Column) ⇒ Column): Column
      
      
      
Returns an array of elements after applying a transformation to each element in the input array.
df.select(transform(col("i"), (x, i) => x + i))
- column
 the input array column
- f
 (col, index) => transformed_col, the lambda function to transform the input column given the index. Indices start at 0.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        transform(column: Column, f: (Column) ⇒ Column): Column
      
      
      
Returns an array of elements after applying a transformation to each element in the input array.
df.select(transform(col("i"), x => x + 1))
- column
 the input array column
- f
 col => transformed_col, the lambda function to transform the input column
- Since
 3.0.0
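The two transform overloads can be compared side by side in a small sketch, assuming a local session and a hypothetical array column `xs`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("transform-sketch").getOrCreate()
import spark.implicits._

val row = Seq(Seq(1, 2, 3)).toDF("xs")
  .select(
    transform($"xs", x => x + 1).as("plus_one"),        // element only
    transform($"xs", (x, i) => x + i).as("plus_index")) // element and 0-based index
  .head()

val plusOne = row.getSeq[Int](0)
val plusIndex = row.getSeq[Int](1)
```

For the input [1, 2, 3], `plusOne` is [2, 3, 4], and `plusIndex` is [1, 3, 5] because the indices 0, 1, 2 are added to the elements.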
 - 
      
      
      
        
      
    
      
        
        def
      
      
        transform_keys(expr: Column, f: (Column, Column) ⇒ Column): Column
      
      
      
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.
df.select(transform_keys(col("i"), (k, v) => k + v))
- expr
 the input map column
- f
 (key, value) => new_key, the lambda function to transform the key of input map column
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        transform_values(expr: Column, f: (Column, Column) ⇒ Column): Column
      
      
      
Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.
df.select(transform_values(col("i"), (k, v) => k + v))
- expr
 the input map column
- f
 (key, value) => new_value, the lambda function to transform the value of input map column
- Since
 3.0.0
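As a sketch of how transform_keys and transform_values differ, assuming a local session and a hypothetical map column `m`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("transform-map-sketch").getOrCreate()
import spark.implicits._

val row = Seq(Map("a" -> 1, "b" -> 2)).toDF("m")
  .select(
    transform_keys($"m", (k, v) => upper(k)).as("upper_keys"), // lambda result becomes the new key
    transform_values($"m", (k, v) => v * 10).as("scaled"))     // lambda result becomes the new value
  .head()

val upperKeys = row.getMap[String, Int](0).toMap
val scaled = row.getMap[String, Int](1).toMap
```

Both lambdas receive the key and the value; only the role of the result differs. Here `upperKeys` is `Map("A" -> 1, "B" -> 2)` and `scaled` is `Map("a" -> 10, "b" -> 20)`.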
 - 
      
      
      
        
      
    
      
        
        def
      
      
        translate(src: Column, matchingString: String, replaceString: String): Column
      
      
      
Translates any character in src that matches a character in matchingString into the corresponding character in replaceString. The characters in replaceString correspond by position to the characters in matchingString.
- Since
 1.5.0
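The character-by-character mapping might be sketched as follows, assuming a local session; the input string is illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("translate-sketch").getOrCreate()

// Map a -> 1, b -> 2, c -> 3; uppercase letters do not match and are left as-is.
val translated = spark.range(1)
  .select(translate(lit("AaBbCc"), "abc", "123").as("t"))
  .head()
  .getString(0)
```

Matching is case-sensitive, so `translated` is "A1B2C3".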
 - 
      
      
      
        
      
    
      
        
        def
      
      
        trim(e: Column, trimString: String): Column
      
      
      
Trim the specified character from both ends for the specified string column.
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        trim(e: Column): Column
      
      
      
Trim the spaces from both ends for the specified string column.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        trunc(date: Column, format: String): Column
      
      
      
Returns date truncated to the unit specified by the format.
For example, trunc("2018-11-19 12:01:19", "year") returns 2018-01-01
- date
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- returns
 A date, or null if date was a string that could not be cast to a date or format was an invalid value
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_add(left: Column, right: Column): Column
      
      
      
Returns the sum of left and right; the result is null on overflow. The acceptable input types are the same as for the + operator.
- Since
 3.5.0
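A small sketch of the null-on-overflow behavior, assuming Spark 3.5.0 or later and a local session:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[1]").appName("try-add-sketch").getOrCreate()

val row = spark.range(1)
  .select(
    try_add(lit(1), lit(2)).as("ok"),                   // fits in an integer
    try_add(lit(Int.MaxValue), lit(1)).as("overflow"))  // exceeds the integer range
  .head()

val okSum = row.getInt(0)
val overflowIsNull = row.isNullAt(1)
```

The first sum is 3, while the second overflows the integer type and comes back as null rather than raising an error.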
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_aes_decrypt(input: Column, key: Column): Column
      
      
      
Returns a decrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_aes_decrypt(input: Column, key: Column, mode: Column): Column
      
      
      
Returns a decrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_aes_decrypt(input: Column, key: Column, mode: Column, padding: Column): Column
      
      
      
Returns a decrypted value of input.
- Since
 3.5.0
- See also
 org.apache.spark.sql.functions.try_aes_decrypt(Column, Column, Column, Column, Column)
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_aes_decrypt(input: Column, key: Column, mode: Column, padding: Column, aad: Column): Column
      
      
      
This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed.
- input
 The binary value to decrypt.
- key
 The passphrase to use to decrypt the data.
- mode
 Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
- padding
 Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
- aad
 Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_avg(e: Column): Column
      
      
      
Returns the mean calculated from values of a group and the result is null on overflow.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_divide(dividend: Column, divisor: Column): Column
      
      
      
Returns dividend/divisor. It always performs floating point division. Its result is always null if divisor is 0.
- Since
 3.5.0
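A small sketch of try_divide on integer inputs, assuming a local session and made-up column names. It shows both the floating point result and the null produced by a zero divisor.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("try-divide-sketch").getOrCreate()
import spark.implicits._

// Two rows: an ordinary division and a division by zero.
val quotients = Seq((6, 3), (1, 0)).toDF("dividend", "divisor")
  .select(try_divide($"dividend", $"divisor").as("q"))
  .collect()
```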
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_element_at(column: Column, value: Column): Column
      
      
      
(array, index) - Returns element of array at given (1-based) index. If index is 0, Spark will throw an error. If index < 0, accesses elements from the last to the first. The function always returns NULL if the index exceeds the length of the array.
(map, key) - Returns value for given key. The function always returns NULL if the key is not contained in the map.
- Since
 3.5.0
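The array and map cases above can be sketched in one query. The literal array and map are illustrative values, and the local session is an assumption of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("try-element-at-sketch").getOrCreate()

val arr = array(lit(1), lit(2), lit(3))
val m = map(lit("a"), lit(1))

val elemRow = spark.range(1).select(
  try_element_at(arr, lit(2)).as("second"),       // 1-based index
  try_element_at(arr, lit(-1)).as("last"),        // negative index counts from the end
  try_element_at(arr, lit(5)).as("out_of_range"), // index beyond the array length
  try_element_at(m, lit("a")).as("hit"),
  try_element_at(m, lit("b")).as("miss")          // key not in the map
).head()
```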
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_multiply(left: Column, right: Column): Column
      
      
      
Returns left*right and the result is null on overflow. The acceptable input types are the same as for the * operator.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_subtract(left: Column, right: Column): Column
      
      
      
Returns left-right and the result is null on overflow. The acceptable input types are the same as for the - operator.
- Since
 3.5.0
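To make the overflow behaviour of try_multiply and try_subtract concrete, here is a hedged sketch using 32-bit integer extremes. The column names and the local session are assumptions of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("try-arith-sketch").getOrCreate()
import spark.implicits._

val arithRow = Seq((Int.MaxValue, Int.MinValue)).toDF("big", "small").select(
  try_multiply($"big", lit(2)).as("mul_overflow"),   // exceeds the Int range
  try_subtract($"small", lit(1)).as("sub_overflow"), // falls below the Int range
  try_multiply(lit(3), lit(4)).as("mul_ok"),
  try_subtract(lit(3), lit(4)).as("sub_ok")
).head()
```

The two overflowing expressions should come back as null, while the in-range ones behave like the plain operators.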
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_sum(e: Column): Column
      
      
      
Returns the sum calculated from values of a group and the result is null on overflow.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_to_binary(e: Column): Column
      
      
      
This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_to_binary(e: Column, format: Column): Column
      
      
      
This is a special version of to_binary that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_to_number(e: Column, format: Column): Column
      
      
      
Convert string e to a number based on the string format format. Returns NULL if the string e does not match the expected format. The format follows the same semantics as the to_number function.
- Since
 3.5.0
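As a brief sketch of try_to_number, the following parses one string that matches the format and one that does not. The sample strings and the format are illustrative, and the local session is an assumption of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("try-to-number-sketch").getOrCreate()

val numRow = spark.range(1).select(
  try_to_number(lit("$78.12"), lit("$99.99")).as("parsed"),
  try_to_number(lit("abc"), lit("$99.99")).as("unparsed") // does not match the format
).head()
```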
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_to_timestamp(s: Column): Column
      
      
      
Parses the s to a timestamp. The function always returns null on an invalid input with/without ANSI SQL mode enabled. It follows casting rules to a timestamp. The result data type is consistent with the value of configuration spark.sql.timestampType.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        try_to_timestamp(s: Column, format: Column): Column
      
      
      
Parses the s with the format to a timestamp. The function always returns null on an invalid input with/without ANSI SQL mode enabled. The result data type is consistent with the value of configuration spark.sql.timestampType.
- Since
 3.5.0
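Both try_to_timestamp overloads can be sketched together. The input strings are illustrative and the local session is an assumption of the example. Only nullness is checked here, since the rendered timestamp depends on the session time zone.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("try-to-timestamp-sketch").getOrCreate()

val tsRow = spark.range(1).select(
  try_to_timestamp(lit("2016-12-31 00:12:00")).as("cast_rules"),          // no explicit format
  try_to_timestamp(lit("2016-12-31"), lit("yyyy-MM-dd")).as("formatted"), // explicit format
  try_to_timestamp(lit("foo"), lit("yyyy-MM-dd")).as("invalid")           // unparseable input
).head()
```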
 - 
      
      
      
        
      
    
      
        
        def
      
      
        typedLit[T](literal: T)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): Column
      
      
      
Creates a Column of literal value.
An alias of typedlit, and it is encouraged to use typedlit directly.
- Since
 2.2.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        typedlit[T](literal: T)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): Column
      
      
      
Creates a Column of literal value.
The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value. The difference between this function and lit is that this function can handle parameterized scala types e.g.: List, Seq and Map.
- Since
 3.2.0
- Note
 typedlit will call expensive Scala reflection APIs. lit is preferred if parameterized Scala types are not used.
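A short sketch of when typedlit is needed: parameterized Scala types such as Seq and Map go through typedlit, while a plain string can use lit. The column names and the local session are assumptions of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("typedlit-sketch").getOrCreate()

val litRow = spark.range(1).select(
  typedlit(Seq(1, 2, 3)).as("ints"),               // becomes an array column
  typedlit(Map("a" -> 1, "b" -> 2)).as("lookup"),  // becomes a map column
  lit("plain").as("label")                         // simple value, lit is enough
).head()
```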
 - 
      
      
      
        
      
    
      
        
        def
      
      
        typeof(col: Column): Column
      
      
      
Return DDL-formatted type string for the data type of the input.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        ucase(str: Column): Column
      
      
      
Returns str with all characters changed to uppercase.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udaf[IN, BUF, OUT](agg: expressions.Aggregator[IN, BUF, OUT], inputEncoder: Encoder[IN]): UserDefinedFunction
      
      
      
Obtains a UserDefinedFunction that wraps the given Aggregator so that it may be used with untyped Data Frames.
  Aggregator<IN, BUF, OUT> agg = // custom Aggregator
  Encoder<IN> enc = // input encoder
  // declare a UDF based on agg
  UserDefinedFunction aggUDF = udaf(agg, enc)
  DataFrame aggData = df.agg(aggUDF($"colname"))
  // register agg as a named function
  spark.udf.register("myAggName", udaf(agg, enc))
- IN
 the aggregator input type
- BUF
 the aggregating buffer type
- OUT
 the finalized output type
- agg
 the typed Aggregator
- inputEncoder
 a specific input encoder to use
- returns
 a UserDefinedFunction that can be used as an aggregating expression
- Note
 This overloading takes an explicit input encoder, to support UDAF declarations in Java.
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udaf[IN, BUF, OUT](agg: expressions.Aggregator[IN, BUF, OUT])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[IN]): UserDefinedFunction
      
      
      
Obtains a UserDefinedFunction that wraps the given Aggregator so that it may be used with untyped Data Frames.
  val agg = // Aggregator[IN, BUF, OUT]
  // declare a UDF based on agg
  val aggUDF = udaf(agg)
  val aggData = df.agg(aggUDF($"colname"))
  // register agg as a named function
  spark.udf.register("myAggName", udaf(agg))
- IN
 the aggregator input type
- BUF
 the aggregating buffer type
- OUT
 the finalized output type
- agg
 the typed Aggregator
- returns
 a UserDefinedFunction that can be used as an aggregating expression.
- Note
 The input encoder is inferred from the input type IN.
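The placeholders in the snippet above can be filled in with a concrete Aggregator. The following is a minimal sketch: AvgBuffer, LongAverage, the column name and the registered name long_avg are all hypothetical names invented for the example, and the session is a local one.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

// Hypothetical buffer type holding a running sum and count.
case class AvgBuffer(sum: Long, count: Long)

// Hypothetical typed Aggregator: IN = Long, BUF = AvgBuffer, OUT = Double.
object LongAverage extends Aggregator[Long, AvgBuffer, Double] {
  def zero: AvgBuffer = AvgBuffer(0L, 0L)
  def reduce(b: AvgBuffer, a: Long): AvgBuffer = AvgBuffer(b.sum + a, b.count + 1)
  def merge(b1: AvgBuffer, b2: AvgBuffer): AvgBuffer =
    AvgBuffer(b1.sum + b2.sum, b1.count + b2.count)
  def finish(r: AvgBuffer): Double =
    if (r.count == 0) Double.NaN else r.sum.toDouble / r.count
  def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("udaf-sketch").getOrCreate()
import spark.implicits._

// Wrap the Aggregator; the input encoder is inferred from IN = Long.
val longAvg = udaf(LongAverage)

// Use it as an untyped aggregate expression.
val avgValue = Seq(1L, 2L, 3L, 4L).toDF("value")
  .agg(longAvg(col("value")).as("avg"))
  .head().getDouble(0)

// Optionally register it under a name for use in SQL strings.
spark.udf.register("long_avg", longAvg)
```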
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF10[_, _, _, _, _, _, _, _, _, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF10 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF9[_, _, _, _, _, _, _, _, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF9 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF8[_, _, _, _, _, _, _, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF8 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF7[_, _, _, _, _, _, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF7 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF6[_, _, _, _, _, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF6 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF5[_, _, _, _, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF5 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF4[_, _, _, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF4 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF3[_, _, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF3 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF2[_, _, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF2 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF1[_, _], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF1 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: UDF0[_], returnType: DataType): UserDefinedFunction
      
      
      
Defines a Java UDF0 instance as user-defined function (UDF). The caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 2.3.0
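The Java UDFn overloads above take an explicit return type. As a hedged sketch, the one-argument overload can be called from Scala with an anonymous UDF1 instance. The strLen name, the sample data and the local session are assumptions of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("java-udf-sketch").getOrCreate()
import spark.implicits._

// The output type is stated explicitly; no input type coercion is applied.
val strLen = udf(new UDF1[String, Integer] {
  override def call(s: String): Integer =
    if (s == null) null else Integer.valueOf(s.length)
}, IntegerType)

val lengths = Seq("a", "abc").toDF("s")
  .select(strLen($"s").as("len"))
  .collect()
  .map(_.getInt(0))
```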
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10](f: (A1, A2, A3, A4, A5, A6, A7, A8, A9, A10) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6], arg7: scala.reflect.api.JavaUniverse.TypeTag[A7], arg8: scala.reflect.api.JavaUniverse.TypeTag[A8], arg9: scala.reflect.api.JavaUniverse.TypeTag[A9], arg10: scala.reflect.api.JavaUniverse.TypeTag[A10]): UserDefinedFunction
      
      
      
Defines a Scala closure of 10 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2, A3, A4, A5, A6, A7, A8, A9](f: (A1, A2, A3, A4, A5, A6, A7, A8, A9) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6], arg7: scala.reflect.api.JavaUniverse.TypeTag[A7], arg8: scala.reflect.api.JavaUniverse.TypeTag[A8], arg9: scala.reflect.api.JavaUniverse.TypeTag[A9]): UserDefinedFunction
      
      
      
Defines a Scala closure of 9 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2, A3, A4, A5, A6, A7, A8](f: (A1, A2, A3, A4, A5, A6, A7, A8) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6], arg7: scala.reflect.api.JavaUniverse.TypeTag[A7], arg8: scala.reflect.api.JavaUniverse.TypeTag[A8]): UserDefinedFunction
      
      
      
Defines a Scala closure of 8 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2, A3, A4, A5, A6, A7](f: (A1, A2, A3, A4, A5, A6, A7) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6], arg7: scala.reflect.api.JavaUniverse.TypeTag[A7]): UserDefinedFunction
      
      
      
Defines a Scala closure of 7 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2, A3, A4, A5, A6](f: (A1, A2, A3, A4, A5, A6) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6]): UserDefinedFunction
      
      
      
Defines a Scala closure of 6 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2, A3, A4, A5](f: (A1, A2, A3, A4, A5) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5]): UserDefinedFunction
      
      
      
Defines a Scala closure of 5 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2, A3, A4](f: (A1, A2, A3, A4) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4]): UserDefinedFunction
      
      
      
Defines a Scala closure of 4 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2, A3](f: (A1, A2, A3) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3]): UserDefinedFunction
      
      
      
Defines a Scala closure of 3 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1, A2](f: (A1, A2) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2]): UserDefinedFunction
      
      
      
Defines a Scala closure of 2 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT, A1](f: (A1) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1]): UserDefinedFunction
      
      
      
Defines a Scala closure of 1 argument as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf[RT](f: () ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT]): UserDefinedFunction
      
      
      
Defines a Scala closure of 0 arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
- Since
 1.3.0
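For the Scala closure overloads above, a minimal sketch of a two-argument UDF and a zero-argument UDF marked nondeterministic. The names addTax and randomBucket, the sample data and the local session are assumptions of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("scala-udf-sketch").getOrCreate()
import spark.implicits._

// Input and output types are inferred from the closure's signature.
val addTax = udf((price: Double, rate: Double) => price * (1.0 + rate))

// A UDF whose result changes between calls should be flagged explicitly.
val randomBucket = udf(() => scala.util.Random.nextInt(100)).asNondeterministic()

val total = Seq((100.0, 0.5)).toDF("price", "rate")
  .select(addTax($"price", $"rate").as("total"))
  .head().getDouble(0)
```

Marking a UDF as nondeterministic tells the optimizer not to assume repeated calls give the same result.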
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unbase64(e: Column): Column
      
      
      
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unhex(column: Column): Column
      
      
      
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of number.
- Since
 1.5.0
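The inverse relationships described for unbase64 and unhex can be sketched as round trips through base64 and hex. The sample string and the local session are assumptions of the example. The binary results are cast back to string for readability.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("unhex-unbase64-sketch").getOrCreate()
import spark.implicits._

val codecRow = Seq("Spark").toDF("s").select(
  hex($"s").as("hexed"),
  unhex(hex($"s")).cast("string").as("unhexed"),
  base64($"s".cast("binary")).as("b64"),
  unbase64(base64($"s".cast("binary"))).cast("string").as("unb64")
).head()
```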
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unix_date(e: Column): Column
      
      
      
Returns the number of days since 1970-01-01.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unix_micros(e: Column): Column
      
      
      
Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unix_millis(e: Column): Column
      
      
      
Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unix_seconds(e: Column): Column
      
      
      
Returns the number of seconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
- Since
 3.5.0
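The four epoch helpers above (unix_date, unix_micros, unix_millis, unix_seconds) can be compared side by side. This sketch pins the session time zone to UTC so the string literals are read as UTC. That setting, the literals and the local session are assumptions of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("unix-epoch-sketch").getOrCreate()
// Assumption: interpret timestamp strings as UTC for a reproducible result.
spark.conf.set("spark.sql.session.timeZone", "UTC")

val oneSecond = to_timestamp(lit("1970-01-01 00:00:01"))

val epochRow = spark.range(1).select(
  unix_date(to_date(lit("1970-01-02"))).as("days"),
  unix_seconds(oneSecond).as("seconds"),
  unix_millis(oneSecond).as("millis"),
  unix_micros(oneSecond).as("micros")
).head()
```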
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unix_timestamp(s: Column, p: String): Column
      
      
      
Converts time string with given pattern to Unix timestamp (in seconds).
See Datetime Patterns for valid date and time format patterns.
- s
 A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS
- p
 A date time pattern detailing the format of s when s is a string
- returns
 A long, or null if s was a string that could not be cast to a date or p was an invalid format
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unix_timestamp(s: Column): Column
      
      
      
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale.
- s
 A date, timestamp or string. If a string, the data must be in the yyyy-MM-dd HH:mm:ss format
- returns
 A long, or null if the input was a string not of the correct format
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unix_timestamp(): Column
      
      
      
Returns the current Unix timestamp (in seconds) as a long.
- Since
 1.5.0
- Note
 All calls of unix_timestamp within the same query return the same value (i.e. the current timestamp is calculated at the start of query evaluation).
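The string-based unix_timestamp overloads above can be sketched as follows. The example pins the session time zone to UTC and turns ANSI mode off so that unparseable input yields null as documented. Both settings, the sample data and the local session are assumptions of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("unix-timestamp-sketch").getOrCreate()
import spark.implicits._

// Assumptions: UTC session time zone, non-ANSI error handling.
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.conf.set("spark.sql.ansi.enabled", "false")

val unixRow = Seq(("1970-01-02", "1970-01-01 00:01:00", "not a date"))
  .toDF("d", "ts", "bad")
  .select(
    unix_timestamp($"d", "yyyy-MM-dd").as("with_pattern"),  // explicit pattern
    unix_timestamp($"ts").as("default_format"),             // yyyy-MM-dd HH:mm:ss
    unix_timestamp($"bad", "yyyy-MM-dd").as("unparseable")  // cannot be parsed
  ).head()
```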
 - 
      
      
      
        
      
    
      
        
        def
      
      
        unwrap_udt(column: Column): Column
      
      
      
Unwrap UDT data type column into its underlying type.
- Since
 3.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        upper(e: Column): Column
      
      
      
Converts a string column to upper case.
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        url_decode(str: Column): Column
      
      
      
Decodes a str in 'application/x-www-form-urlencoded' format using a specific encoding scheme.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        url_encode(str: Column): Column
      
      
      
Translates a string into 'application/x-www-form-urlencoded' format using a specific encoding scheme.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        user(): Column
      
      
      
Returns the user name of current execution context.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        uuid(): Column
      
      
      
Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        var_pop(columnName: String): Column
      
      
      
Aggregate function: returns the population variance of the values in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        var_pop(e: Column): Column
      
      
      
Aggregate function: returns the population variance of the values in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        var_samp(columnName: String): Column
      
      
      
Aggregate function: returns the unbiased variance of the values in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        var_samp(e: Column): Column
      
      
      
Aggregate function: returns the unbiased variance of the values in a group.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        variance(columnName: String): Column
      
      
      
Aggregate function: alias for var_samp.
- Since
 1.6.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        variance(e: Column): Column
      
      
      
Aggregate function: alias for var_samp.
- Since
 1.6.0
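To illustrate the difference between var_pop, var_samp and its alias variance, here is a sketch on four values whose mean is 2.5 and whose squared deviations sum to 5. Dividing by n = 4 gives the population variance, and dividing by n - 1 = 3 gives the unbiased sample variance. The data and the local session are assumptions of the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("variance-sketch").getOrCreate()
import spark.implicits._

val varRow = Seq(1.0, 2.0, 3.0, 4.0).toDF("x").agg(
  var_pop($"x").as("population"),  // 5 / 4
  var_samp($"x").as("sample"),     // 5 / 3
  variance($"x").as("alias")       // same as var_samp
).head()
```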
 - 
      
      
      
        
      
    
      
        
        def
      
      
        version(): Column
      
      
      
Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... )
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(arg0: Long, arg1: Int): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... )
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(arg0: Long): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... ) @native()
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        weekday(e: Column): Column
      
      
      
Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).
- Since
 3.5.0
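The 0 = Monday numbering differs from the 1-based numbering used by `java.time`. As a rough illustration of the mapping (plain Scala, no Spark involved; `weekdayIndex` is a hypothetical helper), the same index can be derived from a `LocalDate`:

```scala
import java.time.LocalDate

object WeekdaySketch {
  // java.time numbers Monday..Sunday as 1..7; subtracting 1 yields the
  // 0 = Monday, ..., 6 = Sunday scheme described above.
  def weekdayIndex(date: LocalDate): Int = date.getDayOfWeek.getValue - 1
}
```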
 - 
      
      
      
        
      
    
      
        
        def
      
      
        weekofyear(e: Column): Column
      
      
      
Extracts the week number as an integer from a given date/timestamp/string.
A week is considered to start on a Monday, and week 1 is the first week with more than 3 days, as defined by ISO 8601.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
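The ISO 8601 rule has a consequence that is easy to miss: the first days of January can belong to the last week of the previous year. The sketch below uses `java.time` (not Spark) to illustrate the same week-numbering rule; `isoWeek` is a hypothetical helper name.

```scala
import java.time.LocalDate
import java.time.temporal.IsoFields

object WeekOfYearSketch {
  // ISO 8601 week number: weeks start on Monday, and week 1 is the first
  // week that has more than 3 days in the new year.
  def isoWeek(date: LocalDate): Int = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)
}
```

For instance, 2016-01-01 was a Friday, so its week has only 3 days in 2016 and is numbered as week 53 of the previous year, while 2024-01-01 was a Monday and falls in week 1.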
 - 
      
      
      
        
      
    
      
        
        def
      
      
        when(condition: Column, value: Any): Column
      
      
      
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer.

// Scala:
people.select(when(people("gender") === "male", 0)
  .when(people("gender") === "female", 1)
  .otherwise(2))

// Java:
people.select(when(col("gender").equalTo("male"), 0)
  .when(col("gender").equalTo("female"), 1)
  .otherwise(2))
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        width_bucket(v: Column, min: Column, max: Column, numBucket: Column): Column
      
      
      
Returns the bucket number into which the value of this expression would fall after being evaluated. Note that input arguments must follow conditions listed below; otherwise, the method will return null.
- v
 value to compute a bucket number in the histogram
- min
 minimum value of the histogram
- max
 maximum value of the histogram
- numBucket
 the number of buckets
- returns
 the bucket number into which the value would fall after being evaluated
- Since
 3.5.0
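To make the idea of an equi-width histogram bucket concrete, here is a plain-Scala sketch of the arithmetic for the ascending case (min < max). It is a model for illustration only, not Spark's implementation: the helper `widthBucket` is hypothetical, and returning `None` for a non-positive bucket count, for min not less than max, or for NaN input is an assumption of this sketch.

```scala
object WidthBucketSketch {
  // Equi-width bucket number for the ascending case (min < max).
  // Values below min land in bucket 0, values at or above max land in
  // bucket numBucket + 1, and everything else lands in 1..numBucket.
  // Returning None for other inputs is an assumption of this sketch.
  def widthBucket(v: Double, min: Double, max: Double, numBucket: Long): Option[Long] =
    if (numBucket <= 0 || !(min < max) || v.isNaN) None
    else if (v < min) Some(0L)
    else if (v >= max) Some(numBucket + 1)
    else Some(((v - min) / (max - min) * numBucket).toLong + 1)
}
```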
 - 
      
      
      
        
      
    
      
        
        def
      
      
        window(timeColumn: Column, windowDuration: String): Column
      
      
      
Generates tumbling time windows given a column that specifies the timestamp. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. Windows are aligned to start at 1970-01-01 00:00:00 UTC. The following example takes the average stock price for a one minute tumbling window:
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute"), $"stockId")
  .agg(mean("price"))
The windows will look like:
09:00:00-09:01:00
09:01:00-09:02:00
09:02:00-09:03:00 ...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- timeColumn
 The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
- windowDuration
 A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers.
- Since
 2.0.0
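The inclusive-start, exclusive-end rule and the alignment to 1970-01-01 00:00:00 UTC can be modelled with simple modular arithmetic. The sketch below is plain Scala rather than Spark code; `tumblingWindow` is a hypothetical helper, and timestamps are simplified to whole seconds since the epoch.

```scala
object TumblingWindowSketch {
  // Returns the [start, end) tumbling window that contains `ts`.
  // All values are seconds since 1970-01-01 00:00:00 UTC.
  def tumblingWindow(ts: Long, windowSeconds: Long): (Long, Long) = {
    val start = ts - Math.floorMod(ts, windowSeconds)
    (start, start + windowSeconds)
  }
}
```

With a 5 minute window, a timestamp at 12:05:00 (43500 seconds into the day) maps to the window starting at 12:05:00, while 12:04:59 maps to the window starting at 12:00:00.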
 - 
      
      
      
        
      
    
      
        
        def
      
      
        window(timeColumn: Column, windowDuration: String, slideDuration: String): Column
      
      
      
Bucketize rows into one or more time windows given a column that specifies the timestamp. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. Windows are aligned to start at 1970-01-01 00:00:00 UTC. The following example takes the average stock price for a one minute window every 10 seconds:
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute", "10 seconds"), $"stockId")
  .agg(mean("price"))
The windows will look like:
09:00:00-09:01:00
09:00:10-09:01:10
09:00:20-09:01:20 ...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- timeColumn
 The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
- windowDuration
 A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
- slideDuration
 A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
- Since
 2.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        window(timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String): Column
      
      
      
Bucketize rows into one or more time windows given a column that specifies the timestamp. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. The following example takes the average stock price for a one minute window every 10 seconds starting 5 seconds after the hour:
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"timestamp", "1 minute", "10 seconds", "5 seconds"), $"stockId")
  .agg(mean("price"))
The windows will look like:
09:00:05-09:01:05
09:00:15-09:01:15
09:00:25-09:01:25 ...
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
- timeColumn
 The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType.
- windowDuration
 A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.
- slideDuration
 A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.
- startTime
 The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.
- Since
 2.0.0
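When the slide is shorter than the window, a single row falls into several overlapping windows, and startTime shifts every window boundary by a fixed offset. The following plain-Scala sketch models that assignment; it is not Spark's implementation, `windowsFor` is a hypothetical helper, and timestamps are simplified to whole seconds since the epoch.

```scala
object SlidingWindowSketch {
  // Returns every [start, end) window containing `ts`, latest start first.
  // ts, window, slide and startTime are all in seconds; startTime is the
  // offset of window boundaries relative to 1970-01-01 00:00:00 UTC.
  def windowsFor(ts: Long, window: Long, slide: Long, startTime: Long = 0L): Seq[(Long, Long)] = {
    require(slide > 0 && slide <= window, "slide must be positive and at most the window width")
    // Latest window start that is not after ts.
    val lastStart = ts - Math.floorMod(ts - startTime, slide)
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(start => start + window > ts) // the end is exclusive
      .map(start => (start, start + window))
      .toSeq
  }
}
```

For a 1 minute window sliding every 10 seconds with a 5 second offset, a row at 09:00:30 belongs to six windows, the latest of which is 09:00:25-09:01:25. Setting the slide equal to the window width gives tumbling windows, so an hourly window with a 15 minute offset places 12:30 in 12:15-13:15.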
 - 
      
      
      
        
      
    
      
        
        def
      
      
        window_time(windowColumn: Column): Column
      
      
      
Extracts the event time from the window column.
The window column is of StructType { start: Timestamp, end: Timestamp } where start is inclusive and end is exclusive. Since event time can support microsecond precision, window_time(window) = window.end - 1 microsecond.
- windowColumn
 The window column (typically produced by window aggregation) of type StructType { start: Timestamp, end: Timestamp }
- Since
 3.4.0
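Because the window end is exclusive, the largest timestamp that still belongs to the window is one microsecond before it. A small plain-Scala illustration of that formula using `java.time` (the helper `windowTime` is hypothetical and operates on an `Instant`, not on a Spark column):

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

object WindowTimeSketch {
  // window_time(window) = window.end - 1 microsecond
  def windowTime(windowEnd: Instant): Instant = windowEnd.minus(1, ChronoUnit.MICROS)
}
```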
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath(x: Column, p: Column): Column
      
      
      
Returns a string array of values within the nodes of xml that match the XPath expression.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath_boolean(x: Column, p: Column): Column
      
      
      
Returns true if the XPath expression evaluates to true, or if a matching node is found.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath_double(x: Column, p: Column): Column
      
      
      
Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath_float(x: Column, p: Column): Column
      
      
      
Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath_int(x: Column, p: Column): Column
      
      
      
Returns an integer value, or zero if no match is found or if a match is found but the value is non-numeric.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath_long(x: Column, p: Column): Column
      
      
      
Returns a long integer value, or zero if no match is found or if a match is found but the value is non-numeric.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath_number(x: Column, p: Column): Column
      
      
      
Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath_short(x: Column, p: Column): Column
      
      
      
Returns a short integer value, or zero if no match is found or if a match is found but the value is non-numeric.
- Since
 3.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xpath_string(x: Column, p: Column): Column
      
      
      
Returns the text contents of the first xml node that matches the XPath expression.
- Since
 3.5.0
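As a usage sketch for the xpath family, the example below evaluates four XPath expressions against one XML string. It assumes Spark 3.5.0 or later on the classpath and a local SparkSession; the XML document, the column name `xml`, and the object and method names are illustrative only.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, lit, xpath, xpath_boolean, xpath_int, xpath_string}

object XPathSketch {
  // Evaluates four XPath expressions against one illustrative XML string
  // and returns the single result row.
  def extract(spark: SparkSession): Row = {
    import spark.implicits._
    val df = Seq("<a><b>1</b><b>2</b><c>cc</c></a>").toDF("xml")
    df.select(
      xpath(col("xml"), lit("a/b/text()")).as("b_values"), // array of strings
      xpath_int(col("xml"), lit("sum(a/b)")).as("b_sum"),   // integer
      xpath_string(col("xml"), lit("a/c")).as("c_text"),    // first match as text
      xpath_boolean(col("xml"), lit("a/b")).as("has_b")     // true if a node matches
    ).head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("xpath-sketch").getOrCreate()
    println(extract(spark))
    spark.stop()
  }
}
```

The XPath expression is passed as a literal column via `lit`, which keeps the path constant across rows.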
 - 
      
      
      
        
      
    
      
        
        def
      
      
        xxhash64(cols: Column*): Column
      
      
      
Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column. The hash computation uses an initial seed of 42.
- Annotations
 - @varargs()
 - Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        year(e: Column): Column
      
      
      
Extracts the year as an integer from a given date/timestamp/string.
- returns
 An integer, or null if the input was a string that could not be cast to a date
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        years(e: Column): Column
      
      
      
A transform for timestamps and dates to partition data into years.
- Since
 3.0.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        zip_with(left: Column, right: Column, f: (Column, Column) ⇒ Column): Column
      
      
      
Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.
df.select(zip_with(df("val1"), df("val2"), (x, y) => x + y))
- left
 the left input array column
- right
 the right input array column
- f
 (lCol, rCol) => col, the lambda function to merge two input columns into one column
- Since
 3.0.0
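The padding rule means the merge function can receive a missing value on either side. The plain-Scala sketch below models that behaviour on ordinary sequences, with `None` standing in for null; `zipWithPadded` is a hypothetical helper and not part of Spark.

```scala
object ZipWithSketch {
  // Pads the shorter sequence with None (standing in for null) so both
  // sides have the same length, then applies f element-wise.
  def zipWithPadded[A, B, C](left: Seq[A], right: Seq[B])(f: (Option[A], Option[B]) => C): Seq[C] =
    left.map(Option(_))
      .zipAll(right.map(Option(_)), Option.empty[A], Option.empty[B])
      .map { case (l, r) => f(l, r) }
}
```

Adding `Seq(1, 2, 3)` to `Seq(10, 20)` with a function that yields a value only when both sides are present gives two sums followed by a missing value for the unmatched third element.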
 
Deprecated Value Members
- 
      
      
      
        
      
    
      
        
        def
      
      
        approxCountDistinct(columnName: String, rsd: Double): Column
      
      
      
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.1.0) Use approx_count_distinct
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        approxCountDistinct(e: Column, rsd: Double): Column
      
      
      
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.1.0) Use approx_count_distinct
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        approxCountDistinct(columnName: String): Column
      
      
      
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.1.0) Use approx_count_distinct
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        approxCountDistinct(e: Column): Column
      
      
      
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.1.0) Use approx_count_distinct
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        bitwiseNOT(e: Column): Column
      
      
      
Computes bitwise NOT (~) of a number.
- Annotations
 - @deprecated
 - Deprecated
 (Since version 3.2.0) Use bitwise_not
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        callUDF(udfName: String, cols: Column*): Column
      
      
      
Calls a user-defined function.
- Annotations
 - @varargs() @deprecated
 - Deprecated
 Use call_udf
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        monotonicallyIncreasingId(): Column
      
      
      
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.0.0) Use monotonically_increasing_id()
- Since
 1.4.0
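The bit layout described above can be reproduced with a shift and an addition. This is a plain-Scala sketch of the arithmetic only, with a hypothetical helper name `generatedId`; it does not call Spark.

```scala
object MonotonicIdSketch {
  // Partition ID in the upper 31 bits, record number within the partition
  // in the lower 33 bits of a 64-bit value.
  def generatedId(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << 33) + recordNumber
}
```

Applying it to two partitions with three records each yields the six IDs listed in the example above.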
 - 
      
      
      
        
      
    
      
        
        def
      
      
        shiftLeft(e: Column, numBits: Int): Column
      
      
      
Shift the given value numBits left. If the given value is a long value, this function will return a long value; otherwise it will return an integer value.
- Annotations
 - @deprecated
 - Deprecated
 (Since version 3.2.0) Use shiftleft
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        shiftRight(e: Column, numBits: Int): Column
      
      
      
(Signed) shift the given value numBits right. If the given value is a long value, it will return a long value; otherwise it will return an integer value.
- Annotations
 - @deprecated
 - Deprecated
 (Since version 3.2.0) Use shiftright
- Since
 1.5.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        shiftRightUnsigned(e: Column, numBits: Int): Column
      
      
      
Unsigned shift the given value numBits right. If the given value is a long value, it will return a long value; otherwise it will return an integer value.
- Annotations
 - @deprecated
 - Deprecated
 (Since version 3.2.0) Use shiftrightunsigned
- Since
 1.5.0
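The difference between the signed and unsigned right shift only shows up for negative inputs, and the result of the unsigned shift depends on whether the value is 32 or 64 bits wide. The sketch below uses Scala's own shift operators on plain values as an illustration; treating them as equivalent to the column functions is an assumption here, not something this page states.

```scala
object ShiftSketch {
  val left: Int = 1 << 4              // bits move left, zeros fill from the right
  val signedInt: Int = -8 >> 1        // the sign bit is copied in from the left
  val unsignedInt: Int = -8 >>> 1     // zero is shifted in from the left (32-bit)
  val unsignedLong: Long = -8L >>> 1  // the same operation on a 64-bit value
}
```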
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sumDistinct(columnName: String): Column
      
      
      
Aggregate function: returns the sum of distinct values in the expression.
- Annotations
 - @deprecated
 - Deprecated
 (Since version 3.2.0) Use sum_distinct
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        sumDistinct(e: Column): Column
      
      
      
Aggregate function: returns the sum of distinct values in the expression.
- Annotations
 - @deprecated
 - Deprecated
 (Since version 3.2.0) Use sum_distinct
- Since
 1.3.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        toDegrees(columnName: String): Column
      
      
      
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.1.0) Use degrees
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        toDegrees(e: Column): Column
      
      
      
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.1.0) Use degrees
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        toRadians(columnName: String): Column
      
      
      
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.1.0) Use radians
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        toRadians(e: Column): Column
      
      
      
- Annotations
 - @deprecated
 - Deprecated
 (Since version 2.1.0) Use radians
- Since
 1.4.0
 - 
      
      
      
        
      
    
      
        
        def
      
      
        udf(f: AnyRef, dataType: DataType): UserDefinedFunction
      
      
      
Defines a deterministic user-defined function (UDF) using a Scala closure. For this variant, the caller must specify the output data type, and there is no automatic input type coercion. By default the returned UDF is deterministic. To change it to nondeterministic, call the API UserDefinedFunction.asNondeterministic().
Note that, although the Scala closure can have primitive-type function arguments, it doesn't work well with null values. Because the Scala closure is passed in as Any type, there is no type information for the function arguments. Without the type information, Spark may blindly pass null to the Scala closure with a primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. for udf((x: Int) => x, IntegerType), the result is 0 for null input.
- f
 A closure in Scala
- dataType
 The output data type of the UDF
- Annotations
 - @deprecated
 - Deprecated
 (Since version 3.0.0)
- Since
 2.0.0
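The null pitfall described above comes from how a primitive-typed closure behaves once its static type is erased. The plain-Scala sketch below reproduces the effect without Spark: a null passed through the untyped function interface is unboxed to the primitive default before the closure body runs. The object and value names are illustrative.

```scala
object PrimitiveNullSketch {
  // A closure with a primitive-type argument, like the one in the example above.
  val identity: Int => Int = (x: Int) => x

  // Erase the static type, similar to passing the closure as AnyRef.
  val untyped: Any => Any = identity.asInstanceOf[Any => Any]

  // The null argument is unboxed to the Int default before the body runs,
  // so the closure never observes the null.
  val resultForNull: Any = untyped(null)
}
```

Here `resultForNull` is 0 rather than null, which matches the behaviour the note warns about.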