Class Column

java.lang.Object
org.apache.spark.sql.Column
All Implemented Interfaces:
org.apache.spark.internal.Logging
Direct Known Subclasses:
ColumnName, TypedColumn

public class Column extends Object implements org.apache.spark.internal.Logging
A column that will be computed based on the data in a DataFrame.

A new column can be constructed based on the input columns present in a DataFrame:


   df("columnName")            // On a specific `df` DataFrame.
   col("columnName")           // A generic column not yet associated with a DataFrame.
   col("columnName.field")     // Extracting a struct field
   col("`a.column.with.dots`") // Escape `.` in column names.
   $"columnName"               // Scala short hand for a named column.
 

Column objects can be composed to form complex expressions:


   $"a" + 1
   $"a" === $"b"
 

Since:
1.3.0
Note:
The internal Catalyst expression can be accessed via expr(), but this method is for debugging purposes only and may change in future Spark releases.

  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    Column(String name)
    Column(org.apache.spark.sql.catalyst.expressions.Expression expr)
  • Method Summary

    Methods
    alias(String alias)
    Gives the column an alias.
    and(Column other)
    Boolean AND.
    apply(Object extraction)
    Extracts a value or values from a complex type.
    as(String alias)
    Gives the column an alias.
    as(String[] aliases)
    Assigns the given aliases to the results of a table generating function.
    as(String alias, Metadata metadata)
    Gives the column an alias with metadata.
    as(Encoder<U> evidence$1)
    Provides a type hint about the expected return value of this column.
    as(scala.collection.immutable.Seq<String> aliases)
    (Scala-specific) Assigns the given aliases to the results of a table generating function.
    as(scala.Symbol alias)
    Gives the column an alias.
    asc()
    Returns a sort expression based on ascending order of the column.
    asc_nulls_first()
    Returns a sort expression based on ascending order of the column, and null values appear before non-null values.
    asc_nulls_last()
    Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
    between(Object lowerBound, Object upperBound)
    True if the current column is between the lower bound and upper bound, inclusive.
    bitwiseAND(Object other)
    Compute bitwise AND of this expression with another expression.
    bitwiseOR(Object other)
    Compute bitwise OR of this expression with another expression.
    bitwiseXOR(Object other)
    Compute bitwise XOR of this expression with another expression.
    cast(String to)
    Casts the column to a different data type, using the canonical string representation of the type.
    cast(DataType to)
    Casts the column to a different data type.
    contains(Object other)
    Contains the other element.
    desc()
    Returns a sort expression based on the descending order of the column.
    desc_nulls_first()
    Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
    desc_nulls_last()
    Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
    divide(Object other)
    Divides this expression by another expression.
    dropFields(scala.collection.immutable.Seq<String> fieldNames)
    An expression that drops fields in StructType by name.
    endsWith(String literal)
    String ends with another string literal.
    endsWith(Column other)
    String ends with.
    eqNullSafe(Object other)
    Equality test that is safe for null values.
    boolean equals(Object that)
    equalTo(Object other)
    Equality test.
    void explain(boolean extended)
    Prints the expression to the console for debugging purposes.
    org.apache.spark.sql.catalyst.expressions.Expression expr()
    geq(Object other)
    Greater than or equal to an expression.
    getField(String fieldName)
    An expression that gets a field by name in a StructType.
    getItem(Object key)
    An expression that gets the item at position ordinal from an array, or the value associated with the given key in a MapType.
    gt(Object other)
    Greater than.
    int hashCode()
    ilike(String literal)
    SQL ILIKE expression (case insensitive LIKE).
    isin(Object... list)
    A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
    isin(scala.collection.immutable.Seq<Object> list)
    A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
    isInCollection(Iterable<?> values)
    A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection.
    isInCollection(scala.collection.Iterable<?> values)
    A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection.
    isNaN()
    True if the current expression is NaN.
    isNotNull()
    True if the current expression is NOT null.
    isNull()
    True if the current expression is null.
    leq(Object other)
    Less than or equal to.
    like(String literal)
    SQL like expression.
    lt(Object other)
    Less than.
    minus(Object other)
    Subtraction.
    mod(Object other)
    Modulo (a.k.a. remainder) expression.
    multiply(Object other)
    Multiplication of this expression and another expression.
    name(String alias)
    Gives the column a name (alias).
    notEqual(Object other)
    Inequality test.
    or(Column other)
    Boolean OR.
    otherwise(Object value)
    Evaluates a list of conditions and returns one of multiple possible result expressions.
    over()
    Defines an empty analytic clause.
    over(WindowSpec window)
    Defines a windowing column.
    plus(Object other)
    Sum of this expression and another expression.
    rlike(String literal)
    SQL RLIKE expression (LIKE with Regex).
    startsWith(String literal)
    String starts with another string literal.
    startsWith(Column other)
    String starts with.
    substr(int startPos, int len)
    An expression that returns a substring.
    substr(Column startPos, Column len)
    An expression that returns a substring.
    String toString()
    try_cast(DataType to)
    Casts the column to a different data type and the result is null on failure.
    try_cast(String to)
    Casts the column to a different data type and the result is null on failure.
    static scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> unapply(Column col)
    when(Column condition, Object value)
    Evaluates a list of conditions and returns one of multiple possible result expressions.
    withField(String fieldName, Column col)
    An expression that adds/replaces a field in StructType by name.

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
  • Constructor Details

    • Column

      public Column(org.apache.spark.sql.catalyst.expressions.Expression expr)
    • Column

      public Column(String name)
  • Method Details

    • unapply

      public static scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> unapply(Column col)
    • isin

      public Column isin(Object... list)
      A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.

      Note: Since the types of the elements in the list are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) for "Int vs String", the "Int" will be up-casted to "String" and the comparison will be "String vs String"; 2) for "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will be "Double vs Double".

      Parameters:
      list - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.5.0
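
      A minimal usage sketch (the DataFrame df and its columns are assumed for illustration):

         // Scala: keep rows whose "id" equals any of the listed values.
         df.filter( df("id").isin(1, 2, 3) )

         // Per the note above, mixing types up-casts to the most common type:
         // an Int column would here be compared as String against "1" and "2".
         df.filter( df("id").isin("1", "2") )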
    • expr

      public org.apache.spark.sql.catalyst.expressions.Expression expr()
      Returns the internal Catalyst expression. As noted above, this is for debugging purposes only and may change in future Spark releases.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • equals

      public boolean equals(Object that)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • as

      public <U> TypedColumn<Object,U> as(Encoder<U> evidence$1)
      Provides a type hint about the expected return value of this column. This information can be used by operations such as select on a Dataset to automatically convert the results into the correct JVM types.
      Parameters:
      evidence$1 - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.6.0
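
      A brief sketch (assumes an active SparkSession named spark and a string column "name"): with spark.implicits._ in scope, as[U] resolves the Encoder implicitly, and selecting a single typed column yields a Dataset of that type.

         import spark.implicits._

         // select with a single TypedColumn returns Dataset[String] rather than DataFrame.
         val names = df.select( df("name").as[String] )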
    • apply

      public Column apply(Object extraction)
      Extracts a value or values from a complex type. The following types of extraction are supported:
      • Given an Array, an integer ordinal can be used to retrieve a single value.
      • Given a Map, a key of the correct type can be used to retrieve an individual value.
      • Given a Struct, a string fieldName can be used to extract that field.
      • Given an Array of Structs, a string fieldName can be used to extract that field from every struct in the array, and return an Array of fields.
      Parameters:
      extraction - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
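
      Illustrative sketches of each supported extraction (column names are assumptions):

         df.select( df("arrayCol")(0) )              // element at ordinal 0 of an array column
         df.select( df("mapCol")("key") )            // value for key "key" in a map column
         df.select( df("structCol")("field") )       // field "field" of a struct column
         df.select( df("arrayOfStructs")("field") )  // "field" of every struct, as an array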
    • equalTo

      public Column equalTo(Object other)
      Equality test.
      
         // Scala:
         df.filter( df("colA") === df("colB") )
      
         // Java
         import static org.apache.spark.sql.functions.*;
         df.filter( col("colA").equalTo(col("colB")) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • notEqual

      public Column notEqual(Object other)
      Inequality test.
      
         // Scala:
         df.select( df("colA") !== df("colB") )
         df.select( !(df("colA") === df("colB")) )
      
         // Java:
         import static org.apache.spark.sql.functions.*;
         df.filter( col("colA").notEqual(col("colB")) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • gt

      public Column gt(Object other)
      Greater than.
      
         // Scala: The following selects people older than 21.
         people.select( people("age") > lit(21) )
      
         // Java:
         import static org.apache.spark.sql.functions.*;
         people.select( people.col("age").gt(21) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • lt

      public Column lt(Object other)
      Less than.
      
         // Scala: The following selects people younger than 21.
         people.select( people("age") < 21 )
      
         // Java:
         people.select( people.col("age").lt(21) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • leq

      public Column leq(Object other)
      Less than or equal to.
      
         // Scala: The following selects people aged 21 or younger.
         people.select( people("age") <= 21 )
      
         // Java:
         people.select( people.col("age").leq(21) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • geq

      public Column geq(Object other)
      Greater than or equal to an expression.
      
         // Scala: The following selects people aged 21 or older.
         people.select( people("age") >= 21 )
      
         // Java:
         people.select( people.col("age").geq(21) )
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • eqNullSafe

      public Column eqNullSafe(Object other)
      Equality test that is safe for null values.

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
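
      For example (Scala; <=> is the operator form of this method):

         // Unlike ===, this evaluates to true when both sides are null,
         // and to false (rather than null) when exactly one side is null.
         df.filter( df("colA") <=> df("colB") )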
    • when

      public Column when(Column condition, Object value)
      Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.

      
         // Example: encoding gender string column into integer.
      
         // Scala:
         people.select(when(people("gender") === "male", 0)
           .when(people("gender") === "female", 1)
           .otherwise(2))
      
         // Java:
         people.select(when(col("gender").equalTo("male"), 0)
           .when(col("gender").equalTo("female"), 1)
           .otherwise(2))
       

      Parameters:
      condition - (undocumented)
      value - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • otherwise

      public Column otherwise(Object value)
      Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.

      
         // Example: encoding gender string column into integer.
      
         // Scala:
         people.select(when(people("gender") === "male", 0)
           .when(people("gender") === "female", 1)
           .otherwise(2))
      
         // Java:
         people.select(when(col("gender").equalTo("male"), 0)
           .when(col("gender").equalTo("female"), 1)
           .otherwise(2))
       

      Parameters:
      value - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • between

      public Column between(Object lowerBound, Object upperBound)
      True if the current column is between the lower bound and upper bound, inclusive.

      Parameters:
      lowerBound - (undocumented)
      upperBound - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
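
      For example (Scala; the DataFrame and column name are assumed):

         // Selects people with age in [18, 65], bounds included.
         people.filter( people("age").between(18, 65) )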
    • isNaN

      public Column isNaN()
      True if the current expression is NaN.

      Returns:
      (undocumented)
      Since:
      1.5.0
    • isNull

      public Column isNull()
      True if the current expression is null.

      Returns:
      (undocumented)
      Since:
      1.3.0
    • isNotNull

      public Column isNotNull()
      True if the current expression is NOT null.

      Returns:
      (undocumented)
      Since:
      1.3.0
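
      Usage sketches for isNull and isNotNull (Scala; the column name is assumed):

         df.filter( df("middleName").isNull )      // rows where middleName is null
         df.filter( df("middleName").isNotNull )   // rows where middleName is not null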
    • or

      public Column or(Column other)
      Boolean OR.
      
         // Scala: The following selects people that are in school or employed.
         people.filter( people("inSchool") || people("isEmployed") )
      
         // Java:
         people.filter( people.col("inSchool").or(people.col("isEmployed")) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • and

      public Column and(Column other)
      Boolean AND.
      
         // Scala: The following selects people that are in school and employed at the same time.
         people.select( people("inSchool") && people("isEmployed") )
      
         // Java:
         people.select( people.col("inSchool").and(people.col("isEmployed")) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • plus

      public Column plus(Object other)
      Sum of this expression and another expression.
      
         // Scala: The following selects the sum of a person's height and weight.
         people.select( people("height") + people("weight") )
      
         // Java:
         people.select( people.col("height").plus(people.col("weight")) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • minus

      public Column minus(Object other)
      Subtraction. Subtract the other expression from this expression.
      
         // Scala: The following selects the difference between people's height and their weight.
         people.select( people("height") - people("weight") )
      
         // Java:
         people.select( people.col("height").minus(people.col("weight")) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • multiply

      public Column multiply(Object other)
      Multiplication of this expression and another expression.
      
         // Scala: The following multiplies a person's height by their weight.
         people.select( people("height") * people("weight") )
      
         // Java:
         people.select( people.col("height").multiply(people.col("weight")) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • divide

      public Column divide(Object other)
      Divides this expression by another expression.
      
         // Scala: The following divides a person's height by their weight.
         people.select( people("height") / people("weight") )
      
         // Java:
         people.select( people.col("height").divide(people.col("weight")) );
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • mod

      public Column mod(Object other)
      Modulo (a.k.a. remainder) expression.

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
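
      For example (Scala; % is the operator form of mod):

         // Selects rows with an even "value".
         df.filter( df("value") % 2 === 0 )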
    • isin

      public Column isin(scala.collection.immutable.Seq<Object> list)
      A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.

      Note: Since the types of the elements in the list are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) for "Int vs String", the "Int" will be up-casted to "String" and the comparison will be "String vs String"; 2) for "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will be "Double vs Double".

      Parameters:
      list - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.5.0
    • isInCollection

      public Column isInCollection(scala.collection.Iterable<?> values)
      A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection.

      Note: Since the types of the elements in the collection are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) for "Int vs String", the "Int" will be up-casted to "String" and the comparison will be "String vs String"; 2) for "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will be "Double vs Double".

      Parameters:
      values - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.4.0
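
      For example (Scala; any scala.collection.Iterable works, a Seq is shown):

         val ids = Seq(1, 2, 3)
         df.filter( df("id").isInCollection(ids) )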
    • isInCollection

      public Column isInCollection(Iterable<?> values)
      A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection.

      Note: Since the types of the elements in the collection are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) for "Int vs String", the "Int" will be up-casted to "String" and the comparison will be "String vs String"; 2) for "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will be "Double vs Double".

      Parameters:
      values - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.4.0
    • like

      public Column like(String literal)
      SQL like expression. Returns a boolean column based on a SQL LIKE match.

      Parameters:
      literal - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
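
      For example (Scala): in a SQL LIKE pattern, _ matches any single character and % matches any sequence of characters.

         // Names containing "Smith" anywhere.
         df.filter( df("name").like("%Smith%") )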
    • rlike

      public Column rlike(String literal)
      SQL RLIKE expression (LIKE with Regex). Returns a boolean column based on a regex match.

      Parameters:
      literal - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
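
      For example (Scala; the pattern is a Java regular expression):

         // Names starting with an uppercase letter.
         df.filter( df("name").rlike("^[A-Z]") )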
    • ilike

      public Column ilike(String literal)
      SQL ILIKE expression (case insensitive LIKE).

      Parameters:
      literal - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.3.0
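
      For example (Scala):

         // Matches "smith", "Smith", "SMITH", ... anywhere in the name.
         df.filter( df("name").ilike("%smith%") )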
    • getItem

      public Column getItem(Object key)
      An expression that gets the item at position ordinal from an array, or the value associated with the given key in a MapType.

      Parameters:
      key - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
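
      For example (Scala; column names are assumptions):

         df.select( df("arrayCol").getItem(0) )     // first element of an array column
         df.select( df("mapCol").getItem("key") )   // value for "key" in a map column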
    • withField

      public Column withField(String fieldName, Column col)
      An expression that adds/replaces a field in StructType by name.

      
         val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
         df.select($"struct_col".withField("c", lit(3)))
         // result: {"a":1,"b":2,"c":3}
      
         val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
         df.select($"struct_col".withField("b", lit(3)))
         // result: {"a":1,"b":3}
      
         val df = sql("SELECT CAST(NULL AS struct<a:int,b:int>) struct_col")
         df.select($"struct_col".withField("c", lit(3)))
         // result: null of type struct<a:int,b:int,c:int>
      
         val df = sql("SELECT named_struct('a', 1, 'b', 2, 'b', 3) struct_col")
         df.select($"struct_col".withField("b", lit(100)))
         // result: {"a":1,"b":100,"b":100}
      
         val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
         df.select($"struct_col".withField("a.c", lit(3)))
         // result: {"a":{"a":1,"b":2,"c":3}}
      
         val df = sql("SELECT named_struct('a', named_struct('b', 1), 'a', named_struct('c', 2)) struct_col")
         df.select($"struct_col".withField("a.c", lit(3)))
         // result: org.apache.spark.sql.AnalysisException: Ambiguous reference to fields
       

      This method supports adding/replacing nested fields directly e.g.

      
         val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
         df.select($"struct_col".withField("a.c", lit(3)).withField("a.d", lit(4)))
         // result: {"a":{"a":1,"b":2,"c":3,"d":4}}
       

      However, if you are going to add/replace multiple nested fields, it is more optimal to extract out the nested struct before adding/replacing multiple fields e.g.

      
         val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
         df.select($"struct_col".withField("a", $"struct_col.a".withField("c", lit(3)).withField("d", lit(4))))
         // result: {"a":{"a":1,"b":2,"c":3,"d":4}}
       

      Parameters:
      fieldName - (undocumented)
      col - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.1.0
    • dropFields

      public Column dropFields(scala.collection.immutable.Seq<String> fieldNames)
      An expression that drops fields in StructType by name. This is a no-op if the schema doesn't contain the field name(s).

      
         val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
         df.select($"struct_col".dropFields("b"))
         // result: {"a":1}
      
         val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
         df.select($"struct_col".dropFields("c"))
         // result: {"a":1,"b":2}
      
         val df = sql("SELECT named_struct('a', 1, 'b', 2, 'c', 3) struct_col")
         df.select($"struct_col".dropFields("b", "c"))
         // result: {"a":1}
      
         val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
         df.select($"struct_col".dropFields("a", "b"))
         // result: org.apache.spark.sql.AnalysisException: [DATATYPE_MISMATCH.CANNOT_DROP_ALL_FIELDS] Cannot resolve "update_fields(struct_col, dropfield(), dropfield())" due to data type mismatch: Cannot drop all fields in struct.;
      
         val df = sql("SELECT CAST(NULL AS struct<a:int,b:int>) struct_col")
         df.select($"struct_col".dropFields("b"))
         // result: null of type struct<a:int>
      
         val df = sql("SELECT named_struct('a', 1, 'b', 2, 'b', 3) struct_col")
         df.select($"struct_col".dropFields("b"))
         // result: {"a":1}
      
         val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
         df.select($"struct_col".dropFields("a.b"))
         // result: {"a":{"a":1}}
      
         val df = sql("SELECT named_struct('a', named_struct('b', 1), 'a', named_struct('c', 2)) struct_col")
         df.select($"struct_col".dropFields("a.c"))
         // result: org.apache.spark.sql.AnalysisException: Ambiguous reference to fields
       

      This method supports dropping multiple nested fields directly e.g.

      
         val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
         df.select($"struct_col".dropFields("a.b", "a.c"))
         // result: {"a":{"a":1}}
       

      However, if you are going to drop multiple nested fields, it is more optimal to extract out the nested struct before dropping multiple fields from it e.g.

      
         val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
         df.select($"struct_col".withField("a", $"struct_col.a".dropFields("b", "c")))
         // result: {"a":{"a":1}}
       

      Parameters:
      fieldNames - (undocumented)
      Returns:
      (undocumented)
      Since:
      3.1.0
    • getField

      public Column getField(String fieldName)
      An expression that gets a field by name in a StructType.

      Parameters:
      fieldName - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
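
      For example (Scala; equivalent to the dotted form shown in the class-level examples):

         df.select( df("structCol").getField("a") )   // same as col("structCol.a")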
    • substr

      public Column substr(Column startPos, Column len)
      An expression that returns a substring.
      Parameters:
      startPos - expression for the starting position.
      len - expression for the length of the substring.

      Returns:
      (undocumented)
      Since:
      1.3.0
    • substr

      public Column substr(int startPos, int len)
      An expression that returns a substring.
      Parameters:
      startPos - starting position.
      len - length of the substring.

      Returns:
      (undocumented)
      Since:
      1.3.0
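
      For example (Scala; as in SQL, startPos is 1-based):

         // First three characters of "name".
         df.select( df("name").substr(1, 3) )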
    • contains

      public Column contains(Object other)
      Contains the other element. Returns a boolean column based on a string match.

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
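
      For example (Scala):

         // Rows whose "name" contains the substring "ali".
         df.filter( df("name").contains("ali") )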
    • startsWith

      public Column startsWith(Column other)
      String starts with. Returns a boolean column based on a string match.

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • startsWith

      public Column startsWith(String literal)
      String starts with another string literal. Returns a boolean column based on a string match.

      Parameters:
      literal - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • endsWith

      public Column endsWith(Column other)
      String ends with. Returns a boolean column based on a string match.

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • endsWith

      public Column endsWith(String literal)
      String ends with another string literal. Returns a boolean column based on a string match.

      Parameters:
      literal - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • alias

      public Column alias(String alias)
      Gives the column an alias. Same as as.
      
         // Renames colA to colB in select output.
         df.select($"colA".alias("colB"))
       

      Parameters:
      alias - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • as

      public Column as(String alias)
      Gives the column an alias.
      
         // Renames colA to colB in select output.
         df.select($"colA".as("colB"))
       

      If the current column has metadata associated with it, this metadata will be propagated to the new column. If this is not desired, use the API as(alias: String, metadata: Metadata) with explicit metadata.

      Parameters:
      alias - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • as

      public Column as(scala.collection.immutable.Seq<String> aliases)
      (Scala-specific) Assigns the given aliases to the results of a table generating function.
      
         // Assigns the aliases "key" and "value" to the columns produced by explode.
         df.select(explode($"myMap").as("key" :: "value" :: Nil))
       

      Parameters:
      aliases - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • as

      public Column as(String[] aliases)
      Assigns the given aliases to the results of a table generating function.
      
         // Assigns the aliases "key" and "value" to the columns produced by explode.
         df.select(explode($"myMap").as("key" :: "value" :: Nil))
       

      Parameters:
      aliases - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • as

      public Column as(scala.Symbol alias)
      Gives the column an alias.
      
         // Renames colA to colB in select output.
         df.select($"colA".as("colB"))
       

      If the current column has metadata associated with it, this metadata will be propagated to the new column. If this is not desired, use the API as(alias: String, metadata: Metadata) with explicit metadata.

      Parameters:
      alias - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • as

      public Column as(String alias, Metadata metadata)
      Gives the column an alias with metadata.
      
         val metadata: Metadata = ...
         df.select($"colA".as("colB", metadata))
       

      Parameters:
      alias - (undocumented)
      metadata - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • name

      public Column name(String alias)
      Gives the column a name (alias).
      
         // Renames colA to colB in select output.
         df.select($"colA".name("colB"))
       

      If the current column has metadata associated with it, this metadata will be propagated to the new column. If this is not desired, use the API as(alias: String, metadata: Metadata) with explicit metadata.

      Parameters:
      alias - (undocumented)
      Returns:
      (undocumented)
      Since:
      2.0.0
    • cast

      public Column cast(DataType to)
      Casts the column to a different data type.
      
         // Casts colA to IntegerType.
         import org.apache.spark.sql.types.IntegerType
         df.select(df("colA").cast(IntegerType))
      
         // equivalent to
         df.select(df("colA").cast("int"))
       

      Parameters:
      to - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • cast

      public Column cast(String to)
      Casts the column to a different data type, using the canonical string representation of the type. The supported types are: string, boolean, byte, short, int, long, float, double, decimal, date, timestamp.
      
         // Casts colA to integer.
         df.select(df("colA").cast("int"))
       

      Parameters:
      to - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.3.0
    • try_cast

      public Column try_cast(DataType to)
      Casts the column to a different data type and the result is null on failure.
      
         // Casts colA to IntegerType.
         import org.apache.spark.sql.types.IntegerType
         df.select(df("colA").try_cast(IntegerType))
      
         // equivalent to
         df.select(df("colA").try_cast("int"))
       

      Parameters:
      to - (undocumented)
      Returns:
      (undocumented)
      Since:
      4.0.0
    • try_cast

      public Column try_cast(String to)
      Casts the column to a different data type and the result is null on failure.
      
         // Casts colA to integer.
         df.select(df("colA").try_cast("int"))
       

      Parameters:
      to - (undocumented)
      Returns:
      (undocumented)
      Since:
      4.0.0
    • desc

      public Column desc()
      Returns a sort expression based on the descending order of the column.
      
         // Scala
         df.sort(df("age").desc)
      
         // Java
         df.sort(df.col("age").desc());
       

      Returns:
      (undocumented)
      Since:
      1.3.0
    • desc_nulls_first

      public Column desc_nulls_first()
      Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
      
         // Scala: sort a DataFrame by age column in descending order and null values appearing first.
         df.sort(df("age").desc_nulls_first)
      
         // Java
         df.sort(df.col("age").desc_nulls_first());
       

      Returns:
      (undocumented)
      Since:
      2.1.0
    • desc_nulls_last

      public Column desc_nulls_last()
      Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
      
         // Scala: sort a DataFrame by age column in descending order and null values appearing last.
         df.sort(df("age").desc_nulls_last)
      
         // Java
         df.sort(df.col("age").desc_nulls_last());
       

      Returns:
      (undocumented)
      Since:
      2.1.0
    • asc

      public Column asc()
      Returns a sort expression based on ascending order of the column.
      
         // Scala: sort a DataFrame by age column in ascending order.
         df.sort(df("age").asc)
      
         // Java
         df.sort(df.col("age").asc());
       

      Returns:
      (undocumented)
      Since:
      1.3.0
    • asc_nulls_first

      public Column asc_nulls_first()
      Returns a sort expression based on ascending order of the column, and null values appear before non-null values.
      
         // Scala: sort a DataFrame by age column in ascending order and null values appearing first.
         df.sort(df("age").asc_nulls_first)
      
         // Java
         df.sort(df.col("age").asc_nulls_first());
       

      Returns:
      (undocumented)
      Since:
      2.1.0
    • asc_nulls_last

      public Column asc_nulls_last()
      Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
      
         // Scala: sort a DataFrame by age column in ascending order and null values appearing last.
         df.sort(df("age").asc_nulls_last)
      
         // Java
         df.sort(df.col("age").asc_nulls_last());
       

      Returns:
      (undocumented)
      Since:
      2.1.0
    • explain

      public void explain(boolean extended)
      Prints the expression to the console for debugging purposes.

      Parameters:
      extended - (undocumented)
      Since:
      1.3.0
    • bitwiseOR

      public Column bitwiseOR(Object other)
      Compute bitwise OR of this expression with another expression.
      
         df.select($"colA".bitwiseOR($"colB"))
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • bitwiseAND

      public Column bitwiseAND(Object other)
      Compute bitwise AND of this expression with another expression.
      
         df.select($"colA".bitwiseAND($"colB"))
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • bitwiseXOR

      public Column bitwiseXOR(Object other)
      Compute bitwise XOR of this expression with another expression.
      
         df.select($"colA".bitwiseXOR($"colB"))
       

      Parameters:
      other - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • over

      public Column over(WindowSpec window)
      Defines a windowing column.

      
         val w = Window.partitionBy("name").orderBy("id")
         df.select(
           sum("price").over(w.rangeBetween(Window.unboundedPreceding, 2)),
           avg("price").over(w.rowsBetween(Window.currentRow, 4))
         )
       

      Parameters:
      window - (undocumented)
      Returns:
      (undocumented)
      Since:
      1.4.0
    • over

      public Column over()
      Defines an empty analytic clause. In this case the analytic function is applied to every row in the result set.

      
         df.select(
           sum("price").over(),
           avg("price").over()
         )
       

      Returns:
      (undocumented)
      Since:
      2.0.0