class KeyValueGroupedDataset[K, V] extends Serializable
A Dataset that has been logically grouped by a user-specified grouping key. Users should not
construct a KeyValueGroupedDataset directly; instead, call groupByKey on
an existing Dataset.
- Source: KeyValueGroupedDataset.scala
- Since: 2.0.0
Linear Supertypes: Serializable (scala), Serializable (java.io), AnyRef, Any
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def agg[U1, U2, U3, U4, U5, U6, U7, U8](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4], col5: TypedColumn[V, U5], col6: TypedColumn[V, U6], col7: TypedColumn[V, U7], col8: TypedColumn[V, U8]): Dataset[(K, U1, U2, U3, U4, U5, U6, U7, U8)]
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
  - Since: 3.0.0
- def agg[U1, U2, U3, U4, U5, U6, U7](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4], col5: TypedColumn[V, U5], col6: TypedColumn[V, U6], col7: TypedColumn[V, U7]): Dataset[(K, U1, U2, U3, U4, U5, U6, U7)]
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
  - Since: 3.0.0
- def agg[U1, U2, U3, U4, U5, U6](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4], col5: TypedColumn[V, U5], col6: TypedColumn[V, U6]): Dataset[(K, U1, U2, U3, U4, U5, U6)]
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
  - Since: 3.0.0
- def agg[U1, U2, U3, U4, U5](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4], col5: TypedColumn[V, U5]): Dataset[(K, U1, U2, U3, U4, U5)]
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
  - Since: 3.0.0
- def agg[U1, U2, U3, U4](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4]): Dataset[(K, U1, U2, U3, U4)]
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
  - Since: 1.6.0
- def agg[U1, U2, U3](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3]): Dataset[(K, U1, U2, U3)]
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
  - Since: 1.6.0
- def agg[U1, U2](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2]): Dataset[(K, U1, U2)]
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
  - Since: 1.6.0
- def agg[U1](col1: TypedColumn[V, U1]): Dataset[(K, U1)]
  Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
  - Since: 1.6.0
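  For example, a minimal sketch of a two-column typed aggregation. The SparkSession setup, the Sale class, and the sales data are illustrative assumptions, not part of this API; untyped Columns become the required TypedColumns via .as[T]:

    import org.apache.spark.sql.{Dataset, SparkSession}
    import org.apache.spark.sql.functions.{count, sum}

    val spark = SparkSession.builder().master("local[*]").appName("agg-sketch").getOrCreate()
    import spark.implicits._

    case class Sale(store: String, amount: Long)  // hypothetical record type
    val sales: Dataset[Sale] = Seq(Sale("a", 10L), Sale("a", 5L), Sale("b", 7L)).toDS()

    // sum/count produce untyped Columns; .as[T] converts them to TypedColumn[_, T].
    val perStore: Dataset[(String, Long, Long)] =
      sales.groupByKey(_.store)
        .agg(sum($"amount").as[Long], count($"amount").as[Long])
    // perStore contains ("a", 15, 2) and ("b", 7, 1)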
- def aggUntyped(columns: TypedColumn[_, _]*): Dataset[_]
  Internal helper function for building typed aggregations that return tuples. For simplicity and code reuse, we do this without the help of the type system and then use helper functions that cast appropriately for the user-facing interface.
  - Attributes: protected
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- def cogroup[U, R](other: KeyValueGroupedDataset[K, U], f: CoGroupFunction[K, V, U, R], encoder: Encoder[R]): Dataset[R]
  (Java-specific) Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and 2 iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
  - Since: 1.6.0
- def cogroup[U, R](other: KeyValueGroupedDataset[K, U])(f: (K, Iterator[V], Iterator[U]) ⇒ TraversableOnce[R])(implicit arg0: Encoder[R]): Dataset[R]
  (Scala-specific) Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and 2 iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
  - Since: 1.6.0
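  As a sketch, cogroup can merge two differently-typed Datasets grouped on the same key. The Click and Purchase types and their data are hypothetical:

    case class Click(user: String, page: String)      // hypothetical
    case class Purchase(user: String, amount: Long)   // hypothetical

    val clicks    = Seq(Click("u1", "home"), Click("u1", "cart")).toDS().groupByKey(_.user)
    val purchases = Seq(Purchase("u1", 42L)).toDS().groupByKey(_.user)

    // Either iterator may be empty when a key appears on only one side.
    val summary: Dataset[String] =
      clicks.cogroup(purchases) { (user, cs, ps) =>
        Iterator(s"$user: ${cs.size} clicks, ${ps.size} purchases")
      }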
- def cogroupSorted[U, R](other: KeyValueGroupedDataset[K, U], thisSortExprs: Array[Column], otherSortExprs: Array[Column], f: CoGroupFunction[K, V, U, R], encoder: Encoder[R]): Dataset[R]
  (Java-specific) Applies the given function to each sorted cogrouped data. For each unique group, the function will be passed the grouping key and 2 sorted iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
  This is equivalent to KeyValueGroupedDataset#cogroup, except that the iterators are sorted according to the given sort expressions. That sorting does not add computational complexity.
  - Since: 3.4.0
  - See also: KeyValueGroupedDataset#cogroup
- def cogroupSorted[U, R](other: KeyValueGroupedDataset[K, U])(thisSortExprs: Column*)(otherSortExprs: Column*)(f: (K, Iterator[V], Iterator[U]) ⇒ TraversableOnce[R])(implicit arg0: Encoder[R]): Dataset[R]
  (Scala-specific) Applies the given function to each sorted cogrouped data. For each unique group, the function will be passed the grouping key and 2 sorted iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
  This is equivalent to KeyValueGroupedDataset#cogroup, except that the iterators are sorted according to the given sort expressions. That sorting does not add computational complexity.
  - Since: 3.4.0
  - See also: KeyValueGroupedDataset#cogroup
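  A sketch of the Scala variant, merging two per-user event streams whose group iterators arrive sorted by a timestamp column (the Event type and data are hypothetical):

    case class Event(user: String, ts: Long, kind: String)  // hypothetical

    val a = Seq(Event("u1", 2L, "a2"), Event("u1", 1L, "a1")).toDS().groupByKey(_.user)
    val b = Seq(Event("u1", 3L, "b1")).toDS().groupByKey(_.user)

    // Within each group, both iterators arrive sorted by ts.
    val merged: Dataset[String] =
      a.cogroupSorted(b)($"ts")($"ts") { (user, as, bs) =>
        (as ++ bs).map(e => s"$user:${e.kind}")
      }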
- def count(): Dataset[(K, Long)]
  Returns a Dataset that contains a tuple with each key and the number of items present for that key.
  - Since: 1.6.0
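  Continuing the hypothetical sales Dataset from the agg sketch above:

    val counts: Dataset[(String, Long)] = sales.groupByKey(_.store).count()
    // ("a", 2) and ("b", 1)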
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- def flatMapGroups[U](f: FlatMapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]
  (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
  This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
  Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
  - Since: 1.6.0
- def flatMapGroups[U](f: (K, Iterator[V]) ⇒ TraversableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]
  (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
  This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
  Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
  - Since: 1.6.0
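  A sketch reusing the hypothetical Sale type and sales Dataset from the agg example:

    // Top-2 sales per store; materializing the group is safe only when groups are small.
    val top2: Dataset[Sale] =
      sales.groupByKey(_.store).flatMapGroups { (_, it) =>
        it.toSeq.sortBy(-_.amount).take(2)
      }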
- def flatMapGroupsWithState[S, U](func: FlatMapGroupsWithStateFunction[K, V, S, U], outputMode: OutputMode, stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout, initialState: KeyValueGroupedDataset[K, S]): Dataset[U]
  (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - func: Function to be called on every group.
  - outputMode: The output mode of the function.
  - stateEncoder: Encoder for the state type.
  - outputEncoder: Encoder for the output type.
  - timeoutConf: Timeout configuration for groups that do not receive data for a while.
  - initialState: The user-provided state that will be initialized when the first batch of data is processed in the streaming query. The user-defined function will be called on the state data even if there are no other values in the group. To convert a Dataset ds of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S], use ds.groupByKey(x => x._1).mapValues(_._2).
  - Since: 3.2.0
- def flatMapGroupsWithState[S, U](func: FlatMapGroupsWithStateFunction[K, V, S, U], outputMode: OutputMode, stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout): Dataset[U]
  (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - func: Function to be called on every group.
  - outputMode: The output mode of the function.
  - stateEncoder: Encoder for the state type.
  - outputEncoder: Encoder for the output type.
  - timeoutConf: Timeout configuration for groups that do not receive data for a while.
  - Since: 2.2.0
- def flatMapGroupsWithState[S, U](outputMode: OutputMode, timeoutConf: GroupStateTimeout, initialState: KeyValueGroupedDataset[K, S])(func: (K, Iterator[V], GroupState[S]) ⇒ Iterator[U])(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]
  (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - outputMode: The output mode of the function.
  - timeoutConf: Timeout configuration for groups that do not receive data for a while.
  - initialState: The user-provided state that will be initialized when the first batch of data is processed in the streaming query. The user-defined function will be called on the state data even if there are no other values in the group. To convert a Dataset ds of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S], use ds.groupByKey(x => x._1).mapValues(_._2).
  - func: Function to be called on every group.
  - Since: 3.2.0
- def flatMapGroupsWithState[S, U](outputMode: OutputMode, timeoutConf: GroupStateTimeout)(func: (K, Iterator[V], GroupState[S]) ⇒ Iterator[U])(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]
  (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - outputMode: The output mode of the function.
  - timeoutConf: Timeout configuration for groups that do not receive data for a while.
  - func: Function to be called on every group.
  - Since: 2.2.0
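  A sketch of a running count per key. The Visit type and visits data are assumptions; in a streaming query, visits would come from readStream instead of a local Seq:

    import org.apache.spark.sql.streaming.{GroupStateTimeout, OutputMode}

    case class Visit(user: String)  // hypothetical
    val visits: Dataset[Visit] = Seq(Visit("u1"), Visit("u1"), Visit("u2")).toDS()

    val runningCounts =
      visits.groupByKey(_.user)
        .flatMapGroupsWithState[Long, (String, Long)](
          OutputMode.Update, GroupStateTimeout.NoTimeout) { (user, rows, state) =>
            // Add this trigger's rows to the count carried in state.
            val total = state.getOption.getOrElse(0L) + rows.size
            state.update(total)
            Iterator((user, total))
        }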
- def flatMapSortedGroups[U](SortExprs: Array[Column], f: FlatMapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]
  (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
  This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
  Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
  This is equivalent to KeyValueGroupedDataset#flatMapGroups, except that the iterator is sorted according to the given sort expressions. That sorting does not add computational complexity.
  - Since: 3.4.0
  - See also: KeyValueGroupedDataset#flatMapGroups
- def flatMapSortedGroups[U](sortExprs: Column*)(f: (K, Iterator[V]) ⇒ TraversableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]
  (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
  This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
  Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
  This is equivalent to KeyValueGroupedDataset#flatMapGroups, except that the iterator is sorted according to the given sort expressions. That sorting does not add computational complexity.
  - Since: 3.4.0
  - See also: KeyValueGroupedDataset#flatMapGroups
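  A sketch reusing the hypothetical Event type from the cogroupSorted example:

    // First event per user in timestamp order, without buffering the whole group.
    val firstEvents: Dataset[Event] =
      Seq(Event("u1", 2L, "b"), Event("u1", 1L, "a")).toDS()
        .groupByKey(_.user)
        .flatMapSortedGroups($"ts") { (_, it) => it.take(1) }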
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- def keyAs[L](implicit arg0: Encoder[L]): KeyValueGroupedDataset[L, V]
  Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules as as on Dataset.
  - Since: 1.6.0
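  A sketch re-keying the hypothetical sales grouping from the agg example. It assumes the generated key column is named value, as it is for primitive-typed keys, so the case class field must match:

    case class StoreKey(value: String)  // hypothetical; field name matches the key column
    val rekeyed: KeyValueGroupedDataset[StoreKey, Sale] =
      sales.groupByKey(_.store).keyAs[StoreKey]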
- def keys: Dataset[K]
  Returns a Dataset that contains each unique key. This is equivalent to doing mapping over the Dataset to extract the keys and then running a distinct operation on those.
  - Since: 1.6.0
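  Continuing the hypothetical sales Dataset:

    val stores: Dataset[String] = sales.groupByKey(_.store).keys  // "a", "b"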
- def mapGroups[U](f: MapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]
  (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
  This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
  Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
  - Since: 1.6.0
- def mapGroups[U](f: (K, Iterator[V]) ⇒ U)(implicit arg0: Encoder[U]): Dataset[U]
  (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
  This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
  Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
  - Since: 1.6.0
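  A sketch with the hypothetical sales Dataset, producing exactly one output row per key:

    // Average sale amount per store.
    val avgAmount: Dataset[(String, Double)] =
      sales.groupByKey(_.store).mapGroups { (store, it) =>
        val amounts = it.map(_.amount).toSeq
        (store, amounts.sum.toDouble / amounts.size)
      }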
- def mapGroupsWithState[S, U](func: MapGroupsWithStateFunction[K, V, S, U], stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout, initialState: KeyValueGroupedDataset[K, S]): Dataset[U]
  (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - func: Function to be called on every group.
  - stateEncoder: Encoder for the state type.
  - outputEncoder: Encoder for the output type.
  - timeoutConf: Timeout configuration for groups that do not receive data for a while.
  - initialState: The user-provided state that will be initialized when the first batch of data is processed in the streaming query. The user-defined function will be called on the state data even if there are no other values in the group.
  - Since: 3.2.0
- def mapGroupsWithState[S, U](func: MapGroupsWithStateFunction[K, V, S, U], stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout): Dataset[U]
  (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - func: Function to be called on every group.
  - stateEncoder: Encoder for the state type.
  - outputEncoder: Encoder for the output type.
  - timeoutConf: Timeout configuration for groups that do not receive data for a while.
  - Since: 2.2.0
- def mapGroupsWithState[S, U](func: MapGroupsWithStateFunction[K, V, S, U], stateEncoder: Encoder[S], outputEncoder: Encoder[U]): Dataset[U]
  (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - func: Function to be called on every group.
  - stateEncoder: Encoder for the state type.
  - outputEncoder: Encoder for the output type.
  - Since: 2.2.0
- def mapGroupsWithState[S, U](timeoutConf: GroupStateTimeout, initialState: KeyValueGroupedDataset[K, S])(func: (K, Iterator[V], GroupState[S]) ⇒ U)(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]
  (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See org.apache.spark.sql.streaming.GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - timeoutConf: Timeout configuration for groups that do not receive data for a while; see GroupStateTimeout for more details.
  - initialState: The user-provided state that will be initialized when the first batch of data is processed in the streaming query. The user-defined function will be called on the state data even if there are no other values in the group. To convert a Dataset ds of type Dataset[(K, S)] to a KeyValueGroupedDataset[K, S], use ds.groupByKey(x => x._1).mapValues(_._2).
  - func: Function to be called on every group.
  - Since: 3.2.0
- def mapGroupsWithState[S, U](timeoutConf: GroupStateTimeout)(func: (K, Iterator[V], GroupState[S]) ⇒ U)(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]
  (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See org.apache.spark.sql.streaming.GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - timeoutConf: Timeout configuration for groups that do not receive data for a while.
  - func: Function to be called on every group.
  - Since: 2.2.0
- def mapGroupsWithState[S, U](func: (K, Iterator[V], GroupState[S]) ⇒ U)(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]
  (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See org.apache.spark.sql.streaming.GroupState for more details. See Encoder for more details on what types are encodable to Spark SQL.
  - S: The type of the user-defined state. Must be encodable to Spark SQL types.
  - U: The type of the output objects. Must be encodable to Spark SQL types.
  - func: Function to be called on every group.
  - Since: 2.2.0
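  A sketch with the hypothetical sales Dataset; on a static Dataset the function runs once per group, in a streaming query the state would carry across triggers:

    // Max amount observed per store so far.
    val maxSoFar: Dataset[(String, Long)] =
      sales.groupByKey(_.store).mapGroupsWithState[Long, (String, Long)] {
        (store, rows, state) =>
          val m = (rows.map(_.amount) ++ state.getOption.iterator).max
          state.update(m)
          (store, m)
      }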
- def mapValues[W](func: MapFunction[V, W], encoder: Encoder[W]): KeyValueGroupedDataset[K, W]
  Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.

    // Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>>
    Dataset<Tuple2<String, Integer>> ds = ...;
    KeyValueGroupedDataset<String, Integer> grouped =
      ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT());

  - Since: 2.1.0
- def mapValues[W](func: (V) ⇒ W)(implicit arg0: Encoder[W]): KeyValueGroupedDataset[K, W]
  Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.

    // Create values grouped by key from a Dataset[(K, V)]
    ds.groupByKey(_._1).mapValues(_._2) // Scala

  - Since: 2.1.0
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- val queryExecution: QueryExecution
- def reduceGroups(f: ReduceFunction[V]): Dataset[(K, V)]
  (Java-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
  - Since: 1.6.0
- def reduceGroups(f: (V, V) ⇒ V): Dataset[(K, V)]
  (Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
  - Since: 1.6.0
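  A sketch with the hypothetical sales Dataset; taking the element with the larger amount is associative and commutative, as required:

    // Largest sale per store.
    val maxSale: Dataset[(String, Sale)] =
      sales.groupByKey(_.store).reduceGroups((a, b) => if (a.amount >= b.amount) a else b)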
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: KeyValueGroupedDataset → AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()