Package org.apache.spark.sql.expressions
Class Aggregator<IN,BUF,OUT>
Object
org.apache.spark.sql.expressions.Aggregator<IN,BUF,OUT>
- Type Parameters:
IN
- The input type for the aggregation.BUF
- The type of the intermediate value of the reduction.OUT
- The type of the final output result.
- All Implemented Interfaces:
Serializable
,scala.Serializable
- Direct Known Subclasses:
StringIndexerAggregator
A base class for user-defined aggregations, which can be used in
Dataset
operations to take
all of the elements of a group and reduce them to a single value.
For example, the following aggregator extracts an int
from a specific class and adds them up:
case class Data(i: Int)
val customSummer = new Aggregator[Data, Int, Int] {
def zero: Int = 0
def reduce(b: Int, a: Data): Int = b + a.i
def merge(b1: Int, b2: Int): Int = b1 + b2
def finish(r: Int): Int = r
def bufferEncoder: Encoder[Int] = Encoders.scalaInt
def outputEncoder: Encoder[Int] = Encoders.scalaInt
}.toColumn()
val ds: Dataset[Data] = ...
val aggregated = ds.select(customSummer)
Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
- Since:
- 1.6.0
- See Also:
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionSpecifies theEncoder
for the intermediate value type.abstract OUT
Transform the output of the reduction.abstract BUF
Merge two intermediate values.Specifies theEncoder
for the final output value type.abstract BUF
Combine two values to produce a new value.toColumn()
Returns thisAggregator
as aTypedColumn
that can be used inDataset
.abstract BUF
zero()
A zero value for this aggregation.
-
Constructor Details
-
Aggregator
public Aggregator()
-
-
Method Details
-
bufferEncoder
Specifies theEncoder
for the intermediate value type.- Returns:
- (undocumented)
- Since:
- 2.0.0
-
finish
Transform the output of the reduction.- Parameters:
reduction
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
merge
Merge two intermediate values.- Parameters:
b1
- (undocumented)b2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
outputEncoder
Specifies theEncoder
for the final output value type.- Returns:
- (undocumented)
- Since:
- 2.0.0
-
reduce
Combine two values to produce a new value. For performance, the function may modifyb
and return it instead of constructing new object for b.- Parameters:
b
- (undocumented)a
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
toColumn
Returns thisAggregator
as aTypedColumn
that can be used inDataset
. operations.- Returns:
- (undocumented)
- Since:
- 1.6.0
-
zero
A zero value for this aggregation. Should satisfy the property that any b + zero = b.- Returns:
- (undocumented)
- Since:
- 1.6.0
-