public class GroupedData
extends Object

A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.
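As a quick orientation, here is a minimal sketch of how a GroupedData is obtained and consumed; the DataFrame df and its columns ("department", "age", "expense") are assumptions for illustration, not part of this API:

```scala
import org.apache.spark.sql.functions._

val grouped = df.groupBy("department")  // returns a GroupedData, not a DataFrame
val counts  = grouped.count()           // one row per department, plus a "count" column
val stats   = grouped.agg(max($"age"), sum($"expense"))
```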
Modifier and Type | Method and Description
---|---
DataFrame | agg(Column expr, Column... exprs) Compute aggregates by specifying a series of aggregate columns.
DataFrame | agg(Column expr, scala.collection.Seq&lt;Column&gt; exprs) Compute aggregates by specifying a series of aggregate columns.
DataFrame | agg(scala.collection.immutable.Map&lt;String,String&gt; exprs) (Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods.
DataFrame | agg(java.util.Map&lt;String,String&gt; exprs) (Java-specific) Compute aggregates by specifying a map from column name to aggregate methods.
DataFrame | agg(scala.Tuple2&lt;String,String&gt; aggExpr, scala.collection.Seq&lt;scala.Tuple2&lt;String,String&gt;&gt; aggExprs) (Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods.
DataFrame | avg(scala.collection.Seq&lt;String&gt; colNames) Compute the mean value for each numeric column for each group.
DataFrame | avg(String... colNames) Compute the mean value for each numeric column for each group.
DataFrame | count() Count the number of rows for each group.
DataFrame | max(scala.collection.Seq&lt;String&gt; colNames) Compute the max value for each numeric column for each group.
DataFrame | max(String... colNames) Compute the max value for each numeric column for each group.
DataFrame | mean(scala.collection.Seq&lt;String&gt; colNames) Compute the average value for each numeric column for each group.
DataFrame | mean(String... colNames) Compute the average value for each numeric column for each group.
DataFrame | min(scala.collection.Seq&lt;String&gt; colNames) Compute the min value for each numeric column for each group.
DataFrame | min(String... colNames) Compute the min value for each numeric column for each group.
DataFrame | sum(scala.collection.Seq&lt;String&gt; colNames) Compute the sum for each numeric column for each group.
DataFrame | sum(String... colNames) Compute the sum for each numeric column for each group.
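A hedged sketch of the convenience aggregations listed above; the column names ("department", "age", "expense") are assumptions for illustration:

```scala
// Assuming df has a string column "department" and numeric columns "age" and "expense":
df.groupBy("department").avg()                  // mean of every numeric column
df.groupBy("department").max("age")             // max of "age" only
df.groupBy("department").sum("age", "expense")  // sums of the named columns
```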
public DataFrame agg(Column expr, Column... exprs)

Compute aggregates by specifying a series of aggregate columns. Unlike the other methods in this class, the resulting DataFrame won't automatically include the grouping columns. The available aggregate methods are defined in functions.
// Selects the age of the oldest employee and the aggregate expense for each department
// Scala:
import org.apache.spark.sql.functions._
df.groupBy("department").agg($"department", max($"age"), sum($"expense"))
// Java:
import static org.apache.spark.sql.functions.*;
df.groupBy("department").agg(col("department"), max(col("age")), sum(col("expense")));
public DataFrame mean(String... colNames)

Compute the average value for each numeric column for each group. This is an alias for avg. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the average values for them.

public DataFrame max(String... colNames)

Compute the max value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the max values for them.

public DataFrame avg(String... colNames)

Compute the mean value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the mean values for them.

public DataFrame min(String... colNames)

Compute the min value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the min values for them.

public DataFrame sum(String... colNames)

Compute the sum for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the sum for them.

public DataFrame agg(scala.Tuple2&lt;String,String&gt; aggExpr, scala.collection.Seq&lt;scala.Tuple2&lt;String,String&gt;&gt; aggExprs)
(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns. The available aggregate methods are avg, max, min, sum, and count.
// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(
"age" -> "max",
"expense" -> "sum"
)
public DataFrame agg(scala.collection.immutable.Map&lt;String,String&gt; exprs)

(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns. The available aggregate methods are avg, max, min, sum, and count.
// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(Map(
"age" -> "max",
"expense" -> "sum"
))
public DataFrame agg(java.util.Map&lt;String,String&gt; exprs)

(Java-specific) Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns. The available aggregate methods are avg, max, min, sum, and count.
// Selects the age of the oldest employee and the aggregate expense for each department
import com.google.common.collect.ImmutableMap;
df.groupBy("department").agg(ImmutableMap.of("age", "max", "expense", "sum"));
public DataFrame agg(Column expr, scala.collection.Seq&lt;Column&gt; exprs)

(Scala-specific) Compute aggregates by specifying a series of aggregate columns. Unlike the other methods in this class, the resulting DataFrame won't automatically include the grouping columns. The available aggregate methods are defined in functions.
// Selects the age of the oldest employee and the aggregate expense for each department
// Scala:
import org.apache.spark.sql.functions._
df.groupBy("department").agg($"department", max($"age"), sum($"expense"))
// Java:
import static org.apache.spark.sql.functions.*;
df.groupBy("department").agg(col("department"), max(col("age")), sum(col("expense")));
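The (expr, Seq) overload is convenient when the aggregate columns are assembled at runtime; a sketch, where the column list is an assumption for illustration:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

// Build the aggregates dynamically, then split into the (first, rest) shape
// that this overload expects.
val aggExprs: Seq[Column] = Seq(max($"age"), sum($"expense"))
df.groupBy("department").agg(aggExprs.head, aggExprs.tail: _*)
```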
public DataFrame count()

Count the number of rows for each group. The resulting DataFrame will also contain the grouping columns.

public DataFrame mean(scala.collection.Seq&lt;String&gt; colNames)

Compute the average value for each numeric column for each group. This is an alias for avg. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the average values for them.

public DataFrame max(scala.collection.Seq&lt;String&gt; colNames)

Compute the max value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the max values for them.

public DataFrame avg(scala.collection.Seq&lt;String&gt; colNames)

Compute the mean value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the mean values for them.

public DataFrame min(scala.collection.Seq&lt;String&gt; colNames)

Compute the min value for each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When specific columns are given, only compute the min values for them.