pyspark.pandas.groupby.GroupBy.prod

GroupBy.prod(numeric_only: Optional[bool] = True, min_count: int = 0) → FrameLike

Compute the product of the values in each group.

New in version 3.4.0.

Parameters

numeric_only: bool, default True. Include only float, int, and boolean columns. If None, will attempt to use everything, then use only numeric data.
min_count: int, default 0. The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.

Returns

Series or DataFrame: Computed product of values within each group.

See also

pyspark.pandas.Series.groupby
pyspark.pandas.DataFrame.groupby

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame(
...     {
...         "A": [1, 1, 2, 1, 2],
...         "B": [np.nan, 2, 3, 4, 5],
...         "C": [1, 2, 1, 1, 2],
...         "D": [True, False, True, False, True],
...     }
... )

Group by one column and return the product of the remaining columns in each group.

>>> df.groupby('A').prod().sort_index()
      B  C  D
A
1   8.0  2  0
2  15.0  2  1

>>> df.groupby('A').prod(min_count=3).sort_index()
     B    C    D
A
1  NaN  2.0  0.0
2  NaN  NaN  NaN
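
The effect of min_count can be sketched in plain Python, without Spark. The helper group_prod below is hypothetical and written only for illustration; it is not part of pyspark.pandas, and it assumes that NaN is the only missing-value marker. It multiplies the non-NaN values of each group and returns NaN for any group that has fewer than min_count such values.

```python
import math
from collections import defaultdict


def group_prod(keys, values, min_count=0):
    """Hypothetical helper: per-key product of non-NaN values.

    A group with fewer than `min_count` non-NaN values yields NaN.
    """
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key]  # register the key even if all of its values are NaN
        if not (isinstance(value, float) and math.isnan(value)):
            groups[key].append(value)
    return {
        key: math.prod(vals, start=1.0) if len(vals) >= min_count else math.nan
        for key, vals in sorted(groups.items())
    }


# Columns "A" and "B" from the example DataFrame above.
A = [1, 1, 2, 1, 2]
B = [math.nan, 2, 3, 4, 5]

result_default = group_prod(A, B)             # NaN is skipped in group 1
result_min3 = group_prod(A, B, min_count=3)   # no group has 3 valid values
```

Under these assumptions, group 1 of column B holds only two valid values (2 and 4), so it falls below min_count=3 and becomes NaN, which matches the B column in the output above.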
