pyspark.pandas.DataFrame.cumprod

DataFrame.cumprod(skipna: bool = True) → FrameLike

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

Note

The current implementation of cumprod uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and could cause serious performance degradation. Avoid this method with very large datasets.

Note

Unlike pandas, pandas-on-Spark emulates the cumulative product with the exp(sum(log(...))) trick. Therefore, it only works for positive numbers.
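The identity behind this note can be illustrated outside Spark. The following is a minimal sketch in plain Python, not the pandas-on-Spark implementation; the helper name `cumprod_via_logs` and the `ValueError` it raises are assumptions made for illustration. It shows why non-positive inputs have to be rejected: the logarithm is undefined for them.

```python
import math
from itertools import accumulate


def cumprod_via_logs(values):
    """Cumulative product computed as exp(running sum of logs).

    Hypothetical helper for illustration only; it is not part of pyspark.
    """
    # log() is undefined for zero and negative numbers, so reject them early.
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    # Running sum of the logarithms, one entry per input value.
    log_sums = accumulate(math.log(v) for v in values)
    # Exponentiating each running sum recovers the running product.
    return [math.exp(s) for s in log_sums]


values = [2.0, 3.0, 4.0]
approx = cumprod_via_logs(values)
# Direct running product, for comparison.
exact = list(accumulate(values, lambda a, b: a * b))
```

Because the result passes through floating-point `log` and `exp`, `approx` should be compared with `exact` using a tolerance (for example `math.isclose`) rather than strict equality.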

Parameters
skipna: boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.
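As a rough model of what skipna controls, the sketch below reproduces pandas' cumulative-product semantics in plain Python, which pandas-on-Spark is assumed here to mirror. The function `cumprod_model` is hypothetical and exists only for illustration; it is not how the method is implemented.

```python
import math
from typing import List, Optional


def cumprod_model(values: List[Optional[float]], skipna: bool = True) -> List[float]:
    """Model of cumulative product with pandas-style skipna handling.

    Hypothetical helper for illustration only.
    """
    nan = float("nan")
    out = []
    running = 1.0
    seen_missing = False
    for v in values:
        missing = v is None or math.isnan(v)
        if missing:
            # A missing input always yields a missing output at that position.
            seen_missing = True
            out.append(nan)
        elif seen_missing and not skipna:
            # With skipna=False, a missing value poisons everything after it.
            out.append(nan)
        else:
            # With skipna=True, the running product simply continues.
            running *= v
            out.append(running)
    return out


col_b = [1.0, None, 10.0]
skipped = cumprod_model(col_b)                # skipna=True (default)
strict = cumprod_model(col_b, skipna=False)
```

Under these assumptions, `skipped` keeps the product going past the missing entry, while `strict` is missing from the first missing entry onward.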

Returns
DataFrame or Series
Raises
Exception: If any value is less than or equal to 0.

See also

DataFrame.cummax

Return cumulative maximum over DataFrame axis.

DataFrame.cummin

Return cumulative minimum over DataFrame axis.

DataFrame.cumsum

Return cumulative sum over DataFrame axis.

DataFrame.cumprod

Return cumulative product over DataFrame axis.

Series.cummax

Return cumulative maximum over Series axis.

Series.cummin

Return cumulative minimum over Series axis.

Series.cumsum

Return cumulative sum over Series axis.

Series.cumprod

Return cumulative product over Series axis.

Examples

>>> df = ps.DataFrame([[2.0, 1.0], [3.0, None], [4.0, 10.0]], columns=list('AB'))
>>> df
     A     B
0  2.0   1.0
1  3.0   NaN
2  4.0  10.0

By default, iterates over rows and finds the product in each column.

>>> df.cumprod()
      A     B
0   2.0   1.0
1   6.0   NaN
2  24.0  10.0

It works identically in Series.

>>> df.A.cumprod()
0     2.0
1     6.0
2    24.0
Name: A, dtype: float64