DStream.reduceByWindow(reduceFunc: Callable[[T, T], T], invReduceFunc: Optional[Callable[[T, T], T]], windowDuration: int, slideDuration: int) → pyspark.streaming.dstream.DStream[T]

Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream.

If invReduceFunc is not None, the reduction is done incrementally using the old window’s reduced value:

1. reduce the new values that entered the window (e.g., adding new counts)

2. “inverse reduce” the old values that left the window (e.g., subtracting old counts)

This is more efficient than recomputing the whole window from scratch, which is what happens when invReduceFunc is None.
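The incremental strategy can be illustrated without Spark. The sketch below is hypothetical (plain Python, not the DStream API) and assumes one reduced integer value per batch, with addition as the reduce function and subtraction as its inverse:

```python
# Plain-Python sketch of incremental windowed reduction (no Spark needed).
# Assumption: each batch has already been reduced to a single integer.

def reduce_func(a, b):      # associative and commutative
    return a + b

def inv_reduce_func(a, b):  # inverse of reduce_func
    return a - b

batches = [1, 2, 3, 4, 5]   # one reduced value per batch interval
window = 3                  # window spans 3 batches, sliding by 1

# Naive approach: recompute the whole window on every slide.
naive = [sum(batches[i:i + window]) for i in range(len(batches) - window + 1)]

# Incremental approach: start from the previous window's value,
# reduce in the batch that entered, inverse-reduce out the batch that left.
incremental = [sum(batches[:window])]
for i in range(1, len(batches) - window + 1):
    prev = incremental[-1]
    entered = batches[i + window - 1]   # newest batch now inside the window
    left = batches[i - 1]               # oldest batch that just fell out
    incremental.append(inv_reduce_func(reduce_func(prev, entered), left))

print(naive)        # [6, 9, 12]
print(incremental)  # [6, 9, 12]
```

The incremental path touches only two batches per slide instead of the whole window, which is the efficiency win of supplying invReduceFunc.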


Parameters

reduceFunc : function
    associative and commutative reduce function

invReduceFunc : function
    inverse reduce function of reduceFunc; such that for all y, and invertible x: invReduceFunc(reduceFunc(x, y), x) = y

windowDuration : int
    width of the window; must be a multiple of this DStream’s batching interval

slideDuration : int
    sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream’s batching interval
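The contract on the inverse function can be sanity-checked with ordinary arithmetic. A minimal sketch, assuming the common count-style pairing of addition as reduceFunc and subtraction as invReduceFunc:

```python
# Check the documented contract for a count-style function pair.
# Assumption: reduceFunc = addition, invReduceFunc = subtraction.

def reduce_func(a, b):
    return a + b

def inv_reduce_func(a, b):
    return a - b

# For all y, and invertible x: invReduceFunc(reduceFunc(x, y), x) == y
for x in range(-3, 4):
    for y in range(-3, 4):
        assert inv_reduce_func(reduce_func(x, y), x) == y
```

In a hypothetical streaming job, such a pair would be passed as, e.g., stream.reduceByWindow(reduce_func, inv_reduce_func, 30, 10) against a StreamingContext whose batch interval evenly divides both durations, as required above.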