pyspark.sql.streaming.DataStreamWriter.outputMode

DataStreamWriter.outputMode(outputMode: str) → pyspark.sql.streaming.readwriter.DataStreamWriter[source]

Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Options include:

  • append: Only the new rows in the streaming DataFrame/Dataset will be written to
    the sink.

  • complete: All the rows in the streaming DataFrame/Dataset will be written to the sink
    every time there are some updates.

  • update: Only the rows that were updated in the streaming DataFrame/Dataset will be
    written to the sink every time there are some updates. If the query doesn't contain aggregations, it will be equivalent to append mode.

Notes

This API is evolving.

Examples

>>> df = spark.readStream.format("rate").load()
>>> df.writeStream.outputMode('append')
<...streaming.readwriter.DataStreamWriter object ...>

The example below uses complete mode, so the entire aggregated counts are printed out on every trigger.

>>> import time
>>> df = spark.readStream.format("rate").option("rowsPerSecond", 10).load()
>>> df = df.groupby().count()
>>> q = df.writeStream.outputMode("complete").format("console").start()
>>> time.sleep(3)
>>> q.stop()