pyspark.sql.SparkSession.readStream

property SparkSession.readStream

Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Returns
DataStreamReader

Notes

This API is evolving.

Examples

>>> spark.readStream
<pyspark...DataStreamReader object ...>
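As a minimal sketch of how the returned reader leads to a streaming DataFrame, the helper below configures the built-in rate source and checks the result's isStreaming flag. The name load_rate_stream is hypothetical, and spark is assumed to be an active SparkSession passed in by the caller.

```python
def load_rate_stream(spark):
    """Return a streaming DataFrame backed by the built-in rate source.

    Assumption: `spark` is an active SparkSession. The helper name is
    hypothetical and not part of the PySpark API.
    """
    # readStream is a property, so it is accessed without parentheses.
    reader = spark.readStream.format("rate")
    # load() only defines the stream; no rows flow until a writeStream query starts.
    df = reader.load()
    # isStreaming distinguishes a streaming DataFrame from a batch one.
    if not df.isStreaming:
        raise ValueError("expected a streaming DataFrame")
    return df
```

With a live session, the DataFrame returned here should expose the rate source's timestamp and value columns.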

The example below uses the rate source, which generates rows continuously. It then computes each value modulo 3 and writes the stream to the console. The streaming query is stopped after 3 seconds.

>>> import time
>>> df = spark.readStream.format("rate").load()
>>> df = df.selectExpr("value % 3 as v")
>>> q = df.writeStream.format("console").start()
>>> time.sleep(3)
>>> q.stop()
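One possible variant of the example above wraps the same steps in a function and replaces time.sleep with StreamingQuery.awaitTermination and a timeout, stopping the query in a finally block so it is cleaned up even if an error occurs. This is only a sketch: run_rate_modulo and its parameters are hypothetical names, and spark is assumed to be an active SparkSession.

```python
def run_rate_modulo(spark, seconds=3, rows_per_second=5):
    """Stream from the rate source, reduce `value` modulo 3, print to the console.

    Assumption: `spark` is an active SparkSession. The function name and its
    parameters are hypothetical and not part of the PySpark API.
    """
    # rowsPerSecond is an option of the built-in rate source.
    reader = spark.readStream.format("rate").option("rowsPerSecond", rows_per_second)
    df = reader.load().selectExpr("value % 3 as v")
    query = df.writeStream.format("console").start()
    try:
        # Block for up to `seconds` seconds while the query runs.
        query.awaitTermination(seconds)
    finally:
        # Always stop the query so it does not keep running in the background.
        query.stop()
    return query
```

Called as run_rate_modulo(spark), this should print micro-batches to the console for roughly three seconds before the query is stopped.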