pyspark.sql.streaming.DataStreamReader.parquet

DataStreamReader.parquet(path: str, mergeSchema: Optional[bool] = None, pathGlobFilter: Union[bool, str, None] = None, recursiveFileLookup: Union[bool, str, None] = None, datetimeRebaseMode: Union[bool, str, None] = None, int96RebaseMode: Union[bool, str, None] = None) → DataFrame

Loads a Parquet file stream, returning the result as a DataFrame.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Parameters
path : str

the path in any Hadoop-supported file system

Other Parameters
Extra options

For the extra options, refer to Data Source Option in the version you use.
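The keyword arguments in the signature (mergeSchema, pathGlobFilter, recursiveFileLookup, datetimeRebaseMode, int96RebaseMode) map to the same-named Parquet data source options; any other option can be set through option() before calling parquet(). A minimal sketch, assuming a hypothetical directory /data/events that already contains Parquet files:

>>> # Keyword form: the documented arguments are passed directly.
>>> df = spark.readStream.schema("id LONG").parquet(
...     "/data/events", pathGlobFilter="*.parquet", recursiveFileLookup=True)

>>> # Equivalent option() form.
>>> df = (spark.readStream.schema("id LONG")
...     .option("pathGlobFilter", "*.parquet")
...     .option("recursiveFileLookup", "true")
...     .parquet("/data/events"))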

Examples

Load a data stream from a temporary Parquet file.

>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory() as d:
...     # Write a temporary Parquet file for the streaming query to read.
...     spark.range(10).write.mode("overwrite").format("parquet").save(d)
...
...     # Start a streaming query to read the Parquet file.
...     q = spark.readStream.schema(
...         "id LONG").parquet(d).writeStream.format("console").start()
...     time.sleep(3)
...     q.stop()
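
Note that file sources in Structured Streaming need the schema up front, which is why the example calls schema() before parquet(); schema inference is only attempted when spark.sql.streaming.schemaInference is enabled. As a minimal follow-up sketch, the standard maxFilesPerTrigger file-source option throttles the stream to one new file per micro-batch (the path here is hypothetical):

>>> q = (spark.readStream.schema("id LONG")
...     .option("maxFilesPerTrigger", 1)  # at most one new file per micro-batch
...     .parquet("/data/events")
...     .writeStream.format("console").start())
>>> q.stop()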