pyspark.sql.streaming.DataStreamReader.text

DataStreamReader.text(path: str, wholetext: bool = False, lineSep: Optional[str] = None, pathGlobFilter: Union[bool, str, None] = None, recursiveFileLookup: Union[bool, str, None] = None) → DataFrame

Loads a text file stream and returns a DataFrame whose schema starts with a string column named “value”, followed by partitioned columns if there are any. The text files must be encoded as UTF-8.

By default, each line in the text file is a new row in the resulting DataFrame.
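The default row-splitting behavior can be illustrated with a minimal plain-Python sketch (this is not Spark internals, only an illustration of the semantics described above):

```python
# Sketch of the default semantics: each line of a text file becomes one row
# with a single string column named "value".
file_contents = "hello\nthis\n"

# One row per line, no line terminators included in the values.
rows = [{"value": line} for line in file_contents.splitlines()]
print(rows)  # [{'value': 'hello'}, {'value': 'this'}]
```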

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Parameters
path : str or list

string, or list of strings, for input path(s).

Other Parameters
Extra options

For the extra options, refer to Data Source Option in the documentation for the version you use.

Notes

This API is evolving.

Examples

Load a data stream from a temporary text file.

>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory() as d:
...     # Write a temporary text file to read it.
...     spark.createDataFrame(
...         [("hello",), ("this",)]).write.mode("overwrite").format("text").save(d)
...
...     # Start a streaming query to read the text file.
...     q = spark.readStream.text(d).writeStream.format("console").start()
...     time.sleep(3)
...     q.stop()
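
The `wholetext` and `lineSep` options from the signature change how rows are produced. The helper below is a hypothetical plain-Python sketch of those semantics (not Spark's implementation); the function name and edge-case handling are illustrative assumptions:

```python
# Hypothetical sketch of how wholetext and lineSep affect row splitting.
def split_rows(contents, wholetext=False, lineSep=None):
    if wholetext:
        # wholetext=True: the entire file becomes a single row.
        return [contents]
    if lineSep is not None:
        # An explicit lineSep splits on exactly that separator.
        return contents.split(lineSep)
    # Default: split on standard line terminators.
    return contents.splitlines()

print(split_rows("a\nb\n", wholetext=True))  # ['a\nb\n']
print(split_rows("a||b", lineSep="||"))      # ['a', 'b']
```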