pyspark.sql.DataFrameReader.text

DataFrameReader.text(paths: Union[str, List[str]], wholetext: bool = False, lineSep: Optional[str] = None, pathGlobFilter: Union[bool, str, None] = None, recursiveFileLookup: Union[bool, str, None] = None, modifiedBefore: Union[bool, str, None] = None, modifiedAfter: Union[bool, str, None] = None) → DataFrame

Loads text files and returns a DataFrame whose schema starts with a string column named “value”, followed by partition columns, if any. The text files must be encoded as UTF-8.

By default, each line in the text file is a new row in the resulting DataFrame.
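If wholetext is set to True, each input file is instead read as a single row. A minimal sketch of the difference (the file name sample.txt is illustrative):

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     path = os.path.join(d, "sample.txt")  # illustrative file name
...     with open(path, "w") as f:
...         _ = f.write("first line\nsecond line")
...     # Default: one row per line.
...     spark.read.text(path).count()
...     # wholetext=True: the entire file becomes one row.
...     spark.read.text(path, wholetext=True).count()
2
1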

New in version 1.6.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
paths : str or list

string, or list of strings, for input path(s).

Other Parameters
Extra options

For the extra options, refer to the text Data Source Option documentation for the Spark version you use.
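For example, lineSep is one such option; it can be passed through option() as well as the keyword argument. A minimal sketch (the file name pipes.txt is illustrative):

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     path = os.path.join(d, "pipes.txt")  # illustrative file name
...     with open(path, "w") as f:
...         _ = f.write("a|b|c")
...     # Split records on "|" instead of the default newline separators.
...     spark.read.option("lineSep", "|").text(path).show()
+-----+
|value|
+-----+
|    a|
|    b|
|    c|
+-----+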

Examples

Write a DataFrame into a text file and read it back.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     # Write a DataFrame into a text file
...     df = spark.createDataFrame([("a",), ("b",), ("c",)], schema=["alphabets"])
...     df.write.mode("overwrite").format("text").save(d)
...
...     # Read the text file as a DataFrame.
...     spark.read.schema(df.schema).text(d).sort("alphabets").show()
+---------+
|alphabets|
+---------+
|        a|
|        b|
|        c|
+---------+
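
Since paths accepts a list, several files or directories can be read in one call. A minimal sketch (the file names one.txt and two.txt are illustrative):

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     # Write two small text files.
...     for name, text in [("one.txt", "a"), ("two.txt", "b")]:
...         with open(os.path.join(d, name), "w") as f:
...             _ = f.write(text)
...     paths = [os.path.join(d, "one.txt"), os.path.join(d, "two.txt")]
...     # Read both files with a single call.
...     spark.read.text(paths).sort("value").show()
+-----+
|value|
+-----+
|    a|
|    b|
+-----+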