pyspark.sql.DataFrameReader.load
- DataFrameReader.load(path=None, format=None, schema=None, **options)[source]
Loads data from a data source and returns it as a DataFrame.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- path : str or list, optional
  optional string or a list of strings for file-system backed data sources.
- format : str, optional
  optional string for the format of the data source. Defaults to 'parquet'.
- schema : pyspark.sql.types.StructType or str, optional
  optional pyspark.sql.types.StructType for the input schema, or a DDL-formatted string (for example, "col0 INT, col1 DOUBLE").
- **options : dict
  all other string options
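Because data source options are ultimately string-valued, Python values passed through **options get normalized to strings (for example, the header=True keyword in the example below reaches the CSV source as "true"). The helper below is an illustrative sketch of that normalization, not Spark's actual implementation; the name to_option_str is hypothetical.

```python
def to_option_str(value):
    # Hypothetical helper sketching how Python option values are
    # normalized into the string options a data source receives.
    # Booleans become lowercase "true"/"false"; None is passed through;
    # everything else is stringified.
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return None
    return str(value)


# Keyword arguments as you would pass them to load(...)
options = {"header": True, "nullValue": "Hyukjin Kwon", "samplingRatio": 0.5}
normalized = {k: to_option_str(v) for k, v in options.items()}
print(normalized)
# {'header': 'true', 'nullValue': 'Hyukjin Kwon', 'samplingRatio': '0.5'}
```

This is why header=True and header="true" behave identically when reading CSV files.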
Examples
Load a CSV file with format, schema and options specified.
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="load") as d:
...     # Write a DataFrame into a CSV file with a header
...     df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
...     df.write.option("header", True).mode("overwrite").format("csv").save(d)
...
...     # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon',
...     # and 'header' option set to `True`.
...     df = spark.read.load(
...         d, schema=df.schema, format="csv", nullValue="Hyukjin Kwon", header=True)
...     df.printSchema()
...     df.show()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
+---+----+
|age|name|
+---+----+
|100|NULL|
+---+----+