Input/Output

DataFrameReader.csv(path[, schema, sep, …])

Loads a CSV file and returns the result as a DataFrame.

DataFrameReader.format(source)

Specifies the input data source format.

DataFrameReader.jdbc(url, table[, column, …])

Constructs a DataFrame representing the database table named table, accessible via the JDBC URL url and connection properties.

DataFrameReader.json(path[, schema, …])

Loads JSON files and returns the results as a DataFrame.

DataFrameReader.load([path, format, schema])

Loads data from a data source and returns it as a DataFrame.

DataFrameReader.option(key, value)

Adds an input option for the underlying data source.

DataFrameReader.options(**options)

Adds input options for the underlying data source.

DataFrameReader.orc(path[, mergeSchema, …])

Loads ORC files, returning the result as a DataFrame.

DataFrameReader.parquet(*paths, **options)

Loads Parquet files, returning the result as a DataFrame.

DataFrameReader.schema(schema)

Specifies the input schema.

DataFrameReader.table(tableName)

Returns the specified table as a DataFrame.

DataFrameReader.text(paths[, wholetext, …])

Loads text files and returns a DataFrame whose schema starts with a string column named “value”, followed by partitioned columns if there are any.

DataFrameWriter.bucketBy(numBuckets, col, *cols)

Buckets the output by the given columns.

DataFrameWriter.csv(path[, mode, …])

Saves the content of the DataFrame in CSV format at the specified path.

DataFrameWriter.format(source)

Specifies the underlying output data source.

DataFrameWriter.insertInto(tableName[, …])

Inserts the content of the DataFrame into the specified table.

DataFrameWriter.jdbc(url, table[, mode, …])

Saves the content of the DataFrame to an external database table via JDBC.

DataFrameWriter.json(path[, mode, …])

Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path.

DataFrameWriter.mode(saveMode)

Specifies the behavior when data or table already exists.

DataFrameWriter.option(key, value)

Adds an output option for the underlying data source.

DataFrameWriter.options(**options)

Adds output options for the underlying data source.

DataFrameWriter.orc(path[, mode, …])

Saves the content of the DataFrame in ORC format at the specified path.

DataFrameWriter.parquet(path[, mode, …])

Saves the content of the DataFrame in Parquet format at the specified path.

DataFrameWriter.partitionBy(*cols)

Partitions the output by the given columns on the file system.

DataFrameWriter.save([path, format, mode, …])

Saves the contents of the DataFrame to a data source.

DataFrameWriter.saveAsTable(name[, format, …])

Saves the content of the DataFrame as the specified table.

DataFrameWriter.sortBy(col, *cols)

Sorts the output in each bucket by the given columns on the file system.

DataFrameWriter.text(path[, compression, …])

Saves the content of the DataFrame in a text file at the specified path.

DataFrameWriterV2.using(provider)

Specifies a provider for the underlying output data source.

DataFrameWriterV2.option(key, value)

Add a write option.

DataFrameWriterV2.options(**options)

Add write options.

DataFrameWriterV2.tableProperty(property, value)

Add a table property.

DataFrameWriterV2.partitionedBy(col, *cols)

Partition the output table created by create, createOrReplace, or replace using the given columns or transforms.

DataFrameWriterV2.create()

Create a new table from the contents of the data frame.

DataFrameWriterV2.replace()

Replace an existing table with the contents of the data frame.

DataFrameWriterV2.createOrReplace()

Create a new table or replace an existing table with the contents of the data frame.

DataFrameWriterV2.append()

Append the contents of the data frame to the output table.

DataFrameWriterV2.overwrite(condition)

Overwrite rows matching the given filter condition with the contents of the data frame in the output table.

DataFrameWriterV2.overwritePartitions()

Overwrite all partitions for which the data frame contains at least one row with the contents of the data frame in the output table.
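
The methods listed above are normally chained: a DataFrameReader is obtained from spark.read, a DataFrameWriter from df.write, and a DataFrameWriterV2 from df.writeTo(table). The following is a minimal sketch of that chaining, not a definitive recipe. The function names, the column names id and name, the DDL schema string, and the choice of Parquet as the provider are all assumptions made for illustration.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type hints only; the caller supplies a real session.
    from pyspark.sql import DataFrame, SparkSession


def read_events(spark: "SparkSession", path: str) -> "DataFrame":
    """Read CSV with a header row, using an explicit (hypothetical) schema."""
    return (
        spark.read
        .format("csv")                   # DataFrameReader.format
        .schema("id INT, name STRING")   # DataFrameReader.schema (DDL string)
        .option("header", True)          # DataFrameReader.option
        .load(path)                      # DataFrameReader.load
    )


def write_events(df: "DataFrame", path: str) -> None:
    """Write df as Parquet, partitioned by the hypothetical 'name' column."""
    (
        df.write
        .mode("overwrite")               # DataFrameWriter.mode
        .partitionBy("name")             # DataFrameWriter.partitionBy
        .parquet(path)                   # DataFrameWriter.parquet
    )


def create_table_v2(df: "DataFrame", table: str) -> None:
    """Create a new table from df through the DataFrameWriterV2 interface."""
    (
        df.writeTo(table)                # returns a DataFrameWriterV2
        .using("parquet")                # DataFrameWriterV2.using
        .create()                        # DataFrameWriterV2.create
    )
```

To try it, pass a session obtained with SparkSession.builder.getOrCreate(). Whether create_table_v2 succeeds depends on the catalog configured for the session, since not every catalog supports every DataFrameWriterV2 operation.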