pyspark.sql.DataFrameWriter
class pyspark.sql.DataFrameWriter(df)
Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores). Use DataFrame.write to access this.

New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
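A minimal usage sketch (the app name, data, and output path below are illustrative, not part of this API reference): DataFrame.write returns a DataFrameWriter, and configuration calls chain until a terminal action such as save() triggers the write.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("writer-sketch").getOrCreate()

# Toy DataFrame; any DataFrame exposes the .write property.
df = spark.createDataFrame([(1, "a"), (2, "b")], schema=["id", "value"])

# Chain configuration on the DataFrameWriter, then trigger the write.
(
    df.write
    .format("parquet")            # underlying output data source
    .mode("overwrite")            # behavior if data already exists
    .save("/tmp/writer_sketch")   # illustrative output path
)
```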
Methods
bucketBy(numBuckets, col, *cols)
    Buckets the output by the given columns.
clusterBy(*cols)
    Clusters the data by the given columns to optimize query performance.
csv(path[, mode, compression, sep, quote, ...])
    Saves the content of the DataFrame in CSV format at the specified path.
format(source)
    Specifies the underlying output data source.
insertInto(tableName[, overwrite])
    Inserts the content of the DataFrame to the specified table.
jdbc(url, table[, mode, properties])
    Saves the content of the DataFrame to an external database table via JDBC.
json(path[, mode, compression, dateFormat, ...])
    Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path.
mode(saveMode)
    Specifies the behavior when data or table already exists.
option(key, value)
    Adds an output option for the underlying data source.
options(**options)
    Adds output options for the underlying data source.
orc(path[, mode, partitionBy, compression])
    Saves the content of the DataFrame in ORC format at the specified path.
parquet(path[, mode, partitionBy, compression])
    Saves the content of the DataFrame in Parquet format at the specified path.
partitionBy(*cols)
    Partitions the output by the given columns on the file system.
save([path, format, mode, partitionBy])
    Saves the contents of the DataFrame to a data source.
saveAsTable(name[, format, mode, partitionBy])
    Saves the content of the DataFrame as the specified table.
sortBy(col, *cols)
    Sorts the output in each bucket by the given columns on the file system.
text(path[, compression, lineSep])
    Saves the content of the DataFrame in a text file at the specified path.
xml(path[, rowTag, mode, attributePrefix, ...])
    Saves the content of the DataFrame in XML format at the specified path.
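Examples

A sketch of a partitioned, path-based write, assuming the df from the sketch above plus a derived year column (both illustrative): partitionBy, option, and mode combine with a format shorthand such as csv().

```python
# Add a hypothetical partitioning column.
df_with_year = df.withColumn("year", df.id + 2020)

(
    df_with_year.write
    .partitionBy("year")            # one subdirectory per distinct year value
    .option("header", "true")       # CSV-specific output option
    .mode("append")                 # keep any existing data at the path
    .csv("/tmp/writer_sketch_csv")  # illustrative output path
)
```

Each partition value becomes a directory such as year=2021/ under the output path, so readers can prune partitions by filtering on year.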
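bucketBy() and sortBy() apply only in combination with saveAsTable(), since bucketing metadata lives in the table catalog rather than on the file system; a path-based save() does not support bucketed output. A sketch with an assumed table name:

```python
# Bucketed, sorted managed table; bucketing requires saveAsTable().
(
    df.write
    .bucketBy(4, "id")             # hash rows into 4 buckets by id
    .sortBy("value")               # sort rows within each bucket
    .mode("overwrite")
    .saveAsTable("writer_sketch_bucketed")  # hypothetical table name
)
```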