DataFrame.to_delta(path: str, mode: str = 'w', partition_cols: Union[str, List[str], None] = None, index_col: Union[str, List[str], None] = None, **options: OptionalPrimitiveType) → None

Write the DataFrame out as a Delta Lake table.

path: str, required

Path to write to.


mode: str

Python write mode, default ‘w’.

mode also accepts the Spark write-mode strings: ‘append’, ‘overwrite’, ‘ignore’, ‘error’, and ‘errorifexists’.

  • ‘append’ (equivalent to ‘a’): Append the new data to existing data.

  • ‘overwrite’ (equivalent to ‘w’): Overwrite existing data.

  • ‘ignore’: Silently ignore this operation if data already exists.

  • ‘error’ or ‘errorifexists’: Throw an exception if data already exists.
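
The equivalence between the Python-style and Spark-style mode strings listed above can be sketched as a small lookup. This is only an illustration: `normalize_mode` and its two tables are hypothetical names, not part of the pandas-on-Spark API, and the library's own validation may differ.

```python
# Hypothetical helper, not part of pyspark.pandas: maps the Python-style
# modes 'a' and 'w' to the Spark write-mode strings listed above.
_PYTHON_TO_SPARK_MODE = {"a": "append", "w": "overwrite"}
_SPARK_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def normalize_mode(mode: str) -> str:
    """Return the Spark write-mode string for a Python or Spark mode."""
    spark_mode = _PYTHON_TO_SPARK_MODE.get(mode, mode)
    if spark_mode not in _SPARK_MODES:
        raise ValueError(f"Unsupported write mode: {mode!r}")
    return spark_mode


print(normalize_mode("w"))       # overwrite
print(normalize_mode("a"))       # append
print(normalize_mode("ignore"))  # ignore
```

Under this sketch, the default `mode='w'` behaves like `mode='overwrite'`, which is why the first write in a script replaces any table already at `path`.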

partition_cols: str or list of str, optional, default None

Names of partitioning columns.

index_col: str or list of str, optional, default: None

Column names to be used in Spark to represent the pandas-on-Spark index. The index name in pandas-on-Spark is ignored. By default, the index is always lost.


**options: OptionalPrimitiveType

All other options are passed directly to Delta Lake.


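>>> import tempfile
>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> path = tempfile.mkdtemp()  # assumption: any writable directory works
>>> df = ps.DataFrame(dict(
...    date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='M')),
...    country=['KR', 'US', 'JP'],
...    code=[1, 2, 3]), columns=['date', 'country', 'code'])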
>>> df
                 date country  code
0 2012-01-31 12:00:00      KR     1
1 2012-02-29 12:00:00      US     2
2 2012-03-31 12:00:00      JP     3

Create a new Delta Lake table, partitioned by one column:

>>> df.to_delta('%s/to_delta/foo' % path, partition_cols='date')  

Partitioned by two columns:

>>> df.to_delta('%s/to_delta/bar' % path,
...             partition_cols=['date', 'country'])  

Overwrite an existing table’s partitions, using the ‘replaceWhere’ capability in Delta:

>>> df.to_delta('%s/to_delta/bar' % path,
...             mode='overwrite', replaceWhere='date >= "2012-01-01"')
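
When only a bounded slice of the table should be replaced, the ‘replaceWhere’ predicate can be built programmatically. The following is a minimal sketch under stated assumptions: `month_predicate` and `overwrite_month` are hypothetical helpers, not part of pandas-on-Spark, and `psdf` is assumed to be a pandas-on-Spark DataFrame containing only rows for the given month.

```python
from datetime import date


def month_predicate(column: str, year: int, month: int) -> str:
    """Build a replaceWhere predicate covering one calendar month."""
    start = date(year, month, 1)
    # First day of the following month (rolls over to January after December).
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return f"{column} >= '{start.isoformat()}' AND {column} < '{end.isoformat()}'"


def overwrite_month(psdf, path: str, column: str, year: int, month: int) -> None:
    """Overwrite one month of an existing Delta table (hypothetical helper).

    Assumes `psdf` exposes `to_delta` as documented above and holds only
    rows that fall inside the given month.
    """
    psdf.to_delta(path, mode="overwrite",
                  replaceWhere=month_predicate(column, year, month))


print(month_predicate("date", 2012, 2))
# date >= '2012-02-01' AND date < '2012-03-01'
```

A call such as `overwrite_month(df, '%s/to_delta/bar' % path, 'date', 2012, 2)` would then leave rows outside February 2012 untouched, provided the predicate matches how the table's `date` column is stored.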