pyspark.sql.DataFrame.withMetadata

DataFrame.withMetadata(columnName: str, metadata: Dict[str, Any]) → pyspark.sql.dataframe.DataFrame

Returns a new DataFrame by updating the metadata of an existing column.

New in version 3.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
columnName : str

Name of the existing column whose metadata is updated.

metadata : dict

New metadata to be assigned to df.schema[columnName].metadata.

Returns
DataFrame

DataFrame with the updated column metadata.

Examples

>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
>>> df_meta = df.withMetadata('age', {'foo': 'bar'})
>>> df_meta.schema['age'].metadata
{'foo': 'bar'}