pyspark.sql.avro.functions.to_avro(data: ColumnOrName, jsonFormatSchema: str = '') → pyspark.sql.column.Column

Converts a column into binary data in Avro format.

New in version 3.0.0.

Changed in version 3.5.0: Supports Spark Connect.

data : Column or str

the data column.

jsonFormatSchema : str, optional

user-specified output Avro schema in JSON string format.


Avro has been a built-in but external data source module since Spark 2.4. Deploy the application as described in the deployment section of the “Apache Avro Data Source Guide”.
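One common way to satisfy the deployment note from Python is to request the spark-avro package through the `spark.jars.packages` setting when the session is created. The sketch below is illustrative only: `spark_avro_coordinate` and `build_session_with_avro` are hypothetical helper names, the default Scala version `2.12` is an assumption, and the versions passed in must match the Spark and Scala build actually installed.

```python
def spark_avro_coordinate(spark_version: str, scala_version: str = "2.12") -> str:
    """Return the Maven coordinate of the spark-avro module (hypothetical helper)."""
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


def build_session_with_avro(spark_version: str, scala_version: str = "2.12"):
    """Start a SparkSession that resolves spark-avro at launch.

    Needs pyspark installed and access to a Maven repository.
    """
    # Imported lazily so the coordinate helper above works without pyspark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("to_avro-example")
        .config("spark.jars.packages", spark_avro_coordinate(spark_version, scala_version))
        .getOrCreate()
    )
```

Calling `build_session_with_avro("3.5.0")` would request `org.apache.spark:spark-avro_2.12:3.5.0`. The setting only takes effect if no session is already running in the process.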


>>> from pyspark.sql import Row
>>> from pyspark.sql.avro.functions import to_avro
>>> data = ['SPADES']
>>> df = spark.createDataFrame(data, "string")
>>> jsonFormatSchema = '''["null", {"type": "enum", "name": "value",
...     "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}]'''
>>> df.select(to_avro(df.value, jsonFormatSchema).alias("suite")).collect()
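To make the binary form of the example schema above more concrete, here is a minimal pure-Python sketch of the Avro specification's binary encoding rules for a union of `null` and an enum. It does not call Spark and is not PySpark's implementation. The helper names `zigzag_varint` and `encode_nullable_enum` are hypothetical, and the sketch assumes the union contains a single enum branch.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer the way Avro encodes int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_nullable_enum(value, union_schema_json: str) -> bytes:
    """Encode `value` against a union schema holding "null" and one enum branch."""
    union = json.loads(union_schema_json)
    if value is None:
        # A null value is just the branch index; null itself has zero bytes.
        return zigzag_varint(union.index("null"))
    for branch_index, branch in enumerate(union):
        if isinstance(branch, dict) and branch.get("type") == "enum":
            # Union: branch position first, then the enum symbol's position.
            symbol_index = branch["symbols"].index(value)
            return zigzag_varint(branch_index) + zigzag_varint(symbol_index)
    raise ValueError("schema has no enum branch")


schema = '''["null", {"type": "enum", "name": "value",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}]'''
encoded = encode_nullable_enum("SPADES", schema)
print(encoded)  # b'\x02\x00'
```

The first byte selects the enum branch of the union (index 1, zigzag-encoded as 2) and the second byte is the position of `SPADES` in the symbol list.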