pyspark.testing.assertSchemaEqual(actual: pyspark.sql.types.StructType, expected: pyspark.sql.types.StructType)

A utility function that asserts that two DataFrame schemas, actual and expected, are equal.

New in version 3.5.0.


actual : StructType
    The DataFrame schema that is being compared or tested.

expected : StructType
    The expected schema, for comparison with the actual schema.


When assertSchemaEqual fails, the error message uses the Python difflib library to display a diff log of the actual and expected schemas.
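
As a rough illustration of the kind of diff log described above, the sketch below runs the standard-library difflib.ndiff function on two schema-like strings. The strings are hypothetical stand-ins for the string form of a schema, and the choice of ndiff is an assumption made for illustration; it is not a statement of how assertSchemaEqual builds its message internally.

```python
import difflib

# Hypothetical string forms of two schemas (illustrative stand-ins only).
actual = "StructType([StructField('id', LongType(), True)])"
expected = "StructType([StructField('id', StringType(), True)])"

# ndiff compares sequences of lines, so each string is wrapped in a list.
diff_lines = list(difflib.ndiff([actual], [expected]))

# Print a header followed by the diff. Lines starting with "- " come from
# the first input, "+ " from the second, and "? " lines mark the characters
# that differ within a changed line.
print("--- actual")
print("+++ expected")
for line in diff_lines:
    print(line.rstrip("\n"))
```

The "?" guide lines are what make a one-word difference, such as a changed field type, easy to spot inside a long single-line schema.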


>>> from pyspark.testing import assertSchemaEqual
>>> from pyspark.sql.types import StructType, StructField, ArrayType, IntegerType, DoubleType
>>> s1 = StructType([StructField("names", ArrayType(DoubleType(), True), True)])
>>> s2 = StructType([StructField("names", ArrayType(DoubleType(), True), True)])
>>> assertSchemaEqual(s1, s2)  # pass, schemas are identical
>>> df1 = spark.createDataFrame(data=[(1, 1000), (2, 3000)], schema=["id", "number"])
>>> df2 = spark.createDataFrame(data=[("1", 1000), ("2", 5000)], schema=["id", "amount"])
>>> assertSchemaEqual(df1.schema, df2.schema)  # fail, column type and name differ
Traceback (most recent call last):
PySparkAssertionError: [DIFFERENT_SCHEMA] Schemas do not match.
--- actual
+++ expected
- StructType([StructField('id', LongType(), True), StructField('number', LongType(), True)])
?                               ^^                               ^^^^^
+ StructType([StructField('id', StringType(), True), StructField('amount', LongType(), True)])
?                               ^^^^                              ++++ ^