pyspark.testing.assertSchemaEqual

pyspark.testing.assertSchemaEqual(actual: pyspark.sql.types.StructType, expected: pyspark.sql.types.StructType)

A utility function that asserts that the DataFrame schemas actual and expected are equal.

New in version 3.5.0.

Parameters
actual : StructType

The DataFrame schema that is being compared or tested.

expected : StructType

The expected schema, for comparison with the actual schema.

Notes

When assertSchemaEqual fails, the error message uses the Python difflib library to display a diff log of the actual and expected schemas.
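To make the shape of that diff easier to read, here is a minimal sketch that uses only the standard library. It assumes the schemas are compared as single-line strings and uses difflib.ndiff. The exact difflib call that PySpark makes internally is not stated on this page, so this illustrates the diff notation rather than the library's implementation. The two strings are copied from the schema representations shown in the Examples section.

```python
import difflib

# Schema representations as plain strings (copied from the Examples section).
# Assumption: the diff is computed over string forms of the two schemas.
actual = (
    "StructType([StructField('id', LongType(), True), "
    "StructField('number', LongType(), True)])"
)
expected = (
    "StructType([StructField('id', StringType(), True), "
    "StructField('amount', LongType(), True)])"
)

# ndiff takes two sequences of lines. Lines only in the first sequence are
# prefixed with "- ", lines only in the second with "+ ", and optional
# "? " lines point at the characters that differ within a line.
diff_lines = list(difflib.ndiff([actual], [expected]))

# Strip trailing newlines that ndiff appends to its "? " hint lines.
print("\n".join(line.rstrip("\n") for line in diff_lines))
```

In the output, the line starting with "- " is the actual schema and the line starting with "+ " is the expected schema, which matches the "--- actual" and "+++ expected" header in the error message.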

Examples

>>> from pyspark.testing import assertSchemaEqual
>>> from pyspark.sql.types import StructType, StructField, ArrayType, DoubleType
>>> s1 = StructType([StructField("names", ArrayType(DoubleType(), True), True)])
>>> s2 = StructType([StructField("names", ArrayType(DoubleType(), True), True)])
>>> assertSchemaEqual(s1, s2)  # pass, schemas are identical
>>> df1 = spark.createDataFrame(data=[(1, 1000), (2, 3000)], schema=["id", "number"])
>>> df2 = spark.createDataFrame(data=[("1", 1000), ("2", 5000)], schema=["id", "amount"])
>>> assertSchemaEqual(df1.schema, df2.schema)  # fail, a column type and a column name differ
Traceback (most recent call last):
...
PySparkAssertionError: [DIFFERENT_SCHEMA] Schemas do not match.
--- actual
+++ expected
- StructType([StructField('id', LongType(), True), StructField('number', LongType(), True)])
?                               ^^                               ^^^^^
+ StructType([StructField('id', StringType(), True), StructField('amount', LongType(), True)])
?                               ^^^^                              ++++ ^