StructType

class pyspark.sql.types.StructType(fields: Optional[List[pyspark.sql.types.StructField]] = None)

Struct type, consisting of a list of StructField.

This is the data type representing a Row.

Iterating a StructType will iterate over its StructFields. A contained StructField can be accessed by its name or position.

Examples

>>> from pyspark.sql.types import *
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct1["f1"]
StructField('f1', StringType(), True)
>>> struct1[0]
StructField('f1', StringType(), True)
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", CharType(10), True)])
>>> struct2 = StructType([StructField("f1", CharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct2 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", IntegerType(), False)])
>>> struct1 == struct2
False

The example below demonstrates how to create a DataFrame based on a struct created using StructType and StructField:

>>> data = [("Alice", ["Java", "Scala"]), ("Bob", ["Python", "Scala"])]
>>> schema = StructType([
...     StructField("name", StringType()),
...     StructField("languagesSkills", ArrayType(StringType())),
... ])
>>> df = spark.createDataFrame(data=data, schema=schema)
>>> df.printSchema()
root
 |-- name: string (nullable = true)
 |-- languagesSkills: array (nullable = true)
 |    |-- element: string (containsNull = true)
>>> df.show()
+-----+---------------+
| name|languagesSkills|
+-----+---------------+
|Alice|  [Java, Scala]|
|  Bob|[Python, Scala]|
+-----+---------------+

Methods

add(field[, data_type, nullable, metadata])

Construct a StructType by adding new elements to it, to define the schema.

fieldNames()

Returns all field names in a list.

fromInternal(obj)

Converts an internal SQL object into a native Python object.

fromJson(json)

Constructs StructType from a schema defined in JSON format.

json()

jsonValue()

needConversion()

Does this type need conversion between Python objects and internal SQL objects.

simpleString()

toInternal(obj)

Converts a Python object into an internal SQL object.

typeName()

Methods Documentation

add(field: Union[str, pyspark.sql.types.StructField], data_type: Union[str, pyspark.sql.types.DataType, None] = None, nullable: bool = True, metadata: Optional[Dict[str, Any]] = None) → pyspark.sql.types.StructType

Construct a StructType by adding new elements to it, to define the schema. The method accepts either:

  1. A single parameter which is a StructField object.

  2. Between 2 and 4 parameters as (name, data_type, nullable (optional), metadata (optional)). The data_type parameter may be either a string or a DataType object.

Parameters
field : str or StructField

Either the name of the field or a StructField object

data_type : str or DataType, optional

If present, the DataType of the StructField to create

nullable : bool, optional

Whether the field to add should be nullable (default True)

metadata : dict, optional

Any additional metadata (default None)

Returns
StructType

Examples

>>> from pyspark.sql.types import IntegerType, StringType, StructField, StructType
>>> struct1 = StructType().add("f1", StringType(), True).add("f2", StringType(), True, None)
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", StringType(), True, None)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add(StructField("f1", StringType(), True))
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add("f1", "string", True)
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
fieldNames() → List[str]

Returns all field names in a list.

Examples

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']
fromInternal(obj: Tuple) → pyspark.sql.types.Row

Converts an internal SQL object into a native Python object.
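
A minimal doctest sketch (the input tuple follows the field order of the schema; the Row output shown is assumed):

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fromInternal(("hello",))
Row(f1='hello')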

classmethod fromJson(json: Dict[str, Any]) → pyspark.sql.types.StructType

Constructs StructType from a schema defined in JSON format.

Below is the JSON schema the input must adhere to:

 {
   "title":"StructType",
   "description":"Schema of StructType in json format",
   "type":"object",
   "properties":{
      "fields":{
         "description":"Array of struct fields",
         "type":"array",
         "items":{
             "type":"object",
             "properties":{
                "name":{
                   "description":"Name of the field",
                   "type":"string"
                },
                "type":{
                   "description":"Type of the field. Can either be
                                   another nested StructType or primitive type",
                   "type":"object/string"
                },
                "nullable":{
                   "description":"If nulls are allowed",
                   "type":"boolean"
                },
                "metadata":{
                   "description":"Additional metadata to supply",
                   "type":"object"
                }
             },
             "required":[
                "name",
                "type",
                "nullable",
                "metadata"
             ]
         }
      }
   }
 }
Parameters
json : dict or a dict-like object e.g. JSON object

This “dict” must have a “fields” key that returns an array of fields, each of which must have specific keys (name, type, nullable, metadata).

Returns
StructType

Examples

>>> json_str = '''
...  {
...      "fields": [
...          {
...              "metadata": {},
...              "name": "Person",
...              "nullable": true,
...              "type": {
...                  "fields": [
...                      {
...                          "metadata": {},
...                          "name": "name",
...                          "nullable": false,
...                          "type": "string"
...                      },
...                      {
...                          "metadata": {},
...                          "name": "surname",
...                          "nullable": false,
...                          "type": "string"
...                      }
...                  ],
...                  "type": "struct"
...              }
...          }
...      ],
...      "type": "struct"
...  }
...  '''
>>> import json
>>> scheme = StructType.fromJson(json.loads(json_str))
>>> scheme.simpleString()
'struct<Person:struct<name:string,surname:string>>'
json() → str
jsonValue() → Dict[str, Any]
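
Both methods serialize the schema: json() to a compact JSON string and jsonValue() to a plain dict. A minimal sketch (the exact key ordering shown in the output is assumed):

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.json()
'{"fields":[{"metadata":{},"name":"f1","nullable":true,"type":"string"}],"type":"struct"}'
>>> struct.jsonValue()
{'type': 'struct', 'fields': [{'name': 'f1', 'type': 'string', 'nullable': True, 'metadata': {}}]}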
needConversion() → bool

Does this type need conversion between Python objects and internal SQL objects.

This is used to avoid the unnecessary conversion for ArrayType/MapType/StructType.
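
A short sketch: a StructType reports that it needs conversion, while a flat primitive type such as StringType does not:

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> StringType().needConversion()
False
>>> StructType([StructField("f1", StringType(), True)]).needConversion()
True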

simpleString() → str
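
simpleString() renders the schema in the compact struct<...> notation seen in the fromJson example above; a minimal standalone sketch:

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> StructType([StructField("f1", StringType(), True)]).simpleString()
'struct<f1:string>'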
toInternal(obj: Tuple) → Tuple

Converts a Python object into an internal SQL object.
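
A minimal sketch of the inverse of fromInternal (the tuple output follows the field order of the schema):

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.toInternal({"f1": "hello"})
('hello',)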

classmethod typeName() → str
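
A one-line sketch; the name returned for StructType matches its simple-string form:

>>> StructType.typeName()
'struct'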