pyspark.sql.functions.schema_of_csv

pyspark.sql.functions.schema_of_csv(csv: ColumnOrName, options: Optional[Dict[str, str]] = None) → pyspark.sql.column.Column

Parses a CSV string and infers its schema in DDL format.

New in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

csv (Column or str): a CSV string, or a foldable string column containing a CSV string.
options (dict, optional): options to control parsing. Accepts the same options as the CSV data source. See Data Source Option for the version you use.

Returns

Column: a string representation, in DDL format, of the StructType inferred from the given CSV string.

Examples

>>> from pyspark.sql.functions import lit, schema_of_csv
>>> df = spark.range(1)
>>> df.select(schema_of_csv(lit('1|a'), {'sep':'|'}).alias("csv")).collect()
[Row(csv='STRUCT<_c0: INT, _c1: STRING>')]
>>> df.select(schema_of_csv('1|a', {'sep':'|'}).alias("csv")).collect()
[Row(csv='STRUCT<_c0: INT, _c1: STRING>')]
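
The inferred schema is often passed straight to from_csv to parse a whole column. The following is a minimal sketch of that pattern, not an official recipe: `parse_csv_column` and `demo` are hypothetical helper names, the output column name "parsed" is arbitrary, and the sample row is assumed to be representative of every row in the column. It relies on from_csv accepting a Column as its schema argument and on both functions taking the same CSV options.

```python
from typing import Dict, Optional


def parse_csv_column(df, column: str, sample: str,
                     options: Optional[Dict[str, str]] = None):
    """Infer a schema from `sample`, then parse `df[column]` with that schema.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql import functions as F

    opts = dict(options or {})
    # schema_of_csv needs a literal (foldable) value, not a per-row column.
    schema = F.schema_of_csv(F.lit(sample), opts)
    # Reuse the same options so the separator matches during parsing.
    return df.withColumn("parsed", F.from_csv(F.col(column), schema, opts))


def demo():
    """Run the helper against a tiny DataFrame (requires a working Spark setup)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("1|a",), ("2|b",)], ["value"])
    parse_csv_column(df, "value", "1|a", {"sep": "|"}).show()
```

Calling `demo()` in an environment with Spark available should display the original column next to a struct column whose fields follow the inferred schema. Because the schema comes from a single sample, rows that do not match it may parse to nulls.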
