pyspark.sql.Catalog.createTable

Catalog.createTable(tableName: str, path: Optional[str] = None, source: Optional[str] = None, schema: Optional[pyspark.sql.types.StructType] = None, description: Optional[str] = None, **options: str) → pyspark.sql.dataframe.DataFrame

Creates a table based on the dataset in a data source.

New in version 2.2.0.

Parameters
tableName : str

name of the table to create.

Changed in version 3.4.0: Allow tableName to be qualified with catalog name.

path : str, optional

the path in which the data for this table exists. When path is specified, an external table is created from the data at the given path. Otherwise a managed table is created.

source : str, optional

the source of this table such as 'parquet', 'orc', etc. If source is not specified, the default data source configured by spark.sql.sources.default will be used (see the note after this parameter list).

schema : StructType, optional

the schema for this table.

description : str, optional

the description of this table.

Changed in version 3.1.0: Added the description parameter.

**options : dict, optional

extra options to set on the table; these are passed through to the underlying data source.
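
When source is omitted, the table is created with the session's default data source. A minimal check of that default, assuming a stock configuration in which spark.sql.sources.default has not been overridden:

>>> spark.conf.get("spark.sql.sources.default")
'parquet'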

Returns
DataFrame

The DataFrame associated with the table.

Examples

Creating a managed table.

>>> _ = spark.catalog.createTable("tbl1", schema=spark.range(1).schema, source='parquet')
>>> _ = spark.sql("DROP TABLE tbl1")

Creating an external table.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     _ = spark.catalog.createTable(
...         "tbl2", schema=spark.range(1).schema, path=d, source='parquet')
>>> _ = spark.sql("DROP TABLE tbl2")