pyspark.sql.Catalog.createTable

Catalog.createTable(tableName: str, path: Optional[str] = None, source: Optional[str] = None, schema: Optional[pyspark.sql.types.StructType] = None, description: Optional[str] = None, **options: str) → pyspark.sql.dataframe.DataFrame

Creates a table based on the dataset in a data source.

New in version 2.2.0.

Parameters
tableName : str

name of the table to create.

Changed in version 3.4.0: Allow tableName to be qualified with catalog name.

path : str, optional

the path in which the data for this table exists. When path is specified, an external table is created from the data at the given path. Otherwise a managed table is created.

source : str, optional

the source of this table such as 'parquet', 'orc', etc. If source is not specified, the default data source configured by spark.sql.sources.default will be used (see the note after this parameter list).

schema : StructType, optional

the schema for this table.

description : str, optional

the description of this table.

Changed in version 3.1.0: Added the description parameter.

**options : dict, optional

extra options to set on the table; these are passed through to the underlying data source.
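
When source is omitted, the table is created with the session's default data source. A minimal check of that default, assuming a stock configuration in which spark.sql.sources.default has not been overridden:

>>> spark.conf.get("spark.sql.sources.default")
'parquet'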

Returns
DataFrame

The DataFrame associated with the table.

Examples

Creating a managed table.

>>> _ = spark.catalog.createTable("tbl1", schema=spark.range(1).schema, source='parquet')
>>> _ = spark.sql("DROP TABLE tbl1")

Creating an external table.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     _ = spark.catalog.createTable(
...         "tbl2", schema=spark.range(1).schema, path=d, source='parquet')
>>> _ = spark.sql("DROP TABLE tbl2")