@Evolving public interface StagingTableCatalog extends TableCatalog
TableCatalog
that support staging creation of
the a table before committing the table's metadata along with its contents in CREATE TABLE AS
SELECT or REPLACE TABLE AS SELECT operations.
It is highly recommended to implement this trait whenever possible so that CREATE TABLE AS
SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE
TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first
drop the table via TableCatalog.dropTable(Identifier)
, then create the table via
TableCatalog.createTable(Identifier, StructType, Transform[], Map)
, and then perform
the write via SupportsWrite.newWriteBuilder(LogicalWriteInfo)
.
However, if the write operation fails, the catalog will have already dropped the table, and the
planner cannot roll back the dropping of the table.
If the catalog implements this plugin, the catalog can implement the methods to "stage" the
creation and the replacement of a table. After the table's
BatchWrite.commit(WriterCommitMessage[])
is called,
StagedTable.commitStagedChanges()
is called, at which point the staged table can
complete both the data write and the metadata swap operation atomically.
OPTION_PREFIX, PROP_COMMENT, PROP_EXTERNAL, PROP_IS_MANAGED_LOCATION, PROP_LOCATION, PROP_OWNER, PROP_PROVIDER
Modifier and Type | Method and Description |
---|---|
default StagedTable |
stageCreate(Identifier ident,
Column[] columns,
Transform[] partitions,
java.util.Map<String,String> properties)
Stage the creation of a table, preparing it to be committed into the metastore.
|
StagedTable |
stageCreate(Identifier ident,
StructType schema,
Transform[] partitions,
java.util.Map<String,String> properties)
Deprecated.
|
default StagedTable |
stageCreateOrReplace(Identifier ident,
Column[] columns,
Transform[] partitions,
java.util.Map<String,String> properties)
Stage the creation or replacement of a table, preparing it to be committed into the metastore
when the returned table's
StagedTable.commitStagedChanges() is called. |
StagedTable |
stageCreateOrReplace(Identifier ident,
StructType schema,
Transform[] partitions,
java.util.Map<String,String> properties)
Stage the creation or replacement of a table, preparing it to be committed into the metastore
when the returned table's
StagedTable.commitStagedChanges() is called. |
default StagedTable |
stageReplace(Identifier ident,
Column[] columns,
Transform[] partitions,
java.util.Map<String,String> properties)
Stage the replacement of a table, preparing it to be committed into the metastore when the
returned table's
StagedTable.commitStagedChanges() is called. |
StagedTable |
stageReplace(Identifier ident,
StructType schema,
Transform[] partitions,
java.util.Map<String,String> properties)
Stage the replacement of a table, preparing it to be committed into the metastore when the
returned table's
StagedTable.commitStagedChanges() is called. |
alterTable, capabilities, createTable, createTable, dropTable, invalidateTable, listTables, loadTable, loadTable, loadTable, purgeTable, renameTable, tableExists
defaultNamespace, initialize, name
@Deprecated StagedTable stageCreate(Identifier ident, StructType schema, Transform[] partitions, java.util.Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
This is deprecated. Please override
stageCreate(Identifier, Column[], Transform[], Map)
instead.
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
default StagedTable stageCreate(Identifier ident, Column[] columns, Transform[] partitions, java.util.Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
When the table is committed, the contents of any writes performed by the Spark planner are
committed along with the metadata about the table passed into this method's arguments. If the
table exists when this method is called, the method should throw an exception accordingly. If
another process concurrently creates the table before this table's staged changes are
committed, an exception should be thrown by StagedTable.commitStagedChanges()
.
ident
- a table identifiercolumns
- the column of the new tablepartitions
- transforms to use for partitioning data in the tableproperties
- a string map of table propertiesorg.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
- If a table or view already exists for the identifierUnsupportedOperationException
- If a requested partition transform is not supportedorg.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
- If the identifier namespace does not exist (optional)StagedTable stageReplace(Identifier ident, StructType schema, Transform[] partitions, java.util.Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
StagedTable.commitStagedChanges()
is called.
This is deprecated, please override
stageReplace(Identifier, StructType, Transform[], Map)
instead.
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
default StagedTable stageReplace(Identifier ident, Column[] columns, Transform[] partitions, java.util.Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
StagedTable.commitStagedChanges()
is called.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.
If the table does not exist, committing the staged changes should fail with
NoSuchTableException
. This differs from the semantics of
stageCreateOrReplace(Identifier, StructType, Transform[], Map)
, which should create
the table in the data source if the table does not exist at the time of committing the
operation.
ident
- a table identifiercolumns
- the columns of the new tablepartitions
- transforms to use for partitioning data in the tableproperties
- a string map of table propertiesUnsupportedOperationException
- If a requested partition transform is not supportedorg.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
- If the identifier namespace does not exist (optional)org.apache.spark.sql.catalyst.analysis.NoSuchTableException
- If the table does not existStagedTable stageCreateOrReplace(Identifier ident, StructType schema, Transform[] partitions, java.util.Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
StagedTable.commitStagedChanges()
is called.
This is deprecated, please override
stageCreateOrReplace(Identifier, Column[], Transform[], Map)
instead.
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
default StagedTable stageCreateOrReplace(Identifier ident, Column[] columns, Transform[] partitions, java.util.Map<String,String> properties) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
StagedTable.commitStagedChanges()
is called.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyways or abort the commit operation.
If the table does not exist when the changes are committed, the table should be created in the
backing data source. This differs from the expected semantics of
stageReplace(Identifier, StructType, Transform[], Map)
, which should fail when
the staged changes are committed but the table doesn't exist at commit time.
ident
- a table identifiercolumns
- the columns of the new tablepartitions
- transforms to use for partitioning data in the tableproperties
- a string map of table propertiesUnsupportedOperationException
- If a requested partition transform is not supportedorg.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
- If the identifier namespace does not exist (optional)