package catalog
Type Members
- trait CatalogExtension extends TableCatalog with FunctionCatalog with SupportsNamespaces
An API to extend the Spark built-in session catalog. Implementations can get the built-in session catalog from #setDelegateCatalog(CatalogPlugin), implement catalog functions with some custom logic, and call the built-in session catalog at the end. For example, they can implement createTable, do something else, and then call createTable of the built-in session catalog.
- Annotations
- @Evolving()
- Since
3.0.0
- class CatalogNotFoundException extends SparkException
- Annotations
- @Experimental()
- trait CatalogPlugin extends AnyRef
A marker interface to provide a catalog implementation for Spark.
Implementations can provide catalog functions by implementing additional interfaces for tables, views, and functions.
Catalog implementations must implement this marker interface to be loaded by Catalogs#load(String, SQLConf). The loader will instantiate catalog classes using the required public no-arg constructor. After creating an instance, it will be configured by calling #initialize(String, CaseInsensitiveStringMap).
Catalog implementations are registered to a name by adding a configuration option to Spark: spark.sql.catalog.catalog-name=com.example.YourCatalogClass. All configuration properties in the Spark configuration that share the catalog name prefix, spark.sql.catalog.catalog-name.(key)=(value), will be passed in the case-insensitive string map of options in initialization with the prefix removed. name is also passed and is the catalog's name; in this case, "catalog-name".
- Annotations
- @Evolving()
- Since
3.0.0
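A minimal sketch of the registration and implementation described above; the class and option names here are hypothetical:

    # Spark configuration: registers the catalog under the name "your_catalog"
    spark.sql.catalog.your_catalog=com.example.YourCatalog
    spark.sql.catalog.your_catalog.warehouse=/tmp/warehouse

    // Scala: the matching CatalogPlugin implementation
    import org.apache.spark.sql.connector.catalog.CatalogPlugin
    import org.apache.spark.sql.util.CaseInsensitiveStringMap

    class YourCatalog extends CatalogPlugin {
      private var catalogName: String = _

      override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
        // options contains warehouse -> /tmp/warehouse, with the prefix removed
        catalogName = name
      }

      override def name(): String = catalogName
    }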
- trait Column extends AnyRef
An interface representing a column of a Table. It defines basic properties of a column, such as name and data type, as well as some advanced ones like the default column value.
Data sources do not need to implement it. They should consume it in APIs like TableCatalog#createTable(Identifier, Column[], Transform[], Map), and report it in Table#columns() by calling the static create functions of this interface to create it.
A column cannot have both a default value and a generation expression.
- Annotations
- @Evolving()
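For illustration, a short sketch of building columns with the static create functions:

    import org.apache.spark.sql.connector.catalog.Column
    import org.apache.spark.sql.types.{IntegerType, StringType}

    // Immutable column descriptions, e.g. for reporting from Table#columns()
    val id: Column   = Column.create("id", IntegerType)         // nullable by default
    val name: Column = Column.create("name", StringType, false) // NOT NULL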
- class ColumnDefaultValue extends AnyRef
A class representing the default value of a column. It contains both the SQL string and the literal value of the user-specified default value expression. The SQL string should be re-evaluated for each table writing command, which may produce different values if the default value expression is something like CURRENT_DATE(). The literal value is used to back-fill existing data if new columns with a default value are added. Note: the back-fill can be lazy. The data sources can remember the column default value and let the reader fill the column value when reading existing data that does not have these new columns.
- Annotations
- @Evolving()
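A brief sketch of constructing one, assuming the constructor that takes the SQL string and a connector Literal:

    import org.apache.spark.sql.connector.catalog.ColumnDefaultValue
    import org.apache.spark.sql.connector.expressions.LiteralValue
    import org.apache.spark.sql.types.IntegerType

    // The SQL string is re-evaluated per write command;
    // the literal back-fills rows that predate the column.
    val dv = new ColumnDefaultValue("42", LiteralValue(42, IntegerType))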
- class DefaultValue extends AnyRef
A class that represents default values. Connectors can define default values using either a SQL string (Spark SQL dialect) or an expression, if the default value can be expressed as a supported connector expression. If both the SQL string and the expression are provided, Spark first attempts to convert the given expression to its internal representation. If the expression cannot be converted, and a SQL string is provided, Spark will fall back to parsing the SQL string.
- Annotations
- @Evolving()
- Since
4.0.0
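A sketch of providing both forms, assuming a constructor that takes the SQL string and a connector expression:

    import org.apache.spark.sql.connector.catalog.DefaultValue
    import org.apache.spark.sql.connector.expressions.LiteralValue
    import org.apache.spark.sql.types.IntegerType

    // Spark tries the expression first, then falls back to parsing the SQL string.
    val dv = new DefaultValue("41 + 1", LiteralValue(42, IntegerType))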
- abstract class DelegatingCatalogExtension extends CatalogExtension
A simple implementation of CatalogExtension, which implements all the catalog functions by calling the built-in session catalog directly. This is created for convenience, so that users only need to override some methods where they want to apply custom logic. For example, they can override createTable, do something else, and then call super.createTable.
- Annotations
- @Evolving()
- Since
3.0.0
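A minimal sketch of that override-and-delegate pattern; the class name and the pre-create hook are hypothetical, and the Column[]-based createTable overload is assumed:

    import java.util
    import org.apache.spark.sql.connector.catalog.{Column, DelegatingCatalogExtension, Identifier, Table}
    import org.apache.spark.sql.connector.expressions.Transform

    class AuditingSessionCatalog extends DelegatingCatalogExtension {
      override def createTable(
          ident: Identifier,
          columns: Array[Column],
          partitions: Array[Transform],
          properties: util.Map[String, String]): Table = {
        println(s"creating table $ident")                          // custom logic first
        super.createTable(ident, columns, partitions, properties)  // then delegate
      }
    }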
- trait FunctionCatalog extends CatalogPlugin
Catalog methods for working with Functions.
- Annotations
- @Evolving()
- Since
3.2.0
- trait Identifier extends AnyRef
Identifies an object in a catalog.
- Annotations
- @Evolving()
- Since
3.0.0
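For example, the multi-part name db1.db2.table splits into a namespace array and an object name:

    import org.apache.spark.sql.connector.catalog.Identifier

    // namespace ["db1", "db2"], name "table"
    val ident = Identifier.of(Array("db1", "db2"), "table")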
- class IdentityColumnSpec extends AnyRef
Identity column specification.
- Annotations
- @Evolving()
- trait MetadataColumn extends AnyRef
Interface for a metadata column.
A metadata column can expose additional metadata about a row. For example, rows from Kafka can use metadata columns to expose a message's topic, partition number, and offset.
A metadata column could also be the result of a transform applied to a value in the row. For example, a partition value produced by bucket(id, 16) could be exposed by a metadata column. In this case, #transform() should return a non-null Transform that produced the metadata column's values.
- Annotations
- @Evolving()
- Since
3.1.0
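A sketch of a hypothetical metadata column that exposes the source file path of each row:

    import org.apache.spark.sql.connector.catalog.MetadataColumn
    import org.apache.spark.sql.types.{DataType, StringType}

    // Would be returned from SupportsMetadataColumns#metadataColumns() by a table.
    val fileColumn: MetadataColumn = new MetadataColumn {
      override def name(): String = "_file"
      override def dataType(): DataType = StringType
      override def comment(): String = "path of the file this row came from"
    }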
- trait NamespaceChange extends AnyRef
NamespaceChange subclasses represent requested changes to a namespace. These are passed to SupportsNamespaces#alterNamespace. For example:

    import NamespaceChange._
    val catalog = Catalogs.load(name)
    catalog.alterNamespace(ident,
      setProperty("prop", "value"),
      removeProperty("other_prop"))
- Annotations
- @Evolving()
- Since
3.0.0
- trait ProcedureCatalog extends CatalogPlugin
A catalog API for working with procedures.
- Annotations
- @Evolving()
- Since
4.0.0
- trait SessionConfigSupport extends TableProvider
A mix-in interface for TableProvider. Data sources can implement this interface to propagate session configs with the specified key-prefix to all data source operations in this session.
- Annotations
- @Evolving()
- Since
3.0.0
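A sketch of a hypothetical provider: with key-prefix "myformat", session configs of the form spark.datasource.myformat.(key) would be propagated to the source's options:

    import java.util
    import org.apache.spark.sql.connector.catalog.{SessionConfigSupport, Table, TableProvider}
    import org.apache.spark.sql.connector.expressions.Transform
    import org.apache.spark.sql.types.StructType
    import org.apache.spark.sql.util.CaseInsensitiveStringMap

    class MyFormatProvider extends TableProvider with SessionConfigSupport {
      override def keyPrefix(): String = "myformat"

      // TableProvider members, left unimplemented in this sketch.
      override def inferSchema(options: CaseInsensitiveStringMap): StructType = ???
      override def getTable(
          schema: StructType,
          partitioning: Array[Transform],
          properties: util.Map[String, String]): Table = ???
    }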
- trait StagedTable extends Table
Represents a table which is staged for being committed to the metastore.
This is used to implement atomic CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT queries. The planner will create one of these via
StagingTableCatalog#stageCreate(Identifier, StructType, Transform[], Map) or StagingTableCatalog#stageReplace(Identifier, StructType, Transform[], Map) to prepare the table for being written to. This table should usually implement SupportsWrite. A new writer will be constructed via SupportsWrite#newWriteBuilder(LogicalWriteInfo), and the write will be committed. The job concludes with a call to #commitStagedChanges(), at which point implementations are expected to commit the table's metadata into the metastore along with the data that was written by the writes from the write builder this table created.
- Annotations
- @Evolving()
- Since
3.0.0
- trait StagingTableCatalog extends TableCatalog
An optional mix-in for implementations of TableCatalog that support staging the creation of a table before committing the table's metadata along with its contents in CREATE TABLE AS SELECT or REPLACE TABLE AS SELECT operations.
It is highly recommended to implement this trait whenever possible, so that CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first drop the table via TableCatalog#dropTable(Identifier), then create the table via TableCatalog#createTable(Identifier, Column[], Transform[], Map), and then perform the write via SupportsWrite#newWriteBuilder(LogicalWriteInfo). However, if the write operation fails, the catalog will have already dropped the table, and the planner cannot roll back the dropping of the table.
If the catalog implements this plugin, the catalog can implement the methods to "stage" the creation and the replacement of a table. After the table's BatchWrite#commit(WriterCommitMessage[]) is called, StagedTable#commitStagedChanges() is called, at which point the staged table can complete both the data write and the metadata swap operation atomically.
- Annotations
- @Evolving()
- Since
3.0.0
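A rough sketch of the call order in an atomic CTAS (not the actual planner code; it assumes the staged table implements SupportsWrite and uses the Column[]-based stageCreate overload):

    import java.util.Collections
    import org.apache.spark.sql.connector.catalog.{Column, Identifier, StagingTableCatalog, SupportsWrite}
    import org.apache.spark.sql.connector.expressions.Transform
    import org.apache.spark.sql.connector.write.{BatchWrite, LogicalWriteInfo, WriterCommitMessage}

    def atomicCtas(
        catalog: StagingTableCatalog,
        ident: Identifier,
        columns: Array[Column],
        info: LogicalWriteInfo,
        runTasks: BatchWrite => Array[WriterCommitMessage]): Unit = {
      val staged = catalog.stageCreate(
        ident, columns, Array.empty[Transform], Collections.emptyMap[String, String]())
      val batch = staged.asInstanceOf[SupportsWrite].newWriteBuilder(info).build().toBatch
      val messages = runTasks(batch)  // executors write the query output
      batch.commit(messages)          // commit the data write
      staged.commitStagedChanges()    // metadata and data become visible together
    }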
- trait SupportsAtomicPartitionManagement extends SupportsPartitionManagement
An atomic partition interface of Table to operate on multiple partitions atomically.
These APIs are used to modify table partitions or partition metadata; they will change the table data as well.
- #createPartitions: add an array of partitions and any data they contain to the table
- #dropPartitions: remove an array of partitions and any data they contain from the table
- #purgePartitions: remove an array of partitions and any data they contain from the table by skipping a trash, even if it is supported
- #truncatePartitions: truncate an array of partitions by removing partition data
- Annotations
- @Experimental()
- Since
3.1.0
- trait SupportsCatalogOptions extends TableProvider
An interface, which TableProviders can implement, to support table existence checks and creation through a catalog, without having to use table identifiers. For example, when file-based data sources use the DataFrameWriter.save(path) method, the option path can translate to a PathIdentifier. A catalog can then use this PathIdentifier to check the existence of a table, or whether a table can be created at a given directory.
- Annotations
- @Evolving()
- Since
3.0.0
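A sketch of a hypothetical path-based source that derives its identifier from the path option:

    import java.util
    import org.apache.spark.sql.connector.catalog.{Identifier, SupportsCatalogOptions, Table}
    import org.apache.spark.sql.connector.expressions.Transform
    import org.apache.spark.sql.types.StructType
    import org.apache.spark.sql.util.CaseInsensitiveStringMap

    class MyPathSource extends SupportsCatalogOptions {
      // Use the "path" option as the table name in the root namespace.
      override def extractIdentifier(options: CaseInsensitiveStringMap): Identifier =
        Identifier.of(Array.empty[String], options.get("path"))

      // TableProvider members, left unimplemented in this sketch.
      override def inferSchema(options: CaseInsensitiveStringMap): StructType = ???
      override def getTable(
          schema: StructType,
          partitioning: Array[Transform],
          properties: util.Map[String, String]): Table = ???
    }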
- trait SupportsDelete extends SupportsDeleteV2
A mix-in interface for Table delete support. Data sources can implement this interface to provide the ability to delete data matching filter expressions from tables.
- Annotations
- @Evolving()
- Since
3.0.0
- trait SupportsDeleteV2 extends TruncatableTable
A mix-in interface for Table delete support. Data sources can implement this interface to provide the ability to delete data matching filter expressions from tables.
- Annotations
- @Evolving()
- Since
3.4.0
- trait SupportsMetadataColumns extends Table
An interface for exposing data columns for a table that are not in the table schema. For example, a file source could expose a "file" column that contains the path of the file that contained each row.
The columns returned by #metadataColumns() may be passed as StructField in requested projections. Sources that implement this interface and column projection using SupportsPushDownRequiredColumns must accept metadata fields passed to SupportsPushDownRequiredColumns#pruneColumns(StructType).
If a table column and a metadata column have the same name, the conflict is resolved by either renaming or suppressing the metadata column. See canRenameConflictingMetadataColumns.
- Annotations
- @Evolving()
- Since
3.1.0
- trait SupportsNamespaces extends CatalogPlugin
Catalog methods for working with namespaces.
If an object such as a table, view, or function exists, its parent namespaces must also exist and must be returned by the discovery methods #listNamespaces() and #listNamespaces(String[]).
Catalog implementations are not required to maintain the existence of namespaces independent of objects in a namespace. For example, a function catalog that loads functions using reflection and uses Java packages as namespaces is not required to support the methods to create, alter, or drop a namespace. Implementations are allowed to discover the existence of objects or namespaces without throwing NoSuchNamespaceException when no namespace is found.
- Annotations
- @Evolving()
- Since
3.0.0
- trait SupportsPartitionManagement extends Table
A partition interface of Table. A partition is composed of an identifier and properties, and the properties contain metadata information of the partition.
These APIs are used to modify table partition identifiers or partition metadata. In some cases, they will change the table data as well.
- #createPartition: add a partition and any data it contains to the table
- #dropPartition: remove a partition and any data it contains from the table
- #purgePartition: remove a partition and any data it contains from the table by skipping a trash, even if it is supported
- #replacePartitionMetadata: point a partition to a new location, which will swap one location's data for the other
- #truncatePartition: remove partition data from the table
- Annotations
- @Experimental()
- Since
3.1.0
- trait SupportsRead extends Table
A mix-in interface of Table, to indicate that it's readable. This adds #newScanBuilder(CaseInsensitiveStringMap), which is used to create a scan for batch, micro-batch, or continuous processing.
- Annotations
- @Evolving()
- Since
3.0.0
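A sketch of a hypothetical readable table that advertises batch reads and delegates scan construction to a ScanBuilder:

    import java.util
    import org.apache.spark.sql.connector.catalog.{SupportsRead, TableCapability}
    import org.apache.spark.sql.connector.read.ScanBuilder
    import org.apache.spark.sql.types.StructType
    import org.apache.spark.sql.util.CaseInsensitiveStringMap

    class MyReadableTable extends SupportsRead {
      override def name(): String = "my_table"
      override def schema(): StructType = new StructType().add("value", "string")
      override def capabilities(): util.Set[TableCapability] =
        util.EnumSet.of(TableCapability.BATCH_READ)

      // Scan construction, left unimplemented in this sketch.
      override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder = ???
    }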
- trait SupportsRowLevelOperations extends Table
A mix-in interface for Table row-level operations support. Data sources can implement this interface to indicate they support rewriting data for DELETE, UPDATE, and MERGE operations.
- Annotations
- @Experimental()
- Since
3.3.0
- trait SupportsWrite extends Table
A mix-in interface of Table, to indicate that it's writable. This adds #newWriteBuilder(LogicalWriteInfo), which is used to create a write for batch or streaming.
- Annotations
- @Evolving()
- Since
3.0.0
- trait Table extends AnyRef
An interface representing a logical structured data set of a data source. For example, the implementation can be a directory on the file system, a topic of Kafka, or a table in the catalog, etc.
This interface can mix in SupportsRead and SupportsWrite to provide data reading and writing ability.
The default implementation of #partitioning() returns an empty array of partitions, and the default implementation of #properties() returns an empty map. These should be overridden by implementations that support partitioning and table properties.
- Annotations
- @Evolving()
- Since
3.0.0
- sealed final class TableCapability extends Enum[TableCapability]
Capabilities that can be provided by a Table implementation.
Tables use Table#capabilities() to return a set of capabilities. Each capability signals to Spark that the table supports a feature identified by the capability. For example, returning #BATCH_READ allows Spark to read from the table using a batch scan.
- Annotations
- @Evolving()
- Since
3.0.0
- trait TableCatalog extends CatalogPlugin
Catalog methods for working with Tables.
TableCatalog implementations may be case sensitive or case insensitive. Spark will pass table identifiers without modification. Field names passed to alterTable(Identifier, TableChange...) will be normalized to match the case used in the table schema when updating, renaming, or dropping existing columns when catalyst analysis is case insensitive.
- Annotations
- @Evolving()
- Since
3.0.0
- sealed final class TableCatalogCapability extends Enum[TableCatalogCapability]
Capabilities that can be provided by a TableCatalog implementation.
TableCatalogs use TableCatalog#capabilities() to return a set of capabilities. Each capability signals to Spark that the catalog supports a feature identified by the capability. For example, returning #SUPPORTS_CREATE_TABLE_WITH_GENERATED_COLUMNS allows Spark to accept GENERATED ALWAYS AS expressions in CREATE TABLE statements.
- Annotations
- @Evolving()
- Since
3.4.0
- trait TableChange extends AnyRef
TableChange subclasses represent requested changes to a table. These are passed to TableCatalog#alterTable. For example:

    import TableChange._
    val catalog = Catalogs.load(name)
    catalog.asTableCatalog.alterTable(ident,
      addColumn("x", IntegerType),
      renameColumn("a", "b"),
      deleteColumn("c"))
- Annotations
- @Evolving()
- Since
3.0.0
- trait TableProvider extends AnyRef
The base interface for v2 data sources which don't have a real catalog. Implementations must have a public, 0-arg constructor.
Note that TableProvider can only apply data operations to existing tables, like read, append, delete, and overwrite. It does not support the operations that require metadata changes, like create/drop tables.
The major responsibility of this interface is to return a Table for read/write.
- Annotations
- @Evolving()
- Since
3.0.0
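A minimal sketch of a hypothetical provider; Spark instantiates it through the required public 0-arg constructor and asks it for a Table:

    import java.util
    import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
    import org.apache.spark.sql.connector.expressions.Transform
    import org.apache.spark.sql.types.StructType
    import org.apache.spark.sql.util.CaseInsensitiveStringMap

    class MyFormat extends TableProvider {
      override def inferSchema(options: CaseInsensitiveStringMap): StructType =
        new StructType().add("value", "string")

      // Return the Table instance that handles reads/writes for these options.
      override def getTable(
          schema: StructType,
          partitioning: Array[Transform],
          properties: util.Map[String, String]): Table = ???
    }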
- sealed final class TableWritePrivilege extends Enum[TableWritePrivilege]
The table write privileges that will be provided when loading a table.
- Since
3.5.3
- trait TruncatableTable extends Table
Represents a table which can be atomically truncated.
- Annotations
- @Evolving()
- Since
3.2.0
- trait View extends AnyRef
An interface representing a persisted view.
- Annotations
- @DeveloperApi()
- trait ViewCatalog extends CatalogPlugin
Catalog methods for working with views.
- Annotations
- @DeveloperApi()
- trait ViewChange extends AnyRef
ViewChange subclasses represent requested changes to a view. These are passed to ViewCatalog#alterView.
- Annotations
- @DeveloperApi()
- class ViewInfo extends AnyRef
A class that holds view information.
- Annotations
- @DeveloperApi()