Interface TableProvider

All Known Subinterfaces:
SessionConfigSupport, SupportsCatalogOptions

@Evolving public interface TableProvider
The base interface for v2 data sources that don't have a real catalog. Implementations must have a public, 0-arg constructor.

Note that TableProvider can only apply data operations to existing tables, such as read, append, delete, and overwrite. It does not support operations that require metadata changes, such as creating or dropping tables.

The main responsibility of this interface is to return a Table instance for reads and writes.
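
For illustration, the following is a hedged, user-side sketch of how a provider is exercised. The provider class name (com.example.MyProvider) and the "path" option are assumptions; Spark locates the class named in format(), instantiates it via its public 0-arg constructor, and asks it for a Table to read.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ProviderUsage {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("table-provider-demo")
            .getOrCreate();

        // The options given here reach the provider as the
        // CaseInsensitiveStringMap passed to inferSchema/getTable.
        Dataset<Row> df = spark.read()
            .format("com.example.MyProvider") // assumed provider class
            .option("path", "/tmp/demo")      // assumed option
            .load();

        df.show();
        spark.stop();
      }
    }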

Since:
3.0.0
  • Method Details

    • inferSchema

      StructType inferSchema(CaseInsensitiveStringMap options)
      Infer the schema of the table identified by the given options.
      Parameters:
      options - an immutable case-insensitive string-to-string map that can identify a table, e.g. file path, Kafka topic name, etc.
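
      As a hedged sketch (not Spark's own code), a provider with a fixed, known schema might implement this method as follows; the "path" option name and the column set are assumptions for illustration.

          import org.apache.spark.sql.connector.catalog.TableProvider;
          import org.apache.spark.sql.types.DataTypes;
          import org.apache.spark.sql.types.StructType;
          import org.apache.spark.sql.util.CaseInsensitiveStringMap;

          public abstract class FixedSchemaProvider implements TableProvider {
            @Override
            public StructType inferSchema(CaseInsensitiveStringMap options) {
              // Lookups are case-insensitive: "PATH" and "path" are the same key.
              String path = options.get("path");
              if (path == null) {
                throw new IllegalArgumentException("'path' option is required");
              }
              // A real source would inspect the data at 'path'; this sketch
              // simply hard-codes the schema.
              return new StructType()
                  .add("id", DataTypes.LongType)
                  .add("value", DataTypes.StringType);
            }
          }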
    • inferPartitioning

      default Transform[] inferPartitioning(CaseInsensitiveStringMap options)
      Infer the partitioning of the table identified by the given options.

      By default this method returns empty partitioning; override it if this source supports partitioning.

      Parameters:
      options - an immutable case-insensitive string-to-string map that can identify a table, e.g. file path, Kafka topic name, etc.
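
      A hedged sketch of overriding the default: the "partitionColumn" option name is an assumption, and Expressions.identity builds an identity transform over the named column.

          import org.apache.spark.sql.connector.catalog.TableProvider;
          import org.apache.spark.sql.connector.expressions.Expressions;
          import org.apache.spark.sql.connector.expressions.Transform;
          import org.apache.spark.sql.util.CaseInsensitiveStringMap;

          public abstract class PartitionAwareProvider implements TableProvider {
            @Override
            public Transform[] inferPartitioning(CaseInsensitiveStringMap options) {
              String col = options.get("partitionColumn"); // assumed option name
              if (col == null) {
                return new Transform[0]; // same as the default: unpartitioned
              }
              return new Transform[] { Expressions.identity(col) };
            }
          }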
    • getTable

      Table getTable(StructType schema, Transform[] partitioning, Map<String,String> properties)
      Return a Table instance with the specified table schema, partitioning and properties to do read/write. The returned table should report the same schema and partitioning as the specified ones, or Spark may fail the operation.
      Parameters:
      schema - The specified table schema.
      partitioning - The specified table partitioning.
      properties - The specified table properties. The map is case-preserving (it contains exactly what users specified), and implementations are free to interpret it case-sensitively or case-insensitively. It should be able to identify a table, e.g. file path, Kafka topic name, etc.
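
      A hedged sketch that echoes the metadata Spark passes in, so the returned Table reports exactly the specified schema and partitioning. The table name and the empty capability set are illustrative placeholders; a real source would advertise capabilities such as BATCH_READ and implement the corresponding mix-ins.

          import java.util.Collections;
          import java.util.Map;
          import java.util.Set;
          import org.apache.spark.sql.connector.catalog.Table;
          import org.apache.spark.sql.connector.catalog.TableCapability;
          import org.apache.spark.sql.connector.catalog.TableProvider;
          import org.apache.spark.sql.connector.expressions.Transform;
          import org.apache.spark.sql.types.StructType;

          public abstract class EchoingProvider implements TableProvider {
            @Override
            public Table getTable(StructType schema, Transform[] partitioning,
                                  Map<String, String> properties) {
              // 'properties' is case-preserving; "path" is an assumed key here.
              String path = properties.get("path");
              return new Table() {
                @Override public String name() { return "demo:" + path; }
                // Report the specified schema and partitioning unchanged,
                // as the contract above requires.
                @Override public StructType schema() { return schema; }
                @Override public Transform[] partitioning() { return partitioning; }
                @Override public Set<TableCapability> capabilities() {
                  return Collections.emptySet(); // placeholder: no read/write yet
                }
              };
            }
          }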
    • supportsExternalMetadata

      default boolean supportsExternalMetadata()
      Returns true if the source can accept external table metadata when getting tables. The external table metadata includes:
      1. For table readers: the user-specified schema from DataFrameReader/DataStreamReader and the schema/partitioning stored in the Spark catalog.
      2. For table writers: the schema of the input DataFrame of DataFrameWriter/DataStreamWriter.

      By default this method returns false, which means the schema and partitioning passed to getTable(StructType, Transform[], Map) come from the infer methods. Override it if this source has expensive schema/partitioning inference and wants to accept external table metadata to avoid that inference.
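
      A hedged sketch of opting in; the failing inferSchema is an illustrative choice for a source that cannot infer its schema at all.

          import org.apache.spark.sql.connector.catalog.TableProvider;
          import org.apache.spark.sql.types.StructType;
          import org.apache.spark.sql.util.CaseInsensitiveStringMap;

          public abstract class ExternalMetadataProvider implements TableProvider {
            @Override
            public boolean supportsExternalMetadata() {
              return true; // getTable(...) may now receive user/catalog metadata
            }

            @Override
            public StructType inferSchema(CaseInsensitiveStringMap options) {
              // Still invoked when no external schema is available, e.g. a
              // read without DataFrameReader.schema(...).
              throw new UnsupportedOperationException(
                  "This source requires a user-specified schema.");
            }
          }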