@InterfaceStability.Evolving
public interface SupportsScanColumnarBatch extends DataSourceReader

A mix-in interface for DataSourceReader. Data source readers can implement this interface to output ColumnarBatch and make the scan faster.

| Modifier and Type | Method and Description |
|---|---|
| `default boolean` | `enableBatchRead()` Returns true if the concrete data source reader can read data in batch according to the scan properties like required columns, pushed filters, etc. |
| `java.util.List<InputPartition<ColumnarBatch>>` | `planBatchInputPartitions()` Similar to `DataSourceReader.planInputPartitions()`, but returns columnar data in batches. |
| `default java.util.List<InputPartition<org.apache.spark.sql.catalyst.InternalRow>>` | `planInputPartitions()` Returns a list of `InputPartition`s. |

Methods inherited from interface DataSourceReader: `readSchema`
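For orientation, a minimal sketch of a reader mixing in this interface might look like the following. The class name, schema, and `MyBatchPartition` helper are hypothetical, not part of the API (a fuller `MyBatchPartition` sketch appears after `enableBatchRead()` below):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.sources.v2.reader.SupportsScanColumnarBatch;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.vectorized.ColumnarBatch;

// Hypothetical reader: mixes in SupportsScanColumnarBatch so the scan
// produces ColumnarBatch objects instead of one InternalRow at a time.
public class MyColumnarReader implements SupportsScanColumnarBatch {

  @Override
  public StructType readSchema() {
    // Hypothetical fixed schema: a single int column.
    return new StructType().add("id", DataTypes.IntegerType);
  }

  @Override
  public List<InputPartition<ColumnarBatch>> planBatchInputPartitions() {
    // One batch-producing partition per chunk of data;
    // MyBatchPartition is a hypothetical InputPartition<ColumnarBatch>.
    return Arrays.asList(new MyBatchPartition(), new MyBatchPartition());
  }
}
```

Because `enableBatchRead()` defaults to true, a reader like this sketch serves the whole scan through `planBatchInputPartitions()`.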
default java.util.List<InputPartition<org.apache.spark.sql.catalyst.InternalRow>> planInputPartitions()

Description copied from interface: DataSourceReader

Returns a list of `InputPartition`s. Each `InputPartition` is responsible for creating a data reader to output data of one RDD partition. The number of input partitions returned here is the same as the number of RDD partitions this scan outputs.

Note that this may not be a full scan if the data source reader mixes in other optimization interfaces like column pruning, filter push-down, etc. These optimizations are applied before Spark issues the scan request.

If this method fails (by throwing an exception), the action will fail and no Spark job will be submitted.

Specified by: `planInputPartitions` in interface `DataSourceReader`
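To make the contract concrete, the sketch below shows a hypothetical row-based partition and its reader: the `InputPartition` carries only serializable planning metadata (here, a row range), and the `InputPartitionReader` it creates emits the rows of one RDD partition. All names and the generated data are illustrative assumptions:

```java
import java.io.IOException;

import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;

// Hypothetical partition: holds the planning metadata for one RDD
// partition and is shipped to an executor, where it creates the reader.
public class MyRowPartition implements InputPartition<InternalRow> {
  private final int start;
  private final int end;

  public MyRowPartition(int start, int end) {
    this.start = start;
    this.end = end;
  }

  @Override
  public InputPartitionReader<InternalRow> createPartitionReader() {
    return new InputPartitionReader<InternalRow>() {
      private int current = start - 1;

      @Override
      public boolean next() throws IOException {
        current++;
        return current < end;  // false ends this RDD partition
      }

      @Override
      public InternalRow get() {
        // Emit a single-column row holding the current value.
        return new GenericInternalRow(new Object[]{current});
      }

      @Override
      public void close() throws IOException {
        // Nothing to release in this in-memory sketch.
      }
    };
  }
}
```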
java.util.List<InputPartition<ColumnarBatch>> planBatchInputPartitions()

Similar to DataSourceReader.planInputPartitions(), but returns columnar data in batches.

default boolean enableBatchRead()
Returns true if the concrete data source reader can read data in batch according to the scan properties like required columns, pushed filters, etc. Spark will call this method and `planInputPartitions()` to fall back to the normal read path under some conditions.
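As an illustration of that fallback contract, the hypothetical reader below reports batch support only when every required column is an integer; for any other schema Spark takes the row-based path. The names, the type check, and the two-row batch are assumptions for the sketch, and `OnHeapColumnVector` is an internal Spark class used here only to build a self-contained example:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;
import org.apache.spark.sql.sources.v2.reader.SupportsScanColumnarBatch;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.vectorized.ColumnVector;
import org.apache.spark.sql.vectorized.ColumnarBatch;

public class MyAdaptiveReader implements SupportsScanColumnarBatch {

  @Override
  public StructType readSchema() {
    return new StructType().add("id", DataTypes.IntegerType);
  }

  @Override
  public boolean enableBatchRead() {
    // Hypothetical condition: vectorize only all-int schemas; otherwise
    // Spark falls back to planInputPartitions(), the normal read path.
    for (StructField f : readSchema().fields()) {
      if (!f.dataType().equals(DataTypes.IntegerType)) {
        return false;
      }
    }
    return true;
  }

  @Override
  public List<InputPartition<ColumnarBatch>> planBatchInputPartitions() {
    return Arrays.asList(new MyBatchPartition());
  }

  // Hypothetical batch partition: produces one two-row ColumnarBatch.
  static class MyBatchPartition implements InputPartition<ColumnarBatch> {
    @Override
    public InputPartitionReader<ColumnarBatch> createPartitionReader() {
      return new InputPartitionReader<ColumnarBatch>() {
        private boolean consumed = false;

        @Override
        public boolean next() {
          if (consumed) {
            return false;
          }
          consumed = true;  // exactly one batch in this sketch
          return true;
        }

        @Override
        public ColumnarBatch get() {
          // Build a tiny single-column batch with an on-heap vector.
          OnHeapColumnVector col = new OnHeapColumnVector(2, DataTypes.IntegerType);
          col.putInt(0, 1);
          col.putInt(1, 2);
          ColumnarBatch batch = new ColumnarBatch(new ColumnVector[]{col});
          batch.setNumRows(2);
          return batch;
        }

        @Override
        public void close() {
          // In a real reader, release the column vectors here.
        }
      };
    }
  }
}
```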