@InterfaceStability.Evolving
public interface DataSourceReader

A data source reader that is returned by ReadSupport.createReader(DataSourceOptions) or ReadSupportWithSchema.createReader(StructType, DataSourceOptions). It can mix in various query optimization interfaces to speed up the data scan. The actual scan logic is delegated to the DataReaderFactory instances returned by createDataReaderFactories().
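To make the contract concrete, here is a minimal sketch of such a reader. The types below (`DataReaderFactory`, `StructTypeLike`, `SimpleDataSourceReader`, `InMemoryReader`) are hypothetical, simplified stand-ins for the real interfaces in `org.apache.spark.sql.sources.v2.reader`, so the example is self-contained:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: the real DataReaderFactory creates DataReaders
// that perform the per-partition scan.
interface DataReaderFactory<T> { }

// Hypothetical stand-in for Spark's StructType schema object.
interface StructTypeLike { String[] fieldNames(); }

// The shape of DataSourceReader: report a schema, then hand out the
// factories that do the actual reading.
interface SimpleDataSourceReader {
    StructTypeLike readSchema();
    List<DataReaderFactory<Object[]>> createDataReaderFactories();
}

class InMemoryReader implements SimpleDataSourceReader {
    public StructTypeLike readSchema() {
        // The schema this reader will actually produce.
        return () -> new String[] { "id", "value" };
    }

    public List<DataReaderFactory<Object[]>> createDataReaderFactories() {
        // Typically one factory per partition of the underlying data.
        return Arrays.asList(new DataReaderFactory<Object[]>() { },
                             new DataReaderFactory<Object[]>() { });
    }
}
```

The number of returned factories determines the parallelism of the scan: each factory is sent to an executor and produces one reader for its partition.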
There are mainly 3 kinds of query optimizations:
1. Operator push-down. E.g., filter push-down, required-columns push-down (aka column pruning), etc. Names of these interfaces start with `SupportsPushDown`.
2. Information reporting. E.g., statistics reporting, ordering reporting, etc. Names of these interfaces start with `SupportsReporting`.
3. Special scans. E.g., columnar scan, unsafe-row scan, etc. Names of these interfaces start with `SupportsScan`. Note that a reader should implement at most one of the special scans; if more than one is implemented, only one of them is respected, according to the priority list from high to low: `SupportsScanColumnarBatch`, `SupportsScanUnsafeRow`.
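As an illustration of the first kind, a reader that supports column pruning receives the required columns before the scan and narrows its reported schema accordingly. The types below are hypothetical simplifications of the real `SupportsPushDownRequiredColumns` mix-in, kept self-contained for the sketch:

```java
// Hypothetical stand-in for the SupportsPushDownRequiredColumns mix-in
// (the real one lives in org.apache.spark.sql.sources.v2.reader and
// works with StructType rather than raw field names).
interface SupportsColumnPruning {
    void pruneColumns(String[] requiredColumns);
    String[] readSchemaFields();
}

class PruningReader implements SupportsColumnPruning {
    // Physical schema of the underlying storage.
    private String[] fields = { "id", "name", "value" };

    // Spark calls this before the scan with only the columns the query needs.
    public void pruneColumns(String[] requiredColumns) {
        this.fields = requiredColumns.clone();
    }

    // The reported schema then reflects the pruning, which is why
    // readSchema() may differ from the physical schema.
    public String[] readSchemaFields() {
        return fields;
    }
}
```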
If an exception is thrown while applying any of these query optimizations, the action fails and no Spark job is submitted.
Spark first applies all operator push-down optimizations that this data source supports. Then Spark collects the information this data source reports for further optimizations. Finally, Spark issues the scan request and does the actual data reading.

Modifier and Type | Method and Description
---|---
`java.util.List<DataReaderFactory<Row>>` | `createDataReaderFactories()` Returns a list of reader factories.
`StructType` | `readSchema()` Returns the actual schema of this data source reader, which may be different from the physical schema of the underlying storage, as column pruning or other optimizations may happen.
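The three-phase order Spark follows (push-down, then information collection, then the scan) can be sketched with a toy driver. All types here are hypothetical simplifications, not the real planner API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// A toy driver illustrating the sequencing: push-downs are applied first,
// reported information is collected next, and the scan runs last.
class ScanPlanner {
    interface Reader {
        void pushFilters(List<String> filters); // stand-in for SupportsPushDown* mix-ins
        long estimatedRows();                   // stand-in for statistics reporting
        List<String> scan();                    // stand-in for the factory-driven scan
    }

    static List<String> plan(Reader reader) {
        reader.pushFilters(Arrays.asList("id > 10")); // 1. operator push-down
        long rows = reader.estimatedRows();           // 2. information reporting
        if (rows == 0) {
            return Collections.emptyList();           //    e.g. skip an empty source
        }
        return reader.scan();                         // 3. issue the actual scan
    }
}
```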