@Experimental public interface SupportsRuntimeFiltering extends SupportsRuntimeV2Filtering
Scan
. Data sources can implement this interface if they can
filter initially planned InputPartition
s using predicates Spark infers at runtime.
Note that Spark will push runtime filters only if they are beneficial.
Modifier and Type | Method and Description |
---|---|
void |
filter(Filter[] filters)
Filters this scan using runtime filters.
|
default void |
filter(Predicate[] predicates)
Filters this scan using runtime predicates.
|
NamedReference[] |
filterAttributes()
Returns attributes this scan can be filtered by at runtime.
|
description, readSchema, reportDriverMetrics, supportedCustomMetrics, toBatch, toContinuousStream, toMicroBatchStream
NamedReference[] filterAttributes()
Spark will call filter(Filter[])
if it can derive a runtime
predicate for any of the filter attributes.
filterAttributes
in interface SupportsRuntimeV2Filtering
void filter(Filter[] filters)
The provided expressions must be interpreted as a set of filters that are ANDed together.
Implementations may use the filters to prune initially planned InputPartition
s.
If the scan also implements SupportsReportPartitioning
, it must preserve
the originally reported partitioning during runtime filtering. While applying runtime filters,
the scan may detect that some InputPartition
s have no matching data. It can omit
such partitions entirely only if it does not report a specific partitioning. Otherwise,
the scan can replace the initially planned InputPartition
s that have no matching
data with empty InputPartition
s but must preserve the overall number of partitions.
Note that Spark will call Scan.toBatch()
again after filtering the scan at runtime.
filters
- data source filters used to filter the scan at runtimedefault void filter(Predicate[] predicates)
SupportsRuntimeV2Filtering
The provided expressions must be interpreted as a set of predicates that are ANDed together.
Implementations may use the predicates to prune initially planned InputPartition
s.
If the scan also implements SupportsReportPartitioning
, it must preserve
the originally reported partitioning during runtime filtering. While applying runtime
predicates, the scan may detect that some InputPartition
s have no matching data. It
can omit such partitions entirely only if it does not report a specific partitioning.
Otherwise, the scan can replace the initially planned InputPartition
s that have no
matching data with empty InputPartition
s but must preserve the overall number of
partitions.
Note that Spark will call Scan.toBatch()
again after filtering the scan at runtime.
filter
in interface SupportsRuntimeV2Filtering
predicates
- data source V2 predicates used to filter the scan at runtime