public interface HasPartitionKey extends InputPartition
A mix-in for input partitions whose records are clustered on the same set of partition keys (provided via SupportsReportPartitioning, see below). Data sources can opt in to implementing this interface for the partitions they report to Spark, which uses the information to avoid shuffling data in certain scenarios, such as joins and aggregations. Note that Spark requires ALL input partitions to implement this interface; otherwise it cannot take advantage of it.
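The all-or-nothing requirement can be illustrated with a small check. This is a minimal sketch, not Spark's actual planner code: `InputPartition` and `HasPartitionKey` are redeclared here as hypothetical stand-ins so the example compiles without Spark on the classpath (the real `partitionKey()` returns an `InternalRow`, not an `Object`), and `KeyedPartition`, `PlainPartition`, and `AllOrNothingSketch` are invented for illustration.

```java
import java.util.List;

// Hypothetical stand-ins for the Spark types of the same name, declared
// locally so that this sketch is self-contained.
interface InputPartition {}

interface HasPartitionKey extends InputPartition {
    Object partitionKey(); // assumption: Spark's real method returns an InternalRow
}

// A partition that reports its key.
final class KeyedPartition implements HasPartitionKey {
    private final String key;

    KeyedPartition(String key) {
        this.key = key;
    }

    @Override
    public Object partitionKey() {
        return key;
    }
}

// A partition that does not report a key.
final class PlainPartition implements InputPartition {}

final class AllOrNothingSketch {
    // Returns true only if every partition in the list exposes a partition key.
    // A single partition without one makes the key information unusable.
    static boolean allHavePartitionKey(List<? extends InputPartition> partitions) {
        return partitions.stream().allMatch(p -> p instanceof HasPartitionKey);
    }

    public static void main(String[] args) {
        List<InputPartition> mixed =
            List.of(new KeyedPartition("a"), new PlainPartition());
        System.out.println(allHavePartitionKey(mixed));
    }
}
```

In a real data source the practical consequence is that every `InputPartition` returned by a scan should implement the interface, not just some of them.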
This interface should be used in combination with SupportsReportPartitioning, which allows data sources to report distribution and ordering specs to Spark. In particular, Spark expects data sources to report ClusteredDistribution whenever their input partitions implement this interface. Spark derives the partition key spec (e.g., column names, transforms) from the distribution, and the partition values from the input partitions.
It is the implementor's responsibility to ensure that when an input partition implements this interface, all of its records have the same value for the partition keys. Spark does not check this property.
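Because Spark does not verify the invariant, a data source may want to enforce it itself when planning partitions. The following is a minimal sketch under stated assumptions: `InputPartition` and `HasPartitionKey` are hypothetical local stand-ins for the Spark interfaces (the real `partitionKey()` returns an `InternalRow`), and `Event`, `RegionPartition`, and `KeyGroupingSketch` are invented names. It groups records by a single key column so that each partition holds exactly one key value, and fails fast if a record disagrees with its partition's declared key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the Spark types of the same name.
interface InputPartition {}

interface HasPartitionKey extends InputPartition {
    Object partitionKey(); // assumption: Spark's real method returns an InternalRow
}

// Hypothetical record type: "region" plays the role of the partition key column.
final class Event {
    final String region;
    final long value;

    Event(String region, long value) {
        this.region = region;
        this.value = value;
    }
}

final class RegionPartition implements HasPartitionKey {
    private final String region;
    private final List<Event> events;

    RegionPartition(String region, List<Event> events) {
        // Enforce the invariant locally, since nothing downstream checks it.
        for (Event e : events) {
            if (!region.equals(e.region)) {
                throw new IllegalArgumentException(
                    "record with key " + e.region + " in partition " + region);
            }
        }
        this.region = region;
        this.events = List.copyOf(events);
    }

    @Override
    public Object partitionKey() {
        return region;
    }

    List<Event> events() {
        return events;
    }
}

final class KeyGroupingSketch {
    // Builds one partition per distinct key, in first-seen order.
    static List<RegionPartition> plan(List<Event> events) {
        Map<String, List<Event>> byRegion = new LinkedHashMap<>();
        for (Event e : events) {
            byRegion.computeIfAbsent(e.region, k -> new ArrayList<>()).add(e);
        }
        List<RegionPartition> partitions = new ArrayList<>();
        byRegion.forEach((region, group) ->
            partitions.add(new RegionPartition(region, group)));
        return partitions;
    }
}
```

Checking in the constructor keeps the guarantee next to the type that makes the claim; a source that trusts its storage layout could skip the check and rely on tests instead.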
|Modifier and Type|Method and Description|
|InternalRow|partitionKey() Returns the value of the partition key(s) associated with this partition.|