Package org.apache.spark
Class RangePartitioner<K,V>
Object
org.apache.spark.Partitioner
org.apache.spark.RangePartitioner<K,V>
- All Implemented Interfaces:
Serializable
A Partitioner that partitions sortable records by range into roughly equal ranges. The ranges are determined by sampling the content of the RDD passed in.
- Note:
The actual number of partitions created by the RangePartitioner might not be the same as the partitions parameter when the number of sampled records is less than the value of partitions.
-
Constructor Summary
Constructors
- RangePartitioner(int partitions, RDD<? extends scala.Product2<K,V>> rdd, boolean ascending, int samplePointsPerPartitionHint, scala.math.Ordering<K> evidence$1, scala.reflect.ClassTag<K> evidence$2)
- RangePartitioner(int partitions, RDD<? extends scala.Product2<K,V>> rdd, boolean ascending, scala.math.Ordering<K> evidence$3, scala.reflect.ClassTag<K> evidence$4)
Method Summary
- static <K> Object determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K,Object>> candidates, int partitions, scala.math.Ordering<K> evidence$6, scala.reflect.ClassTag<K> evidence$7)
Determines the bounds for range partitioning from candidates with weights indicating how many items each represents.
- boolean equals(Object other)
- int getPartition(Object key)
- int hashCode()
- int numPartitions()
- int samplePointsPerPartitionHint()
- static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]> sketch(RDD<K> rdd, int sampleSizePerPartition, scala.reflect.ClassTag<K> evidence$5)
Sketches the input RDD via reservoir sampling on each partition.
Methods inherited from class org.apache.spark.Partitioner:
defaultPartitioner
-
Constructor Details
-
RangePartitioner
-
RangePartitioner
-
-
Method Details
-
sketch
public static <K> scala.Tuple2<Object,scala.Tuple3<Object,Object,Object>[]> sketch(RDD<K> rdd, int sampleSizePerPartition, scala.reflect.ClassTag<K> evidence$5)
Sketches the input RDD via reservoir sampling on each partition.
- Parameters:
rdd - the input RDD to sketch
sampleSizePerPartition - maximum sample size per partition
evidence$5 - (undocumented)
- Returns:
(total number of items, an array of (partitionId, number of items, sample))
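The per-partition reservoir sampling can be illustrated outside Spark. The following is a minimal sketch, not Spark's implementation. Each input partition is modeled as a plain java.util.List rather than an RDD partition. The names ReservoirSketch, PartitionSketch and Sketch are hypothetical, and the per-partition seed (seed + partitionId) is an arbitrary choice. It uses the classic Algorithm R, which keeps a uniform sample of at most k items in a single pass while still counting every item. It returns the same shape that sketch documents: a total count plus one (partitionId, number of items, sample) entry per partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone illustration; partitions are plain Lists, not RDD partitions.
final class ReservoirSketch {

    /** One partition's summary: its id, how many items it held, and the retained sample. */
    record PartitionSketch<T>(int partitionId, long count, List<T> sample) {}

    /** Overall result: total item count plus one summary per partition. */
    record Sketch<T>(long total, List<PartitionSketch<T>> partitions) {}

    /** Algorithm R: one pass over the items, keeping a uniform random sample of at most k. */
    static <T> PartitionSketch<T> sampleAndCount(int partitionId, List<T> items, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item); // fill the reservoir with the first k items
            } else {
                // Draw j uniformly from [0, seen); replacing slot j when j < k keeps
                // every item seen so far in the sample with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return new PartitionSketch<>(partitionId, seen, reservoir);
    }

    /** Samples every partition independently and sums the per-partition counts. */
    static <T> Sketch<T> sketch(List<List<T>> partitions, int sampleSizePerPartition, long seed) {
        List<PartitionSketch<T>> summaries = new ArrayList<>();
        long total = 0;
        for (int id = 0; id < partitions.size(); id++) {
            // Arbitrary, reproducible per-partition seed (an assumption of this sketch).
            Random rng = new Random(seed + id);
            PartitionSketch<T> s = sampleAndCount(id, partitions.get(id), sampleSizePerPartition, rng);
            summaries.add(s);
            total += s.count();
        }
        return new Sketch<>(total, summaries);
    }
}
```

A partition that holds fewer items than sampleSizePerPartition is returned in full as its own sample. The count is exact, so the total is exact even though each sample is bounded.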
-
determineBounds
public static <K> Object determineBounds(scala.collection.mutable.ArrayBuffer<scala.Tuple2<K,Object>> candidates, int partitions, scala.math.Ordering<K> evidence$6, scala.reflect.ClassTag<K> evidence$7)
Determines the bounds for range partitioning from candidates with weights indicating how many items each represents. The weight is usually 1 over the probability with which the candidate was sampled.
- Parameters:
candidates - unordered candidates with weights
partitions - number of partitions
evidence$6 - (undocumented)
evidence$7 - (undocumented)
- Returns:
selected bounds
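The weighted bound selection can be sketched in plain Java. This is an illustration of the idea described above, not a copy of Spark's code. The names BoundsSketch and Candidate are hypothetical, a java.util.Comparator stands in for scala.math.Ordering, and a List<K> stands in for the returned array. The sketch sorts candidates by key and emits a bound each time the running weight crosses another 1/partitions share of the total weight. It skips repeated keys so that bounds stay strictly increasing. Spark's own implementation appears to do the same. Skipping duplicates is one way to end up with fewer bounds, and therefore fewer partitions, than requested.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; Comparator stands in for scala.math.Ordering.
final class BoundsSketch {

    /** A sampled key and how many input items it represents (about 1 / sampling probability). */
    record Candidate<K>(K key, float weight) {}

    static <K> List<K> determineBounds(List<Candidate<K>> candidates, int partitions,
                                       Comparator<? super K> ordering) {
        // Sort a copy by key; the order of the input candidates does not matter.
        List<Candidate<K>> ordered = new ArrayList<>(candidates);
        ordered.sort((a, b) -> ordering.compare(a.key(), b.key()));

        double sumWeights = 0.0;
        for (Candidate<K> c : ordered) {
            sumWeights += c.weight();
        }
        double step = sumWeights / partitions; // weight each output partition should receive
        double cumWeight = 0.0;
        double target = step;

        List<K> bounds = new ArrayList<>();
        // At most partitions - 1 bounds are needed to split the keys into partitions ranges.
        for (int i = 0; i < ordered.size() && bounds.size() < partitions - 1; i++) {
            Candidate<K> c = ordered.get(i);
            cumWeight += c.weight();
            if (cumWeight >= target) {
                // Emit a bound only if it is strictly greater than the previous one,
                // so duplicate keys never produce empty ranges.
                if (bounds.isEmpty() || ordering.compare(c.key(), bounds.get(bounds.size() - 1)) > 0) {
                    bounds.add(c.key());
                    target += step;
                }
            }
        }
        return bounds;
    }
}
```

With eight keys 1..8 of weight 1 and partitions = 4, the sketch yields bounds [2, 4, 6]. With four copies of the same key and partitions = 4, it yields the single bound [5], which means two partitions instead of four.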
-
samplePointsPerPartitionHint
public int samplePointsPerPartitionHint()
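The hint controls how large a sample is drawn before the bounds are chosen. The sketch below reproduces the sizing arithmetic found in Spark's RangePartitioner source for recent 2.x/3.x releases. Treat the constants as assumptions and check them against the version you use. The target sample size is samplePointsPerPartitionHint × partitions, capped at one million keys. Each input partition is then over-sampled by a factor of 3, on the assumption that input partitions are roughly balanced. In those releases, the constructor without a samplePointsPerPartitionHint argument uses a default hint of 20. The class name SampleSizeSketch is hypothetical.

```java
// Hypothetical helper reproducing the sample-size arithmetic; the constants are assumptions
// taken from Spark's RangePartitioner source and may differ between versions.
final class SampleSizeSketch {
    static final int DEFAULT_HINT = 20;        // assumed default samplePointsPerPartitionHint
    static final double MAX_SAMPLE_SIZE = 1e6; // assumed cap on the total sample size
    static final double OVERSAMPLING = 3.0;    // assumed over-sampling factor

    /** Total number of sampled keys aimed for, computed in double to avoid int overflow. */
    static double sampleSize(int samplePointsPerPartitionHint, int partitions) {
        return Math.min((double) samplePointsPerPartitionHint * partitions, MAX_SAMPLE_SIZE);
    }

    /** Reservoir size per input partition: the over-sampled total spread evenly, rounded up. */
    static int sampleSizePerPartition(int samplePointsPerPartitionHint, int partitions, int inputPartitions) {
        return (int) Math.ceil(OVERSAMPLING * sampleSize(samplePointsPerPartitionHint, partitions) / inputPartitions);
    }
}
```

For example, with the default hint, 10 output partitions and 4 input partitions, the target sample is 200 keys, and each input partition keeps a reservoir of up to 150 keys.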
numPartitions
public int numPartitions()
- Specified by:
numPartitions in class Partitioner
-
getPartition
public int getPartition(Object key)
- Specified by:
getPartition in class Partitioner
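Once the bounds are known, assigning a key is a sorted-array lookup. In this minimal sketch, n strictly increasing bounds act as inclusive upper limits for the first n ranges, so numPartitions is n + 1. A key equal to a bound stays in that bound's partition, and descending order mirrors the index. The class RangeLookupSketch is hypothetical, and a Comparator stands in for scala.math.Ordering. Spark's own getPartition, according to its source, scans linearly when there are at most 128 bounds and binary-searches otherwise. This sketch always uses java.util.Arrays.binarySearch, which gives the same answer for strictly increasing bounds.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of range lookup; not Spark's class.
final class RangeLookupSketch<K> {
    private final K[] rangeBounds;               // strictly increasing, inclusive upper bounds
    private final boolean ascending;
    private final Comparator<? super K> ordering;

    RangeLookupSketch(K[] rangeBounds, boolean ascending, Comparator<? super K> ordering) {
        this.rangeBounds = rangeBounds.clone();
        this.ascending = ascending;
        this.ordering = ordering;
    }

    /** n bounds divide the key space into n + 1 ranges. */
    int numPartitions() {
        return rangeBounds.length + 1;
    }

    int getPartition(K key) {
        // binarySearch returns the index of a matching bound, or -(insertionPoint) - 1.
        int idx = Arrays.binarySearch(rangeBounds, key, ordering);
        // A key equal to bound i goes to partition i; otherwise use the insertion point.
        int partition = idx >= 0 ? idx : -idx - 1;
        // For descending order, reverse the partition index.
        return ascending ? partition : rangeBounds.length - partition;
    }
}
```

With bounds [10, 20, 30] in ascending order, the keys 5 and 10 land in partition 0, 11 lands in partition 1, and 99 lands in partition 3. With descending order, the same keys map to the mirrored indices.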
-
equals
public boolean equals(Object other)
-
hashCode
public int hashCode()
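Spark compares partitioners with equals when deciding whether data is already partitioned the way an operation needs. For example, a join or cogroup over RDDs that share an equal partitioner can avoid a shuffle. Two range partitioners with the same bounds and direction should therefore compare equal and hash alike. The minimal sketch below illustrates that contract. It assumes equality is defined by the bounds and the ascending flag. The class name RangeBoundsKey is hypothetical.

```java
import java.util.Arrays;

// Hypothetical value class illustrating an equals/hashCode contract based on bounds + direction.
final class RangeBoundsKey {
    private final Object[] rangeBounds;
    private final boolean ascending;

    RangeBoundsKey(Object[] rangeBounds, boolean ascending) {
        this.rangeBounds = rangeBounds.clone(); // defensive copy keeps the key immutable
        this.ascending = ascending;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof RangeBoundsKey)) {
            return false;
        }
        RangeBoundsKey that = (RangeBoundsKey) other;
        // Element-wise comparison of the bounds, plus the same sort direction.
        return ascending == that.ascending && Arrays.equals(rangeBounds, that.rangeBounds);
    }

    @Override
    public int hashCode() {
        // Derived from exactly the fields used by equals, so equal objects hash equally.
        return 31 * Arrays.hashCode(rangeBounds) + Boolean.hashCode(ascending);
    }
}
```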
-