public class JavaRDD<T>
extends Object
Constructor and Description |
---|
JavaRDD(RDD<T> rdd,
scala.reflect.ClassTag<T> classTag) |
Modifier and Type | Method and Description |
---|---|
JavaRDD<T> |
cache()
Persist this RDD with the default storage level (
MEMORY_ONLY ). |
scala.reflect.ClassTag<T> |
classTag() |
JavaRDD<T> |
coalesce(int numPartitions)
Return a new RDD that is reduced into
numPartitions partitions. |
JavaRDD<T> |
coalesce(int numPartitions,
boolean shuffle)
Return a new RDD that is reduced into
numPartitions partitions. |
JavaRDD<T> |
distinct()
Return a new RDD containing the distinct elements in this RDD.
|
JavaRDD<T> |
distinct(int numPartitions)
Return a new RDD containing the distinct elements in this RDD.
|
JavaRDD<T> |
filter(Function<T,Boolean> f)
Return a new RDD containing only the elements that satisfy a predicate.
|
static <T> JavaRDD<T> |
fromRDD(RDD<T> rdd,
scala.reflect.ClassTag<T> evidence$1) |
ResourceProfile |
getResourceProfile()
Get the ResourceProfile specified with this RDD or None if it wasn't specified.
|
JavaRDD<T> |
intersection(JavaRDD<T> other)
Return the intersection of this RDD and another one.
|
JavaRDD<T> |
persist(StorageLevel newLevel)
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
|
JavaRDD<T>[] |
randomSplit(double[] weights)
Randomly splits this RDD with the provided weights.
|
JavaRDD<T>[] |
randomSplit(double[] weights,
long seed)
Randomly splits this RDD with the provided weights.
|
RDD<T> |
rdd() |
JavaRDD<T> |
repartition(int numPartitions)
Return a new RDD that has exactly numPartitions partitions.
|
JavaRDD<T> |
sample(boolean withReplacement,
double fraction)
Return a sampled subset of this RDD with a random seed.
|
JavaRDD<T> |
sample(boolean withReplacement,
double fraction,
long seed)
Return a sampled subset of this RDD, with a user-supplied seed.
|
JavaRDD<T> |
setName(String name)
Assign a name to this RDD
|
<S> JavaRDD<T> |
sortBy(Function<T,S> f,
boolean ascending,
int numPartitions)
Return this RDD sorted by the given key function.
|
JavaRDD<T> |
subtract(JavaRDD<T> other)
Return an RDD with the elements from
this that are not in other . |
JavaRDD<T> |
subtract(JavaRDD<T> other,
int numPartitions)
Return an RDD with the elements from
this that are not in other . |
JavaRDD<T> |
subtract(JavaRDD<T> other,
Partitioner p)
Return an RDD with the elements from
this that are not in other . |
static <T> RDD<T> |
toRDD(JavaRDD<T> rdd) |
String |
toString() |
JavaRDD<T> |
union(JavaRDD<T> other)
Return the union of this RDD and another one.
|
JavaRDD<T> |
unpersist()
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
|
JavaRDD<T> |
unpersist(boolean blocking)
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
|
JavaRDD<T> |
withResources(ResourceProfile rp)
Specify a ResourceProfile to use when calculating this RDD.
|
JavaRDD<T> |
wrapRDD(RDD<T> rdd) |
aggregate, cartesian, checkpoint, collect, collectAsync, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countAsync, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachAsync, foreachPartition, foreachPartitionAsync, getCheckpointFile, getNumPartitions, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, partitioner, partitions, pipe, pipe, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, take, takeAsync, takeOrdered, takeOrdered, takeSample, takeSample, toDebugString, toLocalIterator, top, top, treeAggregate, treeAggregate, treeAggregate, treeReduce, treeReduce, zip, zipPartitions, zipWithIndex, zipWithUniqueId
public scala.reflect.ClassTag<T> classTag()
public JavaRDD<T> cache()
MEMORY_ONLY
).public JavaRDD<T> persist(StorageLevel newLevel)
newLevel
- (undocumented)public JavaRDD<T> withResources(ResourceProfile rp)
rp
- (undocumented)public ResourceProfile getResourceProfile()
public JavaRDD<T> unpersist()
public JavaRDD<T> unpersist(boolean blocking)
blocking
- Whether to block until all blocks are deleted.public JavaRDD<T> distinct()
public JavaRDD<T> distinct(int numPartitions)
numPartitions
- (undocumented)public JavaRDD<T> filter(Function<T,Boolean> f)
f
- (undocumented)public JavaRDD<T> coalesce(int numPartitions)
numPartitions
partitions.numPartitions
- (undocumented)public JavaRDD<T> coalesce(int numPartitions, boolean shuffle)
numPartitions
partitions.numPartitions
- (undocumented)shuffle
- (undocumented)public JavaRDD<T> repartition(int numPartitions)
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce
,
which can avoid performing a shuffle.
numPartitions
- (undocumented)public JavaRDD<T> sample(boolean withReplacement, double fraction)
withReplacement
- can elements be sampled multiple times (replaced when sampled out)fraction
- expected size of the sample as a fraction of this RDD's size
without replacement: probability that each element is chosen; fraction must be [0, 1]
with replacement: expected number of times each element is chosen; fraction must be greater
than or equal to 0
RDD
.public JavaRDD<T> sample(boolean withReplacement, double fraction, long seed)
withReplacement
- can elements be sampled multiple times (replaced when sampled out)fraction
- expected size of the sample as a fraction of this RDD's size
without replacement: probability that each element is chosen; fraction must be [0, 1]
with replacement: expected number of times each element is chosen; fraction must be greater
than or equal to 0seed
- seed for the random number generator
RDD
.public JavaRDD<T>[] randomSplit(double[] weights)
weights
- weights for splits, will be normalized if they don't sum to 1
public JavaRDD<T>[] randomSplit(double[] weights, long seed)
weights
- weights for splits, will be normalized if they don't sum to 1seed
- random seed
public JavaRDD<T> union(JavaRDD<T> other)
.distinct()
to eliminate them).other
- (undocumented)public JavaRDD<T> intersection(JavaRDD<T> other)
other
- (undocumented)public JavaRDD<T> subtract(JavaRDD<T> other)
this
that are not in other
.
Uses this
partitioner/partition size, because even if other
is huge, the resulting
RDD will be less than or equal to us.
other
- (undocumented)public JavaRDD<T> subtract(JavaRDD<T> other, int numPartitions)
this
that are not in other
.other
- (undocumented)numPartitions
- (undocumented)public JavaRDD<T> subtract(JavaRDD<T> other, Partitioner p)
this
that are not in other
.other
- (undocumented)p
- (undocumented)public String toString()
toString
in class Object