public interface JavaRDDLike<T,This extends JavaRDDLike<T,This>>
extends scala.Serializable
Modifier and Type | Method and Description |
---|---|
<U> U |
aggregate(U zeroValue,
Function2<U,T,U> seqOp,
Function2<U,U,U> combOp)
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
|
<U> JavaPairRDD<T,U> |
cartesian(JavaRDDLike<U,?> other)
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in
this and b is in other . |
void |
checkpoint()
Mark this RDD for checkpointing.
|
scala.reflect.ClassTag<T> |
classTag() |
java.util.List<T> |
collect()
Return an array that contains all of the elements in this RDD.
|
JavaFutureAction<java.util.List<T>> |
collectAsync()
The asynchronous version of
collect , which returns a future for
retrieving an array containing all of the elements in this RDD. |
java.util.List<T>[] |
collectPartitions(int[] partitionIds)
Return an array that contains all of the elements in a specific partition of this RDD.
|
SparkContext |
context()
The
SparkContext that this RDD was created on. |
long |
count()
Return the number of elements in the RDD.
|
PartialResult<BoundedDouble> |
countApprox(long timeout)
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
|
PartialResult<BoundedDouble> |
countApprox(long timeout,
double confidence)
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
|
long |
countApproxDistinct(double relativeSD)
Return approximate number of distinct elements in the RDD.
|
JavaFutureAction<Long> |
countAsync()
The asynchronous version of
count , which returns a
future for counting the number of elements in this RDD. |
java.util.Map<T,Long> |
countByValue()
Return the count of each unique value in this RDD as a map of (value, count) pairs.
|
PartialResult<java.util.Map<T,BoundedDouble>> |
countByValueApprox(long timeout)
Approximate version of countByValue().
|
PartialResult<java.util.Map<T,BoundedDouble>> |
countByValueApprox(long timeout,
double confidence)
Approximate version of countByValue().
|
T |
first()
Return the first element in this RDD.
|
<U> JavaRDD<U> |
flatMap(FlatMapFunction<T,U> f)
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
|
JavaDoubleRDD |
flatMapToDouble(DoubleFlatMapFunction<T> f)
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
|
<K2,V2> JavaPairRDD<K2,V2> |
flatMapToPair(PairFlatMapFunction<T,K2,V2> f)
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
|
T |
fold(T zeroValue,
Function2<T,T,T> f)
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative function and a neutral "zero value".
|
void |
foreach(VoidFunction<T> f)
Applies a function f to all elements of this RDD.
|
JavaFutureAction<Void> |
foreachAsync(VoidFunction<T> f)
The asynchronous version of the
foreach action, which
applies a function f to all the elements of this RDD. |
void |
foreachPartition(VoidFunction<java.util.Iterator<T>> f)
Applies a function f to each partition of this RDD.
|
JavaFutureAction<Void> |
foreachPartitionAsync(VoidFunction<java.util.Iterator<T>> f)
The asynchronous version of the
foreachPartition action, which
applies a function f to each partition of this RDD. |
Optional<String> |
getCheckpointFile()
Gets the name of the file to which this RDD was checkpointed
|
int |
getNumPartitions()
Return the number of partitions in this RDD.
|
StorageLevel |
getStorageLevel()
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
|
JavaRDD<java.util.List<T>> |
glom()
Return an RDD created by coalescing all elements within each partition into an array.
|
<U> JavaPairRDD<U,Iterable<T>> |
groupBy(Function<T,U> f)
Return an RDD of grouped elements.
|
<U> JavaPairRDD<U,Iterable<T>> |
groupBy(Function<T,U> f,
int numPartitions)
Return an RDD of grouped elements.
|
int |
id()
A unique ID for this RDD (within its SparkContext).
|
boolean |
isCheckpointed()
Return whether this RDD has been checkpointed or not
|
boolean |
isEmpty() |
java.util.Iterator<T> |
iterator(Partition split,
TaskContext taskContext)
Internal method to this RDD; will read from cache if applicable, or otherwise compute it.
|
<U> JavaPairRDD<U,T> |
keyBy(Function<T,U> f)
Creates tuples of the elements in this RDD by applying
f . |
<R> JavaRDD<R> |
map(Function<T,R> f)
Return a new RDD by applying a function to all elements of this RDD.
|
<U> JavaRDD<U> |
mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f)
Return a new RDD by applying a function to each partition of this RDD.
|
<U> JavaRDD<U> |
mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f,
boolean preservesPartitioning)
Return a new RDD by applying a function to each partition of this RDD.
|
JavaDoubleRDD |
mapPartitionsToDouble(DoubleFlatMapFunction<java.util.Iterator<T>> f)
Return a new RDD by applying a function to each partition of this RDD.
|
JavaDoubleRDD |
mapPartitionsToDouble(DoubleFlatMapFunction<java.util.Iterator<T>> f,
boolean preservesPartitioning)
Return a new RDD by applying a function to each partition of this RDD.
|
<K2,V2> JavaPairRDD<K2,V2> |
mapPartitionsToPair(PairFlatMapFunction<java.util.Iterator<T>,K2,V2> f)
Return a new RDD by applying a function to each partition of this RDD.
|
<K2,V2> JavaPairRDD<K2,V2> |
mapPartitionsToPair(PairFlatMapFunction<java.util.Iterator<T>,K2,V2> f,
boolean preservesPartitioning)
Return a new RDD by applying a function to each partition of this RDD.
|
<R> JavaRDD<R> |
mapPartitionsWithIndex(Function2<Integer,java.util.Iterator<T>,java.util.Iterator<R>> f,
boolean preservesPartitioning)
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
|
<R> JavaDoubleRDD |
mapToDouble(DoubleFunction<T> f)
Return a new RDD by applying a function to all elements of this RDD.
|
<K2,V2> JavaPairRDD<K2,V2> |
mapToPair(PairFunction<T,K2,V2> f)
Return a new RDD by applying a function to all elements of this RDD.
|
T |
max(java.util.Comparator<T> comp)
Returns the maximum element from this RDD as defined by the specified
Comparator[T].
|
T |
min(java.util.Comparator<T> comp)
Returns the minimum element from this RDD as defined by the specified
Comparator[T].
|
String |
name() |
Optional<Partitioner> |
partitioner()
The partitioner of this RDD.
|
java.util.List<Partition> |
partitions()
Set of partitions in this RDD.
|
JavaRDD<String> |
pipe(java.util.List<String> command)
Return an RDD created by piping elements to a forked external process.
|
JavaRDD<String> |
pipe(java.util.List<String> command,
java.util.Map<String,String> env)
Return an RDD created by piping elements to a forked external process.
|
JavaRDD<String> |
pipe(java.util.List<String> command,
java.util.Map<String,String> env,
boolean separateWorkingDir,
int bufferSize)
Return an RDD created by piping elements to a forked external process.
|
JavaRDD<String> |
pipe(java.util.List<String> command,
java.util.Map<String,String> env,
boolean separateWorkingDir,
int bufferSize,
String encoding)
Return an RDD created by piping elements to a forked external process.
|
JavaRDD<String> |
pipe(String command)
Return an RDD created by piping elements to a forked external process.
|
RDD<T> |
rdd() |
T |
reduce(Function2<T,T,T> f)
Reduces the elements of this RDD using the specified commutative and associative binary
operator.
|
void |
saveAsObjectFile(String path)
Save this RDD as a SequenceFile of serialized objects.
|
void |
saveAsTextFile(String path)
Save this RDD as a text file, using string representations of elements.
|
void |
saveAsTextFile(String path,
Class<? extends org.apache.hadoop.io.compress.CompressionCodec> codec)
Save this RDD as a compressed text file, using string representations of elements.
|
java.util.List<T> |
take(int num)
Take the first num elements of the RDD.
|
JavaFutureAction<java.util.List<T>> |
takeAsync(int num)
The asynchronous version of the
take action, which returns a
future for retrieving the first num elements of this RDD. |
java.util.List<T> |
takeOrdered(int num)
Returns the first k (smallest) elements from this RDD using the
natural ordering for T while maintain the order.
|
java.util.List<T> |
takeOrdered(int num,
java.util.Comparator<T> comp)
Returns the first k (smallest) elements from this RDD as defined by
the specified Comparator[T] and maintains the order.
|
java.util.List<T> |
takeSample(boolean withReplacement,
int num) |
java.util.List<T> |
takeSample(boolean withReplacement,
int num,
long seed) |
String |
toDebugString()
A description of this RDD and its recursive dependencies for debugging.
|
java.util.Iterator<T> |
toLocalIterator()
Return an iterator that contains all of the elements in this RDD.
|
java.util.List<T> |
top(int num)
Returns the top k (largest) elements from this RDD using the
natural ordering for T and maintains the order.
|
java.util.List<T> |
top(int num,
java.util.Comparator<T> comp)
Returns the top k (largest) elements from this RDD as defined by
the specified Comparator[T] and maintains the order.
|
<U> U |
treeAggregate(U zeroValue,
Function2<U,T,U> seqOp,
Function2<U,U,U> combOp)
org.apache.spark.api.java.JavaRDDLike.treeAggregate with suggested depth 2. |
<U> U |
treeAggregate(U zeroValue,
Function2<U,T,U> seqOp,
Function2<U,U,U> combOp,
int depth)
Aggregates the elements of this RDD in a multi-level tree pattern.
|
<U> U |
treeAggregate(U zeroValue,
Function2<U,T,U> seqOp,
Function2<U,U,U> combOp,
int depth,
boolean finalAggregateOnExecutor)
org.apache.spark.api.java.JavaRDDLike.treeAggregate with a parameter to do the
final aggregation on the executor. |
T |
treeReduce(Function2<T,T,T> f)
org.apache.spark.api.java.JavaRDDLike.treeReduce with suggested depth 2. |
T |
treeReduce(Function2<T,T,T> f,
int depth)
Reduces the elements of this RDD in a multi-level tree pattern.
|
This |
wrapRDD(RDD<T> rdd) |
<U> JavaPairRDD<T,U> |
zip(JavaRDDLike<U,?> other)
Zips this RDD with another one, returning key-value pairs with the first element in each RDD,
second element in each RDD, etc.
|
<U,V> JavaRDD<V> |
zipPartitions(JavaRDDLike<U,?> other,
FlatMapFunction2<java.util.Iterator<T>,java.util.Iterator<U>,V> f)
Zip this RDD's partitions with one (or more) RDD(s) and return a new RDD by
applying a function to the zipped partitions.
|
JavaPairRDD<T,Long> |
zipWithIndex()
Zips this RDD with its element indices.
|
JavaPairRDD<T,Long> |
zipWithUniqueId()
Zips this RDD with generated unique Long ids.
|
<U> U aggregate(U zeroValue, Function2<U,T,U> seqOp, Function2<U,U,U> combOp)
zeroValue
- (undocumented)seqOp
- (undocumented)combOp
- (undocumented)<U> JavaPairRDD<T,U> cartesian(JavaRDDLike<U,?> other)
this
and b is in other
.other
- (undocumented)void checkpoint()
scala.reflect.ClassTag<T> classTag()
java.util.List<T> collect()
JavaFutureAction<java.util.List<T>> collectAsync()
collect
, which returns a future for
retrieving an array containing all of the elements in this RDD.
java.util.List<T>[] collectPartitions(int[] partitionIds)
partitionIds
- (undocumented)SparkContext context()
SparkContext
that this RDD was created on.long count()
PartialResult<BoundedDouble> countApprox(long timeout, double confidence)
The confidence is the probability that the error bounds of the result will contain the true value. That is, if countApprox were called repeatedly with confidence 0.9, we would expect 90% of the results to contain the true count. The confidence must be in the range [0,1] or an exception will be thrown.
timeout
- maximum time to wait for the job, in millisecondsconfidence
- the desired statistical confidence in the resultPartialResult<BoundedDouble> countApprox(long timeout)
timeout
- maximum time to wait for the job, in millisecondslong countApproxDistinct(double relativeSD)
The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available here.
relativeSD
- Relative accuracy. Smaller values create counters that require more space.
It must be greater than 0.000017.JavaFutureAction<Long> countAsync()
count
, which returns a
future for counting the number of elements in this RDD.java.util.Map<T,Long> countByValue()
PartialResult<java.util.Map<T,BoundedDouble>> countByValueApprox(long timeout, double confidence)
The confidence is the probability that the error bounds of the result will contain the true value. That is, if countApprox were called repeatedly with confidence 0.9, we would expect 90% of the results to contain the true count. The confidence must be in the range [0,1] or an exception will be thrown.
timeout
- maximum time to wait for the job, in millisecondsconfidence
- the desired statistical confidence in the resultPartialResult<java.util.Map<T,BoundedDouble>> countByValueApprox(long timeout)
timeout
- maximum time to wait for the job, in millisecondsT first()
<U> JavaRDD<U> flatMap(FlatMapFunction<T,U> f)
f
- (undocumented)JavaDoubleRDD flatMapToDouble(DoubleFlatMapFunction<T> f)
f
- (undocumented)<K2,V2> JavaPairRDD<K2,V2> flatMapToPair(PairFlatMapFunction<T,K2,V2> f)
f
- (undocumented)T fold(T zeroValue, Function2<T,T,T> f)
This behaves somewhat differently from fold operations implemented for non-distributed collections in functional languages like Scala. This fold operation may be applied to partitions individually, and then fold those results into the final result, rather than apply the fold to each element sequentially in some defined ordering. For functions that are not commutative, the result may differ from that of a fold applied to a non-distributed collection.
zeroValue
- (undocumented)f
- (undocumented)void foreach(VoidFunction<T> f)
f
- (undocumented)JavaFutureAction<Void> foreachAsync(VoidFunction<T> f)
foreach
action, which
applies a function f to all the elements of this RDD.f
- (undocumented)void foreachPartition(VoidFunction<java.util.Iterator<T>> f)
f
- (undocumented)JavaFutureAction<Void> foreachPartitionAsync(VoidFunction<java.util.Iterator<T>> f)
foreachPartition
action, which
applies a function f to each partition of this RDD.f
- (undocumented)Optional<String> getCheckpointFile()
int getNumPartitions()
StorageLevel getStorageLevel()
JavaRDD<java.util.List<T>> glom()
<U> JavaPairRDD<U,Iterable<T>> groupBy(Function<T,U> f)
f
- (undocumented)<U> JavaPairRDD<U,Iterable<T>> groupBy(Function<T,U> f, int numPartitions)
f
- (undocumented)numPartitions
- (undocumented)int id()
boolean isCheckpointed()
boolean isEmpty()
java.util.Iterator<T> iterator(Partition split, TaskContext taskContext)
split
- (undocumented)taskContext
- (undocumented)<U> JavaPairRDD<U,T> keyBy(Function<T,U> f)
f
.f
- (undocumented)<R> JavaRDD<R> map(Function<T,R> f)
f
- (undocumented)<U> JavaRDD<U> mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f)
f
- (undocumented)<U> JavaRDD<U> mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f, boolean preservesPartitioning)
f
- (undocumented)preservesPartitioning
- (undocumented)JavaDoubleRDD mapPartitionsToDouble(DoubleFlatMapFunction<java.util.Iterator<T>> f)
f
- (undocumented)JavaDoubleRDD mapPartitionsToDouble(DoubleFlatMapFunction<java.util.Iterator<T>> f, boolean preservesPartitioning)
f
- (undocumented)preservesPartitioning
- (undocumented)<K2,V2> JavaPairRDD<K2,V2> mapPartitionsToPair(PairFlatMapFunction<java.util.Iterator<T>,K2,V2> f)
f
- (undocumented)<K2,V2> JavaPairRDD<K2,V2> mapPartitionsToPair(PairFlatMapFunction<java.util.Iterator<T>,K2,V2> f, boolean preservesPartitioning)
f
- (undocumented)preservesPartitioning
- (undocumented)<R> JavaRDD<R> mapPartitionsWithIndex(Function2<Integer,java.util.Iterator<T>,java.util.Iterator<R>> f, boolean preservesPartitioning)
f
- (undocumented)preservesPartitioning
- (undocumented)<R> JavaDoubleRDD mapToDouble(DoubleFunction<T> f)
f
- (undocumented)<K2,V2> JavaPairRDD<K2,V2> mapToPair(PairFunction<T,K2,V2> f)
f
- (undocumented)T max(java.util.Comparator<T> comp)
comp
- the comparator that defines orderingT min(java.util.Comparator<T> comp)
comp
- the comparator that defines orderingString name()
Optional<Partitioner> partitioner()
java.util.List<Partition> partitions()
JavaRDD<String> pipe(String command)
command
- (undocumented)JavaRDD<String> pipe(java.util.List<String> command)
command
- (undocumented)JavaRDD<String> pipe(java.util.List<String> command, java.util.Map<String,String> env)
command
- (undocumented)env
- (undocumented)JavaRDD<String> pipe(java.util.List<String> command, java.util.Map<String,String> env, boolean separateWorkingDir, int bufferSize)
command
- (undocumented)env
- (undocumented)separateWorkingDir
- (undocumented)bufferSize
- (undocumented)JavaRDD<String> pipe(java.util.List<String> command, java.util.Map<String,String> env, boolean separateWorkingDir, int bufferSize, String encoding)
command
- (undocumented)env
- (undocumented)separateWorkingDir
- (undocumented)bufferSize
- (undocumented)encoding
- (undocumented)T reduce(Function2<T,T,T> f)
f
- (undocumented)void saveAsObjectFile(String path)
path
- (undocumented)void saveAsTextFile(String path)
path
- (undocumented)void saveAsTextFile(String path, Class<? extends org.apache.hadoop.io.compress.CompressionCodec> codec)
path
- (undocumented)codec
- (undocumented)java.util.List<T> take(int num)
num
- (undocumented)JavaFutureAction<java.util.List<T>> takeAsync(int num)
take
action, which returns a
future for retrieving the first num
elements of this RDD.
num
- (undocumented)java.util.List<T> takeOrdered(int num, java.util.Comparator<T> comp)
num
- k, the number of elements to returncomp
- the comparator that defines the orderjava.util.List<T> takeOrdered(int num)
num
- k, the number of top elements to returnjava.util.List<T> takeSample(boolean withReplacement, int num)
java.util.List<T> takeSample(boolean withReplacement, int num, long seed)
String toDebugString()
java.util.Iterator<T> toLocalIterator()
The iterator will consume as much memory as the largest partition in this RDD.
java.util.List<T> top(int num, java.util.Comparator<T> comp)
num
- k, the number of top elements to returncomp
- the comparator that defines the orderjava.util.List<T> top(int num)
num
- k, the number of top elements to return<U> U treeAggregate(U zeroValue, Function2<U,T,U> seqOp, Function2<U,U,U> combOp, int depth)
depth
- suggested depth of the treezeroValue
- (undocumented)seqOp
- (undocumented)combOp
- (undocumented)aggregate(U, org.apache.spark.api.java.function.Function2<U, T, U>, org.apache.spark.api.java.function.Function2<U, U, U>)
<U> U treeAggregate(U zeroValue, Function2<U,T,U> seqOp, Function2<U,U,U> combOp)
org.apache.spark.api.java.JavaRDDLike.treeAggregate
with suggested depth 2.zeroValue
- (undocumented)seqOp
- (undocumented)combOp
- (undocumented)<U> U treeAggregate(U zeroValue, Function2<U,T,U> seqOp, Function2<U,U,U> combOp, int depth, boolean finalAggregateOnExecutor)
org.apache.spark.api.java.JavaRDDLike.treeAggregate
with a parameter to do the
final aggregation on the executor.zeroValue
- (undocumented)seqOp
- (undocumented)combOp
- (undocumented)depth
- (undocumented)finalAggregateOnExecutor
- (undocumented)T treeReduce(Function2<T,T,T> f, int depth)
depth
- suggested depth of the treef
- (undocumented)reduce(org.apache.spark.api.java.function.Function2<T, T, T>)
T treeReduce(Function2<T,T,T> f)
org.apache.spark.api.java.JavaRDDLike.treeReduce
with suggested depth 2.f
- (undocumented)<U> JavaPairRDD<T,U> zip(JavaRDDLike<U,?> other)
other
- (undocumented)<U,V> JavaRDD<V> zipPartitions(JavaRDDLike<U,?> other, FlatMapFunction2<java.util.Iterator<T>,java.util.Iterator<U>,V> f)
other
- (undocumented)f
- (undocumented)JavaPairRDD<T,Long> zipWithIndex()
JavaPairRDD<T,Long> zipWithUniqueId()
RDD.zipWithIndex()
.