Package org.apache.spark.streaming
Class StreamingContext
Object
org.apache.spark.streaming.StreamingContext
- All Implemented Interfaces:
- org.apache.spark.internal.Logging
Deprecated.
This is deprecated as of Spark 3.4.0.
             There are no longer updates to DStream and it's a legacy project.
             There is a newer and easier to use streaming engine
             in Spark called Structured Streaming.
             You should use Spark Structured Streaming for your streaming applications.
Main entry point for Spark Streaming functionality. It provides methods used to create
 
DStreams from various input sources. It can be either
 created by providing a Spark master URL and an appName, or from a org.apache.spark.SparkConf
 configuration (see core Spark documentation), or from an existing org.apache.spark.SparkContext.
 The associated SparkContext can be accessed using context.sparkContext. After
 creating and transforming DStreams, the streaming computation can be started and stopped
 using context.start() and context.stop(), respectively.
 context.awaitTermination() allows the current thread to wait for the termination
 of the context by stop() or by an exception.- 
Nested Class SummaryNested classes/interfaces inherited from interface org.apache.spark.internal.Loggingorg.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
- 
Constructor SummaryConstructorsConstructorDescriptionStreamingContext(String path) Deprecated.Recreate a StreamingContext from a checkpoint file.StreamingContext(String master, String appName, Duration batchDuration, String sparkHome, scala.collection.immutable.Seq<String> jars, scala.collection.Map<String, String> environment) Deprecated.Create a StreamingContext by providing the details necessary for creating a new SparkContext.StreamingContext(String path, org.apache.hadoop.conf.Configuration hadoopConf) Deprecated.Recreate a StreamingContext from a checkpoint file.StreamingContext(String path, SparkContext sparkContext) Deprecated.Recreate a StreamingContext from a checkpoint file using an existing SparkContext.StreamingContext(SparkConf conf, Duration batchDuration) Deprecated.Create a StreamingContext by providing the configuration necessary for a new SparkContext.StreamingContext(SparkContext sparkContext, Duration batchDuration) Deprecated.Create a StreamingContext using an existing SparkContext.
- 
Method SummaryModifier and TypeMethodDescriptionvoidaddStreamingListener(StreamingListener streamingListener) Deprecated.Add aStreamingListenerobject for receiving system events related to streaming.voidDeprecated.Wait for the execution to stop.booleanawaitTerminationOrTimeout(long timeout) Deprecated.Wait for the execution to stop.DStream<byte[]>binaryRecordsStream(String directory, int recordLength) Deprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files, assuming a fixed length per record, generating one byte array per record.voidcheckpoint(String directory) Deprecated.Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.<K,V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> 
 InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path, Object> filter, boolean newFilesOnly, org.apache.hadoop.conf.Configuration conf, scala.reflect.ClassTag<K> evidence$10, scala.reflect.ClassTag<V> evidence$11, scala.reflect.ClassTag<F> evidence$12) Deprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.<K,V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> 
 InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path, Object> filter, boolean newFilesOnly, scala.reflect.ClassTag<K> evidence$7, scala.reflect.ClassTag<V> evidence$8, scala.reflect.ClassTag<F> evidence$9) Deprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.<K,V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> 
 InputDStream<scala.Tuple2<K,V>> fileStream(String directory, scala.reflect.ClassTag<K> evidence$4, scala.reflect.ClassTag<V> evidence$5, scala.reflect.ClassTag<F> evidence$6) Deprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.static scala.Option<StreamingContext>Deprecated.Get the currently active context, if there is one.static StreamingContextgetActiveOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError) Deprecated.Either get the currently active StreamingContext (that is, started but not stopped), OR recreate a StreamingContext from checkpoint data in the given path.static StreamingContextgetActiveOrCreate(scala.Function0<StreamingContext> creatingFunc) Deprecated.Either return the "active" StreamingContext (that is, started but not stopped), or create a new StreamingContext that isstatic StreamingContextgetOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError) Deprecated.Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.getState()Deprecated.:: DeveloperApi ::static scala.Option<String>jarOfClass(Class<?> cls) Deprecated.Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext.static org.apache.spark.internal.Logging.LogStringContextLogStringContext(scala.StringContext sc) Deprecated.static org.slf4j.LoggerDeprecated.static voidorg$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) Deprecated.<T> InputDStream<T>queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, RDD<T> defaultRDD, scala.reflect.ClassTag<T> evidence$14) Deprecated.Create an input stream from a queue of RDDs.<T> InputDStream<T>queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, scala.reflect.ClassTag<T> evidence$13) Deprecated.Create an input stream from a queue of RDDs.<T> ReceiverInputDStream<T>rawSocketStream(String hostname, int port, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$3) Deprecated.Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them.<T> ReceiverInputDStream<T>receiverStream(Receiver<T> receiver, scala.reflect.ClassTag<T> evidence$1) Deprecated.Create an input stream with any arbitrary user implemented receiver.voidDeprecated.Set each DStream in this context to remember RDDs it generated in the last given duration.voidremoveStreamingListener(StreamingListener streamingListener) Deprecated.<T> ReceiverInputDStream<T>socketStream(String hostname, int port, scala.Function1<InputStream, scala.collection.Iterator<T>> converter, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$2) Deprecated.Creates an input stream from TCP source hostname:port.socketTextStream(String hostname, int port, StorageLevel storageLevel) Deprecated.Creates an input stream from TCP source hostname:port.Deprecated.Return the associated Spark contextvoidstart()Deprecated.Start the execution of the streams.voidstop(boolean stopSparkContext) Deprecated.Stop the execution of the streams immediately (does not wait for all received data to be processed).voidstop(boolean stopSparkContext, boolean stopGracefully) Deprecated.Stop the execution of the streams, with option of ensuring all received data has been processed.textFileStream(String directory) Deprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat).<T> DStream<T>transform(scala.collection.immutable.Seq<DStream<?>> dstreams, scala.Function2<scala.collection.immutable.Seq<RDD<?>>, Time, RDD<T>> transformFunc, scala.reflect.ClassTag<T> evidence$16) Deprecated.Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.<T> DStream<T>Deprecated.Create a unified DStream from multiple DStreams of the same type and same slide duration.Methods inherited from class java.lang.Objectequals, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.spark.internal.LogginginitializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
- 
Constructor Details- 
StreamingContextDeprecated.Create a StreamingContext using an existing SparkContext.- Parameters:
- sparkContext- existing SparkContext
- batchDuration- the time interval at which streaming data will be divided into batches
 
- 
StreamingContextDeprecated.Create a StreamingContext by providing the configuration necessary for a new SparkContext.- Parameters:
- conf- a org.apache.spark.SparkConf object specifying Spark parameters
- batchDuration- the time interval at which streaming data will be divided into batches
 
- 
StreamingContextpublic StreamingContext(String master, String appName, Duration batchDuration, String sparkHome, scala.collection.immutable.Seq<String> jars, scala.collection.Map<String, String> environment) Deprecated.Create a StreamingContext by providing the details necessary for creating a new SparkContext.- Parameters:
- master- cluster URL to connect to (e.g. spark://host:port, local[4]).
- appName- a name for your job, to display on the cluster web UI
- batchDuration- the time interval at which streaming data will be divided into batches
- sparkHome- (undocumented)
- jars- (undocumented)
- environment- (undocumented)
 
- 
StreamingContextDeprecated.Recreate a StreamingContext from a checkpoint file.- Parameters:
- path- Path to the directory that was specified as the checkpoint directory
- hadoopConf- Optional, configuration object if necessary for reading from HDFS compatible filesystems
 
- 
StreamingContextDeprecated.Recreate a StreamingContext from a checkpoint file.- Parameters:
- path- Path to the directory that was specified as the checkpoint directory
 
- 
StreamingContextDeprecated.Recreate a StreamingContext from a checkpoint file using an existing SparkContext.- Parameters:
- path- Path to the directory that was specified as the checkpoint directory
- sparkContext- Existing SparkContext
 
 
- 
- 
Method Details- 
getActiveDeprecated.Get the currently active context, if there is one. Active means started but not stopped.- Returns:
- (undocumented)
 
- 
getActiveOrCreateDeprecated.Either return the "active" StreamingContext (that is, started but not stopped), or create a new StreamingContext that is- Parameters:
- creatingFunc- Function to create a new StreamingContext
- Returns:
- (undocumented)
 
- 
getActiveOrCreatepublic static StreamingContext getActiveOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError) Deprecated.Either get the currently active StreamingContext (that is, started but not stopped), OR recreate a StreamingContext from checkpoint data in the given path. If checkpoint data does not exist in the provided, then create a new StreamingContext by calling the providedcreatingFunc.- Parameters:
- checkpointPath- Checkpoint directory used in an earlier StreamingContext program
- creatingFunc- Function to create a new StreamingContext
- hadoopConf- Optional Hadoop configuration if necessary for reading from the file system
- createOnError- Optional, whether to create a new StreamingContext if there is an error in reading checkpoint data. By default, an exception will be thrown on error.
- Returns:
- (undocumented)
 
- 
getOrCreatepublic static StreamingContext getOrCreate(String checkpointPath, scala.Function0<StreamingContext> creatingFunc, org.apache.hadoop.conf.Configuration hadoopConf, boolean createOnError) Deprecated.Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. If checkpoint data exists in the providedcheckpointPath, then StreamingContext will be recreated from the checkpoint data. If the data does not exist, then the StreamingContext will be created by called the providedcreatingFunc.- Parameters:
- checkpointPath- Checkpoint directory used in an earlier StreamingContext program
- creatingFunc- Function to create a new StreamingContext
- hadoopConf- Optional Hadoop configuration if necessary for reading from the file system
- createOnError- Optional, whether to create a new StreamingContext if there is an error in reading checkpoint data. By default, an exception will be thrown on error.
- Returns:
- (undocumented)
 
- 
jarOfClassDeprecated.Find the JAR from which a given class was loaded, to make it easy for users to pass their JARs to StreamingContext.- Parameters:
- cls- (undocumented)
- Returns:
- (undocumented)
 
- 
org$apache$spark$internal$Logging$$log_public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()Deprecated.
- 
org$apache$spark$internal$Logging$$log__$eqpublic static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) Deprecated.
- 
LogStringContextpublic static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc) Deprecated.
- 
sparkContextDeprecated.Return the associated Spark context- Returns:
- (undocumented)
 
- 
rememberDeprecated.Set each DStream in this context to remember RDDs it generated in the last given duration. DStreams remember RDDs only for a limited duration of time and release them for garbage collection. This method allows the developer to specify how long to remember the RDDs ( if the developer wishes to query old data outside the DStream computation).- Parameters:
- duration- Minimum duration that each DStream should remember its RDDs
 
- 
checkpointDeprecated.Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.- Parameters:
- directory- HDFS-compatible directory where the checkpoint data will be reliably stored. Note that this must be a fault-tolerant file system like HDFS.
 
- 
receiverStreampublic <T> ReceiverInputDStream<T> receiverStream(Receiver<T> receiver, scala.reflect.ClassTag<T> evidence$1) Deprecated.Create an input stream with any arbitrary user implemented receiver. Find more details at https://spark.apache.org/docs/latest/streaming-custom-receivers.html- Parameters:
- receiver- Custom implementation of Receiver
- evidence$1- (undocumented)
- Returns:
- (undocumented)
 
- 
socketTextStreampublic ReceiverInputDStream<String> socketTextStream(String hostname, int port, StorageLevel storageLevel) Deprecated.Creates an input stream from TCP source hostname:port. Data is received using a TCP socket and the receive bytes is interpreted as UTF8 encoded\ndelimited lines.- Parameters:
- hostname- Hostname to connect to for receiving data
- port- Port to connect to for receiving data
- storageLevel- Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
- Returns:
- (undocumented)
- See Also:
 
- 
socketStreampublic <T> ReceiverInputDStream<T> socketStream(String hostname, int port, scala.Function1<InputStream, scala.collection.Iterator<T>> converter, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$2) Deprecated.Creates an input stream from TCP source hostname:port. Data is received using a TCP socket and the receive bytes it interpreted as object using the given converter.- Parameters:
- hostname- Hostname to connect to for receiving data
- port- Port to connect to for receiving data
- converter- Function to convert the byte stream to objects
- storageLevel- Storage level to use for storing the received objects
- evidence$2- (undocumented)
- Returns:
- (undocumented)
 
- 
rawSocketStreampublic <T> ReceiverInputDStream<T> rawSocketStream(String hostname, int port, StorageLevel storageLevel, scala.reflect.ClassTag<T> evidence$3) Deprecated.Create an input stream from network source hostname:port, where data is received as serialized blocks (serialized using the Spark's serializer) that can be directly pushed into the block manager without deserializing them. This is the most efficient way to receive data.- Parameters:
- hostname- Hostname to connect to for receiving data
- port- Port to connect to for receiving data
- storageLevel- Storage level to use for storing the received objects (default: StorageLevel.MEMORY_AND_DISK_SER_2)
- evidence$3- (undocumented)
- Returns:
- (undocumented)
 
- 
fileStreampublic <K,V, InputDStream<scala.Tuple2<K,F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> V>> fileStream(String directory, scala.reflect.ClassTag<K> evidence$4, scala.reflect.ClassTag<V> evidence$5, scala.reflect.ClassTag<F> evidence$6) Deprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.- Parameters:
- directory- HDFS directory to monitor for new file
- evidence$4- (undocumented)
- evidence$5- (undocumented)
- evidence$6- (undocumented)
- Returns:
- (undocumented)
 
- 
fileStreampublic <K,V, InputDStream<scala.Tuple2<K,F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> V>> fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path, Object> filter, boolean newFilesOnly, scala.reflect.ClassTag<K> evidence$7, scala.reflect.ClassTag<V> evidence$8, scala.reflect.ClassTag<F> evidence$9) Deprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system.- Parameters:
- directory- HDFS directory to monitor for new file
- filter- Function to filter paths to process
- newFilesOnly- Should process only new files and ignore existing files in the directory
- evidence$7- (undocumented)
- evidence$8- (undocumented)
- evidence$9- (undocumented)
- Returns:
- (undocumented)
 
- 
fileStreampublic <K,V, InputDStream<scala.Tuple2<K,F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> V>> fileStream(String directory, scala.Function1<org.apache.hadoop.fs.Path, Object> filter, boolean newFilesOnly, org.apache.hadoop.conf.Configuration conf, scala.reflect.ClassTag<K> evidence$10, scala.reflect.ClassTag<V> evidence$11, scala.reflect.ClassTag<F> evidence$12) Deprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.- Parameters:
- directory- HDFS directory to monitor for new file
- filter- Function to filter paths to process
- newFilesOnly- Should process only new files and ignore existing files in the directory
- conf- Hadoop configuration
- evidence$10- (undocumented)
- evidence$11- (undocumented)
- evidence$12- (undocumented)
- Returns:
- (undocumented)
 
- 
textFileStreamDeprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored. The text files must be encoded as UTF-8.- Parameters:
- directory- HDFS directory to monitor for new file
- Returns:
- (undocumented)
 
- 
binaryRecordsStreamDeprecated.Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files, assuming a fixed length per record, generating one byte array per record. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.- Parameters:
- directory- HDFS directory to monitor for new file
- recordLength- length of each record in bytes
- Returns:
- (undocumented)
- Note:
- We ensure that the byte array for each record in the resulting RDDs of the DStream has the provided record length.
 
- 
queueStreampublic <T> InputDStream<T> queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, scala.reflect.ClassTag<T> evidence$13) Deprecated.Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.- Parameters:
- queue- Queue of RDDs. Modifications to this data structure must be synchronized.
- oneAtATime- Whether only one RDD should be consumed from the queue in every interval
- evidence$13- (undocumented)
- Returns:
- (undocumented)
- Note:
- Arbitrary RDDs can be added to queueStream, there is no way to recover data of those RDDs, soqueueStreamdoesn't support checkpointing.
 
- 
queueStreampublic <T> InputDStream<T> queueStream(scala.collection.mutable.Queue<RDD<T>> queue, boolean oneAtATime, RDD<T> defaultRDD, scala.reflect.ClassTag<T> evidence$14) Deprecated.Create an input stream from a queue of RDDs. In each batch, it will process either one or all of the RDDs returned by the queue.- Parameters:
- queue- Queue of RDDs. Modifications to this data structure must be synchronized.
- oneAtATime- Whether only one RDD should be consumed from the queue in every interval
- defaultRDD- Default RDD is returned by the DStream when the queue is empty. Set as null if no RDD should be returned when empty
- evidence$14- (undocumented)
- Returns:
- (undocumented)
- Note:
- Arbitrary RDDs can be added to queueStream, there is no way to recover data of those RDDs, soqueueStreamdoesn't support checkpointing.
 
- 
unionpublic <T> DStream<T> union(scala.collection.immutable.Seq<DStream<T>> streams, scala.reflect.ClassTag<T> evidence$15) Deprecated.Create a unified DStream from multiple DStreams of the same type and same slide duration.- Parameters:
- streams- (undocumented)
- evidence$15- (undocumented)
- Returns:
- (undocumented)
 
- 
transformpublic <T> DStream<T> transform(scala.collection.immutable.Seq<DStream<?>> dstreams, scala.Function2<scala.collection.immutable.Seq<RDD<?>>, Time, RDD<T>> transformFunc, scala.reflect.ClassTag<T> evidence$16) Deprecated.Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.- Parameters:
- dstreams- (undocumented)
- transformFunc- (undocumented)
- evidence$16- (undocumented)
- Returns:
- (undocumented)
 
- 
addStreamingListenerDeprecated.Add aStreamingListenerobject for receiving system events related to streaming.- Parameters:
- streamingListener- (undocumented)
 
- 
removeStreamingListenerDeprecated.
- 
getStateDeprecated.:: DeveloperApi ::Return the current state of the context. The context can be in three possible states - - StreamingContextState.INITIALIZED - The context has been created, but not started yet. Input DStreams, transformations and output operations can be created on the context. - StreamingContextState.ACTIVE - The context has been started, and not stopped. Input DStreams, transformations and output operations cannot be created on the context. - StreamingContextState.STOPPED - The context has been stopped and cannot be used any more. - Returns:
- (undocumented)
 
- 
startpublic void start()Deprecated.Start the execution of the streams.- Throws:
- IllegalStateException- if the StreamingContext is already stopped.
 
- 
awaitTerminationpublic void awaitTermination()Deprecated.Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.
- 
awaitTerminationOrTimeoutpublic boolean awaitTerminationOrTimeout(long timeout) Deprecated.Wait for the execution to stop. Any exceptions that occurs during the execution will be thrown in this thread.- Parameters:
- timeout- time to wait in milliseconds
- Returns:
- trueif it's stopped; or throw the reported error during the execution; or- falseif the waiting time elapsed before returning from the method.
 
- 
stoppublic void stop(boolean stopSparkContext) Deprecated.Stop the execution of the streams immediately (does not wait for all received data to be processed). By default, ifstopSparkContextis not specified, the underlying SparkContext will also be stopped. This implicit behavior can be configured using the SparkConf configuration spark.streaming.stopSparkContextByDefault.- Parameters:
- stopSparkContext- If true, stops the associated SparkContext. The underlying SparkContext will be stopped regardless of whether this StreamingContext has been started.
 
- 
stoppublic void stop(boolean stopSparkContext, boolean stopGracefully) Deprecated.Stop the execution of the streams, with option of ensuring all received data has been processed.- Parameters:
- stopSparkContext- if true, stops the associated SparkContext. The underlying SparkContext will be stopped regardless of whether this StreamingContext has been started.
- stopGracefully- if true, stops gracefully by waiting for the processing of all received data to be completed
 
 
-